This is an old script from my IBM Watson days that I'm trying to keep fresh :)
- 🎯 Efficient Processing: Tokenize and filter articles (now a little faster; a minimal sketch follows this list)
- 🧠 Pre-trained Embeddings: Load pre-trained word embeddings via Gensim
- 🔮 Data Augmentation: Expand your dataset
- 💾 Storage: Efficient I/O with HDF5
- 🛠 Customizable: Hopefully! ;)
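To give a flavor of the tokenize-and-filter step, here is a minimal sketch using NLTK. It is not the code in `analitika.py`; only the `WHITELIST` variable name comes from the script, and its value below is just an example.

```python
# Minimal sketch of the tokenize-and-filter idea, not the actual analitika.py code.
# The WHITELIST value below is an example; the script defines its own.
import nltk

nltk.download("punkt", quiet=True)  # tokenizer model used by word_tokenize

WHITELIST = "abcdefghijklmnopqrstuvwxyz0123456789 "  # example: allowed characters

def clean_and_tokenize(article):
    """Lowercase the article, drop characters outside WHITELIST, then tokenize."""
    filtered = "".join(ch for ch in article.lower() if ch in WHITELIST)
    return nltk.word_tokenize(filtered)

print(clean_and_tokenize("Hello, Watson! 42 answers."))
# ['hello', 'watson', '42', 'answers']
```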
- Clone the repo: `git clone https://github.com/0101011/analitika.git`
- Install dependencies: `pip install -r requirements.txt`
- Run the script: `python analitika.py`
- Place your `raw_data.json` in the `data/` directory
- (Optional) Add pre-trained embeddings to `data/`
- Run the script: `python analitika.py`
- Find the processed data in `data/` as HDF5 and pickle files (see the loading sketch below)
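If you want to inspect the processed output from Python, something like the following works; the file names under `data/` are assumptions here, so adjust them to whatever the script actually writes.

```python
# Peek at the processed data. File names are assumptions; adjust to your output.
import pickle
import h5py

with h5py.File("data/processed.h5", "r") as f:
    print(list(f.keys()))          # datasets stored in the HDF5 file
    name = list(f.keys())[0]
    print(f[name][:5])             # first few rows of one dataset

with open("data/metadata.pkl", "rb") as f:
    meta = pickle.load(f)
print(type(meta))                  # e.g. a dict with vocab / index mappings
```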
Customize the script by modifying these variables (an illustrative example follows the list):
- `WHITELIST`: Allowed characters
- `VOCAB_SIZE`: Maximum vocabulary size
- `limit`: Length constraints for articles
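For illustration, those variables might look like this near the top of `analitika.py`; the values and the shape of `limit` are assumptions, only the names come from the script.

```python
# Illustrative values only; check analitika.py for the real defaults.
WHITELIST = "0123456789abcdefghijklmnopqrstuvwxyz ?.!,'"  # characters to keep
VOCAB_SIZE = 20000  # cap on vocabulary size; rarer tokens fall back to an unknown token

# Length constraints for articles (assumed here to be a min/max pair in tokens)
limit = {
    "min_length": 20,
    "max_length": 500,
}
```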
Here are some ways you can contribute:
- 💡 My goal has been to turn this into a standalone package or CLI tool. Maybe we'll come up with something together.
This project is licensed under the MIT License - see the LICENSE file for details.
- NLTK for natural language processing
- Gensim for word embeddings
- h5py (HDF5 for Python) for efficient data storage
Made with ❤️ by [Your Name]