Economic statistics project
- Parsed all news published since 2017-01-01, extracted the companies mentioned in each news item, and calculated a sentiment score and word count for each item
- Parsed 1-minute and 1-hour candle history from MOEX for all companies since 2017-01-01
- Calculated linear regression coefficients (share of correct predictions ranges from 48.2% to 59.2%) and p-values
- All parsed data and sentiment scores in CSV format: CSV
- Simple Flask backend for data visualization (p-values and linreg coefficients done)
- News updates every 2 minutes and prices update every 5 minutes
- Real-Time analysis
- Folder "python/history_parsing"
- finam_news_parser.py + moex_history_parser.py (News + MOEX candle history)
- sentiment_analysis.py (Add sentiment scores to news)
- companies_extractor.py (Add extracted companies to news)
- linreg_params_final.py (Calculate all historical params for linreg calculation)
- linreg_coef_final.py (Calculate linreg coefficients + pvalues) -> companies.pickle
- history_{company_ticker}_1 - 1-minute candle history
- history_{company_ticker}_24 - 1-day candle history
- all_parsed_news.csv - all parsed news without sentiment scores and companies
- all_parsed_news_with_sentiment_scores.csv - all parsed news with sentiment scores but without companies
- news_with_one_company.csv - all parsed news items that mention EXACTLY ONE company
- news_with_one_or_more_company.csv - all parsed news items that mention ONE OR MORE companies
- parsed_{company_ticker} - Calculated historical parameters for linreg
- companies.pickle - pickled list of companies, each with its ticker, name, linreg coefficients and p-values
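
As a rough illustration of the linreg step, the sketch below fits a one-variable least-squares line and reports the slope, its t-statistic, and the share of predictions whose sign matches the actual price move. It is not the code from linreg_coef_final.py: the function names, the use of the sentiment score as the only regressor, and the sample numbers are assumptions made for this example, and reading "Prediction True %" as sign accuracy is an assumption too.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def slope_t_statistic(xs, ys, slope, intercept):
    """t-statistic of the slope: slope divided by its standard error."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope / std_err


def hit_rate(xs, ys, slope, intercept):
    """Share of observations where prediction and outcome have the same sign."""
    hits = sum(1 for x, y in zip(xs, ys) if (intercept + slope * x) * y > 0)
    return hits / len(xs)


# Made-up illustrative numbers, not project data:
# sentiment score of a news item vs. the price return that followed it.
sentiment = [-0.8, -0.3, 0.1, 0.4, 0.9]
returns = [-0.5, 0.2, -0.1, 0.3, 0.6]

slope, intercept = fit_line(sentiment, returns)
print("slope:", slope)
print("t-statistic:", slope_t_statistic(sentiment, returns, slope, intercept))
print("hit rate:", hit_rate(sentiment, returns, slope, intercept))
```

A two-sided p-value follows from the t-statistic with n - 2 degrees of freedom. If SciPy is available, `scipy.stats.linregress` returns the slope and that p-value directly.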
To run the full-stack app with a MySQL database, visualization, and real-time news and price parsing (takes about 2.5 GB of disk space):
docker-compose up -d
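
For orientation, here is a minimal sketch of how the two refresh intervals used by the running app (news every 2 minutes, prices every 5 minutes) interleave. The job names and the `schedule` helper are hypothetical and only illustrate the cadence. The actual containers may schedule their work differently.

```python
import heapq

# Intervals come from the feature list above; the job names are hypothetical.
INTERVALS_SECONDS = {"update_news": 2 * 60, "update_prices": 5 * 60}


def schedule(intervals, horizon_seconds):
    """Return (time, job) pairs for every run due within the horizon, in time order."""
    queue = [(interval, name) for name, interval in intervals.items()]
    heapq.heapify(queue)
    runs = []
    while queue and queue[0][0] <= horizon_seconds:
        due, name = heapq.heappop(queue)
        runs.append((due, name))
        # Re-queue the job for its next run.
        heapq.heappush(queue, (due + intervals[name], name))
    return runs


# Print the runs that fall within the first 10 minutes.
for due, name in schedule(INTERVALS_SECONDS, 10 * 60):
    print(f"t={due:>4}s  {name}")
```

A long-running worker would sleep until the next due time instead of collecting the runs in a list. The simulated form is used here so the cadence can be inspected without waiting.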