List of components to scrape from news websites
Features to extract from each article when scraping news websites:
- Title
- Content
- URL
- Tags, if the article provides any
- Open question: how to build the indexes.
  - Suggestion (Aryaman): process each article's summary with standard NLTK tools (e.g. stop-word removal) and store the result directly for that article, since the summaries most websites provide are already very concise.
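
A minimal sketch of the features and the indexing suggestion above, under stated assumptions: the `Article` class, its `summary` field, and the helpers `summary_terms` and `build_index` are hypothetical names chosen for illustration, and the small `STOP_WORDS` set is only a stand-in so the example runs without extra downloads. In practice NLTK's `nltk.corpus.stopwords.words("english")` could supply the stop-word list instead.

```python
from __future__ import annotations

import re
from collections import defaultdict
from dataclasses import dataclass, field

# Stand-in stop-word list (assumption). NLTK's
# nltk.corpus.stopwords.words("english") could replace it.
STOP_WORDS = {
    "a", "an", "and", "are", "at", "by", "for",
    "in", "is", "of", "on", "the", "to", "with",
}


@dataclass
class Article:
    """One scraped article; fields mirror the feature list above."""
    title: str
    content: str
    url: str
    tags: list[str] = field(default_factory=list)  # empty if the site offers none
    summary: str = ""  # assumed to be scraped alongside the other fields


def summary_terms(summary: str) -> list[str]:
    """Lowercase the summary, split it into word tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", summary.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_index(articles: list[Article]) -> dict[str, set[str]]:
    """Map each summary term to the URLs of the articles containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for article in articles:
        for term in summary_terms(article.summary):
            index[term].add(article.url)
    return index
```

Storing only the processed summary terms keeps the index small; whether that is enough for good search results, compared with indexing the full content, is still an open question.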