List of components to scrape from news-websites

Features to extract from each article when scraping news websites:

Title
Content
URL
Tags, if the article provides any
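
As a rough illustration, the features above could be held in one record per article. This is only a sketch: the class name `ScrapedArticle` and the field names are hypothetical and not taken from any existing code in the project.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScrapedArticle:
    """One scraped news article (hypothetical record layout)."""
    title: str
    content: str
    url: str
    # Tags are optional: not every site offers them, so default to an empty list.
    tags: List[str] = field(default_factory=list)


# Usage: an article from a site that offers no tags.
article = ScrapedArticle(
    title="Example headline",
    content="Body text of the article.",
    url="https://example.com/news/1",
)
```

Defaulting `tags` to an empty list keeps the "if any" case explicit without needing a separate check for a missing value.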

Open question: how to create the indexes.
- Suggestion (Aryaman): Process the article's summary with standard NLTK tools (stop-word removal and the like) and store the result directly for that article, since the summaries most websites provide are already very concise.
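
A minimal sketch of the suggestion above, assuming the summary is available as a plain string. The function names and the small fallback word list are hypothetical. The stop words come from NLTK's `stopwords` corpus when it is installed and downloaded. Tokenization uses a simple regular expression instead of an NLTK tokenizer, which is a simplification made here and not part of the suggestion.

```python
import re

# Tiny fallback list, used only when the NLTK stopwords corpus is unavailable.
FALLBACK_STOPWORDS = frozenset({
    "a", "an", "the", "of", "in", "on", "and", "or",
    "is", "are", "to", "for", "by", "with",
})


def load_stopwords():
    """Return NLTK's English stop words, or the fallback list if NLTK is missing."""
    try:
        from nltk.corpus import stopwords
        return frozenset(stopwords.words("english"))
    except (ImportError, LookupError):
        # ImportError: NLTK is not installed.
        # LookupError: the corpus has not been fetched with nltk.download("stopwords").
        return FALLBACK_STOPWORDS


def summary_to_index_terms(summary, stop_words):
    """Lowercase the summary, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", summary.lower())
    return [token for token in tokens if token not in stop_words]


# Usage: the resulting terms could be stored alongside the article.
terms = summary_to_index_terms(
    "The minister of finance announced a new budget",
    load_stopwords(),
)
```

Passing the stop-word set in as an argument keeps the filtering step independent of NLTK, so it can be swapped for a different list if needed.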

Edited Jul 11, 2021 by Aryaman Jeendgar