Published March 29, 2025 | Version v1
Journal article Open

Advanced News Aggregation and Content Generation Using LLMs and NLP Algorithms

  • 1. Eagle Creek Software Services, Vermillion, South Dakota, USA
  • 2. Institute of Engineering, Tribhuvan University, Kathmandu, Nepal
  • 3. Jain University, Jayanagar, Bangalore, India
  • 4. Madan Bhandari Memorial College, Tribhuvan University, Kathmandu, Nepal

Description

The exponential growth of digital information has created an opportunity for the media industry to apply Large Language Models (LLMs) to news aggregation and content generation. This paper explores the use of LLMs and Natural Language Processing (NLP) to automate both tasks. It presents a system that automatically fetches news headlines and articles from multiple news portals, with an initial implementation covering four distinct sources. The system clusters similar headlines to identify overlapping news topics, using a combination of OpenAI text-embedding-3-large, UMAP, HDBSCAN, and cosine similarity. For each cluster, it then generates a unique, paraphrased headline and accompanying content by synthesizing information from all of the similar articles, using the gpt-4o-2024-08-06 model. Finally, the system automatically deploys the generated articles to an autonomous news portal, so that the whole publishing workflow runs without human intervention. The resulting AI-based news portal updates continuously with fresh and unique content, offering users timely and comprehensive news coverage. By automating the entire process, from data collection and clustering to content creation and publication, this approach turns traditional news aggregation into a fully autonomous, AI-driven news platform that reduces redundancy and improves the reader's experience in a fast-paced information environment.
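The headline-grouping step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `embed` function is a toy bag-of-words stand-in for text-embedding-3-large, and a greedy cosine-similarity threshold replaces the UMAP and HDBSCAN stages so the example runs without external services. The function names, the sample headlines, and the `threshold` value of 0.5 are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_headlines(headlines, threshold=0.5):
    """Greedy single-pass grouping (hypothetical simplification of the clustering step).

    Each headline joins the first group whose first member is at least
    `threshold` similar to it; otherwise it starts a new group.
    Returns a list of groups, each a list of headline indices.
    """
    vectors = [embed(h) for h in headlines]
    groups = []
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine_similarity(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Invented sample headlines, as if fetched from different portals.
headlines = [
    "Central bank raises interest rates again",
    "Interest rates raised again by central bank",
    "Local team wins national football final",
    "National football final won by local team",
]
groups = group_headlines(headlines)
for group in groups:
    print([headlines[i] for i in group])
```

In a pipeline like the one the abstract describes, each resulting group would be passed on to the generation model to produce a single paraphrased headline and article. A density-based method such as HDBSCAN would also avoid the fixed threshold used here.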

Files

229-Article Text-410-1-10-20250329.pdf (413.3 kB)
md5:3016d7ea7deba5312211bd08a887b27f