Fed Up with News Apps, I Added Some AI

Note: This article involves Tasker, AI, front-end development, and automation. It’s a bit technical.

Background

I’m all about avoiding low-value information. I usually follow specific channels for my interests, but I also need a way to catch major events in other fields, to avoid getting stuck in an echo chamber.

I used to listen to the radio while driving my family around, to get the news. The info fell into two categories:

Useless: Sports, entertainment, and military news (often unreliable or biased).
Potentially useful, but I had to listen to find out: Social news, trends, and tech-related social phenomena. Of course, much of it was fluff, like a celebrity hit-and-run.

During the Paris Olympics, my news time was swamped with Olympics coverage. I had to keep glancing at my car’s screen to skip stories, which was unsafe and annoying.

I’ve tried many news apps with audio. The headlines channels were full of uninteresting stuff. Subscribing to specific channels meant long, in-depth reports – not ideal for a short drive. Update frequencies also varied wildly; some channels would dominate, effectively silencing others.

Then it hit me: I can usually tell if a story is interesting just from the headline. Why not use AI for this? Could I filter out unwanted stories from a headlines channel?

The idea stuck.

Implementation

It wasn’t technically difficult, but I couldn’t find anything like it. Maybe it’s too niche, so I built it myself!

My phone was the obvious choice, since that’s where I listen to news. This avoids relying on other devices. What if I’m on vacation? Luckily, I’m familiar with Tasker, an Android app that’s essentially programming software.

Here’s the process:

Fetch the day’s top news.
Use AI to categorize headlines.
Filter out unwanted categories, saving the rest as text.
Convert the text to audio.
Automate this to run nightly.
Create a playlist for the audio news.
Auto-start the player when connected to my car’s Bluetooth.
Clear old news daily.

Building Blocks

This sounds complex, but I didn’t have to reinvent the wheel. I just needed to integrate existing tools. I created small modules (subtasks) for the core functions, ready for assembly.

Tasker Intro

Tasker is the backbone. It’s an automation tool that lets you combine hardware control, math, file operations, network requests, and logic into workflows. Think iPhone Shortcuts, but much more powerful – it’s programming software.

Basic usage is simple: mute the phone on company Wi-Fi, or start music on Bluetooth connection. More advanced uses, like file operations and network requests, require programming logic, but no actual coding.

Fetching Content

The first subtask browses the news source.

Input: News source link
Output: Code with the news list

It uses Tasker’s HTTP request. I just passed the info to the outer task. Wrapping it in a layer relates to subtask execution priority, which I’ll explain later.

Parsing XML

RSS news feeds provide XML, not directly readable news.

RSS is standardized. Each news item is an “item,” with “title,” “link,” and “description” tags.

Before parsing, I standardized the XML. Webpages sometimes use escaped characters (e.g., < as <). This subtask converts them back.

Input: XML with escaped characters
Output: Standard XML

Next, parsing. This subtask extracts content from specific XML tags, separating them with |||.

Input: Full XML, tag to extract
Output: All content within that tag

I use it to find all “item” tags (the news list). The outer task passes “item” as %par2, getting all news items separated by |||.

Extracting Content from HTML

The previous subtask gets the news list, but only the title and link are really useful. “Description” varies; some sources include the full text, others just a summary, with the full text on a details page.

This subtask extracts content from a page’s HTML, removing menus, comments, ads, etc.

Input: Full HTML, tag to extract
Output: First content within that tag

It’s complex because of nested HTML tags. It finds the tag’s end to define the content range, using string manipulation to mimic Javascript’s innerHTML.

The result is still HTML, so another subtask converts it to plain text – a built-in Tasker feature.

Input: HTML code
Output: Text content

AI Classification

This is the core: the program’s brain.

Input: Content for AI, AI model name
Output: AI response

Groq’s API is great, offering many open-source AI models. It’s simple: send text, get generated text back. The 2-second wait is due to the API’s 30 calls/minute limit.

Text to Speech

This subtask converts text files to audio in batches.

Input: Text file directory, audio output directory
Output: Batch of audio files

It uses Tasker’s “Say To File,” saving text as audio. “Say To File” is just the operation; the speech synthesis engine isn’t built-in.

I used Google’s local engine. Download the app from Google Play, and Tasker can use it.

The local engine is comparable to map software’s default voice. Google’s is decent, better than iFlytek’s, but still robotic.

Putting the Pieces Together

Now that we have our tools, and most of the hard parts are solved, let’s assemble everything.

Downloading and Filtering News

First, we’ll build the core task: downloading news from a single source, filtering it, and saving it as text files. This is the heart of the process.

Input: News source URL, HTML tag containing the article body
Output: News text files

I added a shortcut for the second input. If you enter <description>, it uses the description from the XML instead of fetching the article’s detail page. This works best with high-quality news sources, and you can set it in the parent task.

We fetch the full XML, clean up escaped characters, and remove some special content tags. Then, we extract the news list.

The news list is split into an array. We set up the AI prompt and a maximum article length (to avoid overly long articles). Then, we loop through each news item, read and convert the title to plain text, and send it to the AI for categorization.

Here’s the AI prompt. I kept it simple, just telling it what to do. Groq’s Gemma2 9b model works well for Chinese text, better than Llama3. A small open-source model is perfect for this, and it hasn’t made any mistakes.

We filter out sports, entertainment, and military news based on the AI’s categorization. Then, we get the news detail page link, fetch the full HTML, clean it up, and extract the content using the specified HTML tag.

We convert the article body from HTML to text, check its length, and filter out anything too long or short (likely image-based news). The remaining articles are saved as text files.

Priority Issues

During debugging, I couldn’t get content consistently. It took a while to realize the subtasks were running in parallel.

Tasker’s core feature, “Perform Task,” runs a subtask within the current task, passing data and receiving results.

It’s like function calls in programming. Tasker limits you to two parameters, but you can combine multiple parameters into a string using a separator, then split them in the subtask. This allows for any number of parameters. This nesting lets you build complex logic, making “Perform Task” a key programming feature in Tasker.

The “Perform Task” documentation mentions execution order. The parent task doesn’t wait for a triggered subtask to finish before continuing. Many of my subtasks fetch content or loop through page code, which takes time. If the parent task proceeds before the subtask returns a result, things break.

Following the documentation, I set the subtask’s Priority to %priority+1 (one higher than the parent). This forces the parent task to wait.

Downloading News from Multiple Sources

That was a complex task! Now, let’s use it.

I pass my RSS feeds and article body locations to the core task. It runs for each source.

Then, I created a separate task for batch conversion to speech, specifying the input (text news) and output (audio news) directories.

Scheduled Downloads and Conversion

These are the tasks, but how do they run? On Tasker’s Profiles page, you can add triggers for your tasks.

Every day at 4 AM, save all news as text files (takes 5-10 minutes).

Every day at 5 AM, convert the text news to audio.

The Final Result

When I wake up, there are two folders in the News directory.

text contains the text versions, which I can share.

audio contains the audio news. Some local news still gets in, but the AI is doing its job filtering out sports.

I created a “Daily News” playlist in my music player to read the audio folder.

Updating the content brings in the day’s news. I still have to update it manually, but I’m working on automating that.

Playback is automatic. My car’s Bluetooth connection opens the player, and I use AIMP player, which auto-plays on open. No interaction needed.

Finally, a task clears the news folders at 3 AM daily, preparing for the next cycle.

Epilogue

My homemade news program has been working great for a few days. I can drive without distraction. The robotic voice is the only minor issue. I might replace “Say To File” with a better TTS API later.

This process solved a problem and gave me reusable subtasks. The subtasks for fetching content, parsing XML, extracting HTML, and querying AI are generic. I can now build other programs, create web scrapers, and even AI agents on my phone. Mobile scraping is great: no server costs, and it runs 24/7. I’ll explore it further as needed.