Review on Victor42

Can AI Make PPTs Independently Now

hi@victor42.work (Victor42) — Fri, 23 May 2025 15:46:00 +0000

I ran an interesting test for AI agents: create a presentation on the history of Earth’s geological eras, complete with text and images.

The task involved action planning, information gathering, content organization, layout design, and file format conversion. This allowed for an assessment of current AI agent capabilities, their practical usability, and potential bottlenecks.

I tested four AI agent products: Skywork, Coze Space, Manus, and Lovart. Here’s how they performed.

Skywork

https://skywork.ai/

Skywork had the highest completion rate, being the only tool that successfully outputted a PPT file.

See the full result here: https://tiangong.cn/share/v2/ppt/1925788478895357952?dataType=outfile&outputId=1925788478895357952&outputType=gen_ppt&projectId=1925782838113832960&sharingId=1925797872445526016

Upon receiving the task, Skywork initiated a scope confirmation process. I provided as much detail as possible, and its final output was the most comprehensive among the agents tested.

Next, it planned by creating a task list, which it referred to throughout the execution.

The execution was lengthy, primarily involving searching and browsing.

After gathering sufficient information, it first drafted a PPT outline.

The final PPT generation involved creating about a dozen web pages, which were then displayed together.

The conversion to PPT slides and the merge into a single file only happened at download time, which made the download slow. Downloading in HTML format instead produced a folder containing the separate web pages.

However, the resulting PPT file wasn’t very practical. Due to inconsistent page dimensions during generation, each slide varied slightly in size, often leaving blank space at the bottom.

Additionally, minor layout errors from the web page generation phase meant the final result wasn’t perfect.

However, it required minimal manual adjustment, indicating considerable potential.
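The inconsistent slide sizes described above are the kind of problem that is easy to check for mechanically. Here is a minimal sketch, assuming each generated page can be reduced to a (width, height) pair. The function name and the sample sizes are made up for illustration; this is not how Skywork works internally.

```python
from collections import Counter

def find_odd_pages(sizes):
    """Return indices of pages whose (width, height) differs from the most common size."""
    if not sizes:
        return []
    # Treat the most frequent size as the intended slide size.
    target, _count = Counter(sizes).most_common(1)[0]
    return [i for i, size in enumerate(sizes) if size != target]

# Hypothetical page sizes in pixels; the third page is taller than the rest.
pages = [(1280, 720), (1280, 720), (1280, 760), (1280, 720)]
odd = find_odd_pages(pages)
```

Pages flagged this way are the ones that would need resizing before being merged into one file.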

Coze Space

https://space.coze.cn/

Coze Space couldn’t generate PPTs directly and provided a document instead. Since the format wasn’t critical, I still counted the task as completed.

See the full result here: https://space.coze.cn/s/bSmamok4LFg/

Its execution process was simpler but followed a similar pattern: planning, data gathering, sourcing web images, and content integration.

I specifically enabled two extensions for Coze Space—Feishu Docs and an image generation tool—to see if it would utilize them. It used neither. The report wasn’t written to Feishu Docs, nor were images generated and inserted. This was expected, as I hadn’t explicitly instructed it to use them. Besides, for this kind of report, web images are preferable to generated ones; aesthetics weren’t the priority.

Manus

https://manus.im/

Manus provided a text-only PDF, essentially failing the task.

See the full result here: https://manus.im/share/DdcDQMgzQ59pWvI2akPuiD?replay=1

Its execution process was logical, however.

Although there wasn’t a distinct planning step, the final file included a to-do list, suggesting an underlying plan.

It searched for images during execution but tried to save very few, and none of them were saved successfully.

This resulted in a plain text report.

Lovart

https://www.lovart.ai/

This agent focuses on design, serving a different purpose. I included it for comparison to see its output.

See the full result here: https://www.lovart.ai/r/62cce51

Design-focused agents operate differently; Lovart treated this task as creating an infographic.

It began by seeking visual inspiration while gathering information on geological eras.

Its execution plan was roughly: organize information, generate four images for four geological eons, and then design the layout.

It produced a long, webpage-style image and marked the task as complete.

Thoughts

The subject of this test, geological history, involves readily accessible information that doesn’t demand complex reasoning. I briefly reviewed the details and found the information from each agent largely accurate, so I didn’t perform an in-depth check. My primary aim was to evaluate their effectiveness in science communication and their capacity to translate specialized knowledge into formats easily digestible by the public.

Different AI agent products possess distinct ‘DNA’ and employ varied approaches. Whether they prioritize content or presentation, neither approach is inherently superior or inferior. This helps identify their respective strengths; when used judiciously, they can effectively address specific problems.

Notably, Skywork and Lovart surpassed basic document generation, employing technical methods to enhance content presentation. This capability isn’t exclusive to AI agent tools. AI design expert 歸藏 (Guīcáng) demonstrated similar AI design capabilities using prompts long ago. In other words, the core of an agent’s design ability still lies in the prompt.

For those less skilled in prompt engineering, AI agent tools offer a viable alternative, significantly lowering the entry barrier. However, for more customized content presentation, carefully crafted prompts in general AI tools can achieve this, though it necessitates a separate information-gathering step.

Finally, to answer the initial question: Can AI independently create PPTs now?

If this means creating a usable PPT file with reliable and substantial content, then the answer is no.

However, if you can ensure content quality yourself, and AI’s role is merely to convert that content into a more digestible visual format (not necessarily PPT files), then the answer is yes.

AI Search Got You Stuck?

hi@victor42.work (Victor42) — Wed, 26 Feb 2025 12:14:00 +0000

I’ve been looking for a truly reliable AI search tool. I figured I’d just test it with some representative questions. Turns out, I was in over my head.

Starting with Real-Life Problems

I’ve used some real-world questions as test cases, and most AI search tools fall short.

It’s not that the questions are hard. The trick is how the AI goes about searching and pulling out the answer.

What kind of fish is “Guyanyu”?

It’s a type of edible flatfish. This is a trick question – “Guyanyu” is a colloquial, shortened name used in fish markets, not the scientific one.

AI finds useful info, but also noise:

Some only consider “Guyanyu” as typed, ignoring homophones, which leads to wrong answers like “spotted shad.”
Others do consider the homophone (also romanized “Guyanyu”), but still mistake it for the spotted shad.

Reasoning models see they’re different, but can’t tell which one you’re asking about, so they list both.

Sometimes, because there’s so much more info on “Guyanyu” (the flatfish), the AI jumps to a conclusion and gets it right, accidentally.

What’s the relationship between Liu Chuanzhi and bike-sharing?

There’s not much of a direct link, but there’s a two-step indirect one: his daughter, Liu Qing, leads Didi, which owns Qingju Bike.

I didn’t know their relationship when I asked. I wanted the most significant chain of influence, not the most direct.

Non-reasoning models focus on Legend Capital’s investment in ofo, mostly ignoring Liu Qing.

Reasoning models are smarter, recognizing Liu Qing’s importance, but stop at Didi. They assume Didi is just ride-hailing and don’t dig into Didi’s connection with Qingju, often concluding: The Liu family has a big impact on transportation, but little direct connection to bike-sharing.

Hangzhou was called Lin’an in ancient times. Why did this name get “given” to Lin’an District today?

This used to confuse me. It wasn’t “given.” I had it backwards. Lin’an County came first, then the Southern Song Lin’an Prefecture. The Southern Song might have been inspired by the county, but they were different places. The Southern Song capital was in downtown Hangzhou, not Lin’an. After the Song Dynasty fell, Lin’an Prefecture went back to being Hangzhou, while Lin’an County stayed Lin’an. Later, Lin’an became a district of Hangzhou.

Because the question itself is misleading, non-reasoning models mostly go with the incorrect assumption, talking about commemorating history or the glory of the Southern Song.

Reasoning models do well here, mostly getting it right. They spot the chronological order and point out the flawed “given to” phrasing.

What was the highest throughput of Shanghai’s port during the colonial period? How did it compare to the largest ports at the time?

I was just curious. I still don’t know, but I found that most AI search tools can’t answer this.

A somewhat reliable source is the Shanghai Port Chronicle on Baidu Baike, mentioning 14 million tons before the Second Sino-Japanese War, ranking 7th globally.

Data for other ports is either unavailable or made up by the AI. Some less intelligent AIs with big search volumes found some useful data (at least with references).

These are all real problems. I had tons of questions. I was a “walking encyclopedia” as a kid, and many quick searches turned up nothing. This made me doubt AI search.

Not All Problems Are Created Equal

AI search is a mixed bag. Some do well on certain questions, others don’t. I started looking for patterns: How can I tell which AI is good at what? And how should I pick an AI search product?

First, reasoning models are generally better, but not all are smart enough. Gemini 2.0 Flash and Kimi K1.5 aren’t great. In my tests, Gemini 2.0 Flash couldn’t answer these, but R1 could.

Search method matters, too.

Interestingly, Grok 3 has strong reasoning, even without “Think,” but can’t answer the “Guyanyu” question. Looking at its searches, I get why. It might be forcing a translation. With a weird Chinese name like “Guyanyu,” it mistranslates, doesn’t search for the shad or flatfish, and probably searches for things like “ancient” and “eye” separately. It finds nothing useful and makes stuff up.

Search volume is also key.

which country does Windsurf IDE come from?

It’s from the US. I thought, “easy.” Foreign AI search did great, even finding Mountain View, California. Then I tested domestic ones. Kimi and Yuewen can search in English, so I asked in English. Finding the US was easy, but not the city.

But it’s not that simple. Which article on Windsurf IDE would mention the city? At most, they’d say the country. To get the full answer, the AI needs to find Codium (the company behind it), then find the city from Codium’s site, job postings, or Product Hunt. That takes reasoning and multi-step searching!
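To make the multi-step idea concrete, here is a toy sketch of the two-hop lookup. The `FACTS` table stands in for what individual searches might surface and is purely hypothetical, as are the function names. The facts themselves are the ones mentioned above.

```python
# Toy "index": each entry stands in for what a single search might surface.
FACTS = {
    ("Windsurf IDE", "country"): "US",
    ("Windsurf IDE", "made_by"): "Codium",
    ("Codium", "city"): "Mountain View, California",
}

def lookup(entity, relation):
    """One 'search': return the fact if the index has it, else None."""
    return FACTS.get((entity, relation))

def find_city(product):
    """Try a direct lookup first, then hop through the company behind the product."""
    hops = []
    city = lookup(product, "city")
    hops.append((product, "city", city))
    if city is not None:
        return city, hops
    company = lookup(product, "made_by")
    hops.append((product, "made_by", company))
    if company is None:
        return None, hops
    city = lookup(company, "city")
    hops.append((company, "city", city))
    return city, hops

city, hops = find_city("Windsurf IDE")
```

The direct lookup fails, so the answer only appears after the second hop. A tool that stops after one search never gets there.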

This made me realize: questions we find easy can be tough for AI. It’s not that AI is dumb; we underestimate the complexity.

Even with a search engine, finding Windsurf IDE’s country is easy, but the city isn’t a one-search deal.

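So, I came up with a rough way to evaluate AI search: four quadrants based on how much AI (reasoning) ability and how much search ability a question demands. Type D needs little of either. Type C needs a lot of searching but little reasoning. Type B needs strong reasoning but only modest searching. Type A is the hardest: it needs both, and the info is scarce.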

I underestimated the “Guyanyu,” Liu Chuanzhi/bike-sharing, and Lin’an questions. I thought they were type D, but they’re type B. The Shanghai port question is a trickier type A.

Mistaking type A for C, and B for D, leads to disappointment.

The biggest problem? We don’t know the category when we ask, and we often underestimate the difficulty.

But AI search is a tool, and tools should serve us, right? It’s not doing a great job yet, and that’s not our fault; it’s on them to improve.

To reliably answer type B, agents like Grok 3 Deep Search and OpenAI Deep Research are crucial. They need multi-step searches, deep dives into relationships, source reliability checks, and conflicting info evaluation.

Making the Most of AI Search

Deep search for everything is too slow.

As someone in the AI community said: Since we can’t make AI accommodate humans yet, let humans accommodate AI.

Use Multiple Products Simultaneously

To save time and get decent answers, ditch the “one-tool-fits-all” idea. Think a bit about which quadrant a question likely falls into. Each has reliable AI search products; choose accordingly.

It takes more thought, but saves time. Your call.

Let’s go backwards. Type D is easiest; any AI search tool works.

Type C needs a lot of searching, but no reasoning. If the webpage exists, the answer is there. Example:

which country does Windsurf IDE come from?

Kimi does well on these. Products that pull in around 50 search results per query are also good. Treat long-tail knowledge as this type.

Type B has two scenarios:

The answer’s there, but with lots of conflicting noise.
The answer’s not in the core search results, but is abundant in incidentally searched terms. My earlier questions are examples.

These need strong reasoning models, like R1, Grok 3 Think, or o3-mini. Search capability isn’t as crucial; a dozen or two dozen sources are enough. Type B is easily mistaken for D. If a seemingly easy question keeps getting bad answers, suspect it’s actually type B.

Finally, type A. I’m not sure any current AI search can handle these reliably. Info is scarce. You’ll probably have to sift through search engines manually. If you want to try AI, use deep search/research.
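The picking logic above can be sketched in a few lines. This is a rough illustration, not a benchmark: the `Question` fields and the wording of the suggestions are my own shorthand, and it assumes type A is the quadrant that needs both strong reasoning and wide search.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Question:
    text: str
    needs_reasoning: bool    # conflicting noise or indirect links to untangle
    needs_wide_search: bool  # long-tail or scattered across many pages

# Suggestions paraphrased from the text above; hypothetical shorthand only.
SUGGESTIONS = {
    "A": "deep search/research agent, or manual searching",
    "B": "strong reasoning model; a dozen or two sources is enough",
    "C": "product with a large search volume",
    "D": "any AI search tool",
}

def classify(q: Question) -> str:
    """Map a question to one of the four quadrants."""
    if q.needs_reasoning and q.needs_wide_search:
        return "A"
    if q.needs_reasoning:
        return "B"
    if q.needs_wide_search:
        return "C"
    return "D"

def suggest(q: Question) -> str:
    """Return the kind of tool suggested for the question's quadrant."""
    return SUGGESTIONS[classify(q)]

linan = Question("Why was the name Lin'an 'given' to Lin'an District?", True, False)
windsurf = Question("which country does Windsurf IDE come from?", False, True)
```

The hard part, as noted earlier, is that the two flags have to be guessed before asking, and the guess is often too optimistic.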

Give Up on One-Shot Answers

The goal is to solve problems. Don’t expect a perfect answer in one go. Let that go, and you’ll find more options.

Back to:

which country does Windsurf IDE come from?

If the first question doesn’t give the city, ask:

which city?

For reasoning models, the odds of success go way up. Use multi-turn dialogue; you’d do the same with a search engine.

For tricky type A questions, like I said, accommodate the AI.

Ask in different ways, skim the sources, and judge usefulness by titles. Put the useful ones in a knowledge base, and have an AI answer from it with retrieval-augmented generation (RAG). Tools include NotebookLM, Tencent’s iMa, Perplexity, and AI clients like Cherry Studio.

Pay Attention to Language Differences

Language matters. An AI limited to Chinese can’t answer English-world nuances; foreign AI can’t answer questions about your local school’s enrollment plan.

A test:

wildfire trends in CA in the last 10 years

Ask about something abroad in English. If most results are Chinese webpages, it can’t search English well and is only good for Chinese topics.
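If you want to eyeball this less subjectively, you can copy the result titles and count how many are mostly Chinese. A minimal sketch, assuming titles are available as plain strings; the function names, the 0.3 threshold, and the sample titles are all made up for illustration.

```python
def is_cjk(ch: str) -> bool:
    """True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"

def looks_chinese(title: str, threshold: float = 0.3) -> bool:
    """Guess that a title is Chinese if enough of its letters are CJK characters."""
    letters = [c for c in title if c.isalpha()]
    if not letters:
        return False
    cjk = sum(1 for c in letters if is_cjk(c))
    return cjk / len(letters) >= threshold

def chinese_share(titles) -> float:
    """Fraction of titles that look Chinese (0.0 for an empty list)."""
    if not titles:
        return 0.0
    return sum(looks_chinese(t) for t in titles) / len(titles)

# Hypothetical result titles for the wildfire query.
titles = ["California wildfire trends", "加州山火趋势"]
share = chinese_share(titles)
```

A share close to 1 for an English query about a foreign topic suggests the product is mostly searching Chinese pages.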

Most domestic products have R1, so reasoning is good. Choosing a Chinese-world AI search is easy: find one with a large search volume.

If you need English and foreign info, foreign products are best. If that’s inconvenient, test domestic products with English questions.

Finally, models and products mentioned are time-sensitive (February 2025). Conclusions might change, but the factors for understanding and evaluating AI search remain useful.
