<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Review on Victor42</title><link>https://victor42.eth.limo/tags/review/</link><description>Recent content in Review on Victor42</description><generator>Hugo -- gohugo.io</generator><language>en</language><managingEditor>hi@victor42.work (Victor42)</managingEditor><webMaster>hi@victor42.work (Victor42)</webMaster><lastBuildDate>Fri, 23 May 2025 15:46:00 +0000</lastBuildDate><atom:link href="https://victor42.eth.limo/tags/review/index.xml" rel="self" type="application/rss+xml"/><item><title>Can AI Make PPTs Independently Now</title><link>https://victor42.eth.limo/post-en/ai-generated-ppt/</link><pubDate>Fri, 23 May 2025 15:46:00 +0000</pubDate><author>hi@victor42.work (Victor42)</author><guid>https://victor42.eth.limo/post-en/ai-generated-ppt/</guid><description>&lt;img src="https://cdn.victor42.work/posts/2025-05/07cf2ceb0b1574f2e3c69b2887632c9b.webp" alt="Featured image of post Can AI Make PPTs Independently Now" /&gt;&lt;p&gt;I ran an interesting test for AI agents: to create a presentation on the history of Earth&amp;rsquo;s geological eras, complete with text and images.&lt;/p&gt;
&lt;p&gt;The task involved action planning, information gathering, content organization, layout design, and file format conversion. This allowed for an assessment of current AI agent capabilities, their practical usability, and potential bottlenecks.&lt;/p&gt;
&lt;p&gt;I tested four AI agent products: Skywork, Coze Space, Manus, and Lovart. Here&amp;rsquo;s how they performed: 👇&lt;/p&gt;
&lt;h2 id="skywork"&gt;Skywork
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://skywork.ai/" target="_blank" rel="noopener"
&gt;https://skywork.ai/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Skywork completed the most of the task and was the only tool that actually produced a PPT file.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://cdn.victor42.work/posts/2025-05/07cf2ceb0b1574f2e3c69b2887632c9c.webp"
loading="lazy"
alt="The title and table of contents slides of the Earth geological eons presentation generated by Tiangong"
&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://cdn.victor42.work/posts/2025-05/2d429bf55bb6b3a3733e63033500e005.webp"
loading="lazy"
alt="Slides detailing Paleozoic biodiversity and Mesozoic geological features generated by Tiangong"
&gt;&lt;/p&gt;
&lt;p&gt;See the full result here: &lt;a class="link" href="https://tiangong.cn/share/v2/ppt/1925788478895357952?dataType=outfile&amp;amp;outputId=1925788478895357952&amp;amp;outputType=gen_ppt&amp;amp;projectId=1925782838113832960&amp;amp;sharingId=1925797872445526016" target="_blank" rel="noopener"
&gt;https://tiangong.cn/share/v2/ppt/1925788478895357952?dataType=outfile&amp;amp;outputId=1925788478895357952&amp;amp;outputType=gen_ppt&amp;amp;projectId=1925782838113832960&amp;amp;sharingId=1925797872445526016&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://cdn.victor42.work/posts/2025-05/fc8204dba40ddd291deccb7fb6dbffb2.webp"
loading="lazy"
alt="The requirements confirmation form in the Tiangong user interface for the presentation report"
&gt;&lt;/p&gt;
&lt;p&gt;Upon receiving the task, Skywork initiated a scope confirmation process. I provided as much detail as possible, and its final output was the most comprehensive among the agents tested.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://cdn.victor42.work/posts/2025-05/412c047fd18bf39596ffb101fe0d01f3.webp"
loading="lazy"
alt="The to-do list generated by Tiangong for collecting data and creating the presentation"
&gt;&lt;/p&gt;
&lt;p&gt;Next, it planned by creating a task list, which it referred to throughout the execution.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://cdn.victor42.work/posts/2025-05/14e6822b21eb2d2bf9a2446faa3ae9f3.webp"
loading="lazy"
alt="Tiangong calling web search and browser tools to gather geological information"
&gt;&lt;/p&gt;
&lt;p&gt;The execution was lengthy, primarily involving searching and browsing. Here&amp;rsquo;s an excerpt:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://cdn.victor42.work/posts/2025-05/1512322d1022a65174fe5c735b7ddda4.webp"
loading="lazy"
alt="The structured outline generated and confirmed by Tiangong before creating the slides"
&gt;&lt;/p&gt;
&lt;p&gt;After gathering sufficient information, it first drafted a PPT outline.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://cdn.victor42.work/posts/2025-05/999323fe9dbe801fabe443853e1ab34a.webp"
loading="lazy"
alt="Tiangong interface displaying the completed task output in HTML format"
&gt;&lt;/p&gt;
&lt;p&gt;The final PPT generation involved creating about a dozen web pages, which were then displayed together.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://cdn.victor42.work/posts/2025-05/fa8b4512ddcd260e54f53796c3c7d607.webp"
loading="lazy"
alt="The file download menu offering formats including presentation slides, PDF, and HTML"
&gt;&lt;/p&gt;
&lt;p&gt;The conversion to PPT slides and merging into a single file only occurred during download, making the process lengthy. Downloading the HTML format resulted in a folder containing these separate web pages.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://cdn.victor42.work/posts/2025-05/e3caa031828f83557dae4dade1ab2f6f.webp"
loading="lazy"
alt="The extracted folder on local storage containing multiple separate HTML files of the presentation"
&gt;&lt;/p&gt;
&lt;p&gt;However, the resulting PPT file wasn&amp;rsquo;t very practical. Due to inconsistent page dimensions during generation, each slide varied slightly in size, often leaving blank space at the bottom.&lt;/p&gt;
&lt;p&gt;Additionally, minor layout errors from the web page generation phase meant the final result wasn&amp;rsquo;t perfect.&lt;/p&gt;
&lt;p&gt;Still, it needed only minor manual adjustment, which points to considerable potential.&lt;/p&gt;
&lt;h2 id="coze-space"&gt;Coze Space
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://space.coze.cn/" target="_blank" rel="noopener"
&gt;https://space.coze.cn/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Coze Space couldn&amp;rsquo;t generate PPTs directly and produced a document instead. Since the format wasn&amp;rsquo;t critical, I still counted the task as completed.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://cdn.victor42.work/posts/2025-05/f1d552d2a069e73f168d53363b143d75.webp"
loading="lazy"
alt="The title and introduction sections of the geological history markdown report in Coze Space"
&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://cdn.victor42.work/posts/2025-05/b480aa0644716122a8a1ee4e17c75aba.webp"
loading="lazy"
alt="Prehistoric fish illustrations embedded in the geological eons markdown report in Coze Space"
&gt;&lt;/p&gt;
&lt;p&gt;See the full result here: &lt;a class="link" href="https://space.coze.cn/s/bSmamok4LFg/" target="_blank" rel="noopener"
&gt;https://space.coze.cn/s/bSmamok4LFg/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://cdn.victor42.work/posts/2025-05/8026407bd8e9265b92fdd565524cd0f0.webp"
loading="lazy"
alt="The initial thinking process and task planning steps displayed in Coze Space interface"
&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://cdn.victor42.work/posts/2025-05/8d42f164e6b26e94c4c5e234fffc7c1d.webp"
loading="lazy"
alt="Coze Space calling image search and file saving tools to complete the markdown report"
&gt;&lt;/p&gt;
&lt;p&gt;Its execution process was simpler but followed a similar pattern: planning, data gathering, sourcing web images, and content integration.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://cdn.victor42.work/posts/2025-05/5fef56ffc2dc7b28742139ce8646184c.webp"
loading="lazy"
alt="The add extensions menu in Coze Space showing the checked image generator tool"
&gt;&lt;/p&gt;
&lt;p&gt;I specifically enabled two extensions for Coze Space—Feishu Docs and an image generation tool—to see if it would utilize them. It used neither. The report wasn&amp;rsquo;t written to Feishu Docs, nor were images generated and inserted. This was expected, as I hadn&amp;rsquo;t explicitly instructed it to use them. Besides, for this kind of report, web images are preferable to generated ones; aesthetics weren&amp;rsquo;t the priority.&lt;/p&gt;
&lt;h2 id="manus"&gt;Manus
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://manus.im/" target="_blank" rel="noopener"
&gt;https://manus.im/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Manus provided a text-only PDF, essentially failing the task.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://cdn.victor42.work/posts/2025-05/966c7cbfbb511b90ea2a844fdb759589.webp"
loading="lazy"
alt="The plain text PDF report about geological eons generated by Manus"
&gt;&lt;/p&gt;
&lt;p&gt;See the full result here: &lt;a class="link" href="https://manus.im/share/DdcDQMgzQ59pWvI2akPuiD?replay=1" target="_blank" rel="noopener"
&gt;https://manus.im/share/DdcDQMgzQ59pWvI2akPuiD?replay=1&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://cdn.victor42.work/posts/2025-05/92df60135a908ccbd177f011bb25b01f.webp"
loading="lazy"
alt="Manus initializing the workspace directory and searching encyclopedias for geological facts"
&gt;&lt;/p&gt;
&lt;p&gt;Its execution process was logical, however.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://cdn.victor42.work/posts/2025-05/fcb8e048de923c536c1afde38ededdbf.webp"
loading="lazy"
alt="The task to-do list file generated by Manus in its local workspace directory"
&gt;&lt;/p&gt;
&lt;p&gt;Although there wasn&amp;rsquo;t a distinct planning step, the final file included a to-do list, suggesting an underlying plan.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://cdn.victor42.work/posts/2025-05/58f5cdd8a07f6a51595c641eb1a1bed9.webp"
loading="lazy"
alt="Manus browsing and attempting to save the international chronostratigraphic chart PDF"
&gt;&lt;/p&gt;
&lt;p&gt;It searched for images during execution but tried to save very few, and none of those saves succeeded.&lt;/p&gt;
&lt;p&gt;This resulted in a plain text report.&lt;/p&gt;
&lt;h2 id="lovart"&gt;Lovart
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://www.lovart.ai/" target="_blank" rel="noopener"
&gt;https://www.lovart.ai/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This agent focuses on design, serving a different purpose. I included it for comparison to see its output.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://cdn.victor42.work/posts/2025-05/fb9e4ed8ae1530d3d56eb45fc2b90d15.webp"
loading="lazy"
alt="The vertical geological timeline infographic webpage generated by Lovart"
&gt;&lt;/p&gt;
&lt;p&gt;See the full result here: &lt;a class="link" href="https://www.lovart.ai/r/62cce51" target="_blank" rel="noopener"
&gt;https://www.lovart.ai/r/62cce51&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Design-focused agents operate differently; Lovart treated this task as creating an infographic.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://cdn.victor42.work/posts/2025-05/c452cbd16f83c1cf54c4acaa41a09968.webp"
loading="lazy"
alt="The Lovart chat interface displaying collected inspiration images and charts"
&gt;&lt;/p&gt;
&lt;p&gt;It began by seeking visual inspiration while gathering information on geological eras.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://cdn.victor42.work/posts/2025-05/bd58a66bddb861b38de62e73d3328cf5.webp"
loading="lazy"
alt="The execution plan generated by Lovart detailing visual generation and layout tasks"
&gt;&lt;/p&gt;
&lt;p&gt;Its execution plan was roughly: organize information, generate four images for four geological eons, and then design the layout.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://cdn.victor42.work/posts/2025-05/74f874debdd5bad825e3f22f81f6334f.webp"
loading="lazy"
alt="The Lovart interface showing completed HTML generation and task completion notice"
&gt;&lt;/p&gt;
&lt;p&gt;It produced a long, webpage-style image and marked the task as complete.&lt;/p&gt;
&lt;h2 id="thoughts"&gt;Thoughts
&lt;/h2&gt;&lt;p&gt;The subject of this test, geological history, involves readily accessible information that doesn&amp;rsquo;t demand complex reasoning. I briefly reviewed the details and found the information from each agent largely accurate, so I didn&amp;rsquo;t perform an in-depth check. My primary aim was to evaluate their effectiveness in science communication and their capacity to translate specialized knowledge into formats easily digestible by the public.&lt;/p&gt;
&lt;p&gt;Different AI agent products possess distinct &amp;lsquo;DNA&amp;rsquo; and employ varied approaches. Whether they prioritize content or presentation, neither approach is inherently superior or inferior. This helps identify their respective strengths; when used judiciously, they can effectively address specific problems.&lt;/p&gt;
&lt;p&gt;Notably, Skywork and Lovart surpassed basic document generation, employing technical methods to enhance content presentation. This capability isn&amp;rsquo;t exclusive to AI agent tools. AI design expert &lt;a class="link" href="https://x.com/op7418" target="_blank" rel="noopener"
&gt;&lt;strong&gt;歸藏 (Guīcáng)&lt;/strong&gt;&lt;/a&gt; demonstrated similar &lt;a class="link" href="https://mp.weixin.qq.com/s/f1IozQKgIEDODfLRP5E2qg" target="_blank" rel="noopener"
&gt;AI design capabilities&lt;/a&gt; using prompts long ago. In other words, the core of an agent&amp;rsquo;s design ability still lies in the prompt.&lt;/p&gt;
&lt;p&gt;For those less skilled in prompt engineering, AI agent tools offer a viable alternative, significantly lowering the entry barrier. However, for more customized content presentation, carefully crafted prompts in general AI tools can achieve this, though it necessitates a separate information-gathering step.&lt;/p&gt;
&lt;p&gt;Finally, to answer the initial question: Can AI independently create PPTs now?&lt;/p&gt;
&lt;p&gt;If this means creating a usable PPT file with reliable and substantial content, then the answer is no.&lt;/p&gt;
&lt;p&gt;However, if you can ensure content quality yourself, and AI&amp;rsquo;s role is merely to convert that content into a more digestible visual format (not necessarily PPT files), then the answer is yes.&lt;/p&gt;</description></item><item><title>AI Search Got You Stuck?</title><link>https://victor42.eth.limo/post-en/ai-search/</link><pubDate>Wed, 26 Feb 2025 12:14:00 +0000</pubDate><author>hi@victor42.work (Victor42)</author><guid>https://victor42.eth.limo/post-en/ai-search/</guid><description>&lt;img src="https://cdn.victor42.work/posts/2025-02/84O7u4RISVmTo0al7fmLUA.jpg" alt="Featured image of post AI Search Got You Stuck?" /&gt;&lt;p&gt;I&amp;rsquo;ve been looking for a truly reliable AI search tool. I figured I&amp;rsquo;d just test it with some representative questions. Turns out, I was in over my head.&lt;/p&gt;
&lt;h2 id="starting-with-real-life-problems"&gt;Starting with Real-Life Problems
&lt;/h2&gt;&lt;p&gt;I&amp;rsquo;ve used some real-world questions as test cases, and most AI search tools fall short.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s not that the questions are hard. The trick is how the AI goes about searching and pulling out the answer.&lt;/p&gt;
&lt;h3 id="what-kind-of-fish-is-guyanyu"&gt;What kind of fish is &amp;ldquo;Guyanyu&amp;rdquo;?
&lt;/h3&gt;&lt;p&gt;It&amp;rsquo;s a type of edible flatfish. This is a trick question – &amp;ldquo;Guyanyu&amp;rdquo; is a colloquial, shortened name used in fish markets, not the scientific one.&lt;/p&gt;
&lt;p&gt;AI finds useful info, but also noise:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Some only consider &amp;ldquo;Guyanyu,&amp;rdquo; ignoring homophones, leading to wrong answers like &amp;ldquo;spotted shad.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Others consider the homophone &amp;ldquo;Guyanyu,&amp;rdquo; but mistake it for the spotted shad.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Reasoning models see they&amp;rsquo;re different, but can&amp;rsquo;t tell which one you&amp;rsquo;re asking about, so they list both.&lt;/p&gt;
&lt;p&gt;Sometimes, because there&amp;rsquo;s so much more info on &amp;ldquo;Guyanyu&amp;rdquo; (the flatfish), the AI jumps to a conclusion and gets it right, accidentally.&lt;/p&gt;
&lt;h3 id="whats-the-relationship-between-liu-chuanzhi-and-bike-sharing"&gt;What&amp;rsquo;s the relationship between Liu Chuanzhi and bike-sharing?
&lt;/h3&gt;&lt;p&gt;There&amp;rsquo;s not much of a direct link, but there is a two-step indirect one: his daughter, Liu Qing, leads Didi, and Didi owns Qingju Bike.&lt;/p&gt;
&lt;p&gt;I didn&amp;rsquo;t know their relationship when I asked. I wanted the &lt;em&gt;most significant&lt;/em&gt; chain of influence, not the most direct.&lt;/p&gt;
&lt;p&gt;Non-reasoning models focus on Legend Capital&amp;rsquo;s investment in ofo, mostly ignoring Liu Qing.&lt;/p&gt;
&lt;p&gt;Reasoning models are smarter, recognizing Liu Qing&amp;rsquo;s importance, but stop at Didi. They assume Didi is just ride-hailing and don&amp;rsquo;t dig into Didi&amp;rsquo;s connection with Qingju, often concluding: The Liu family has a big impact on transportation, but little direct connection to bike-sharing.&lt;/p&gt;
&lt;h3 id="hangzhou-was-called-linan-in-ancient-times-why-did-this-name-get-given-to-linan-district-today"&gt;Hangzhou was called Lin&amp;rsquo;an in ancient times. Why did this name get &amp;ldquo;given&amp;rdquo; to Lin&amp;rsquo;an District today?
&lt;/h3&gt;&lt;p&gt;This used to confuse me. It wasn&amp;rsquo;t &amp;ldquo;given.&amp;rdquo; I had it backwards. Lin&amp;rsquo;an County came first, &lt;em&gt;then&lt;/em&gt; the Southern Song Lin&amp;rsquo;an Prefecture. The Southern Song might have been inspired by the county, but they were different places. The Southern Song capital was in downtown Hangzhou, not Lin&amp;rsquo;an. After the Song Dynasty fell, Lin&amp;rsquo;an Prefecture went back to being Hangzhou, while Lin&amp;rsquo;an County stayed Lin&amp;rsquo;an. Later, Lin&amp;rsquo;an became a district of Hangzhou.&lt;/p&gt;
&lt;p&gt;Because the question itself is misleading, non-reasoning models mostly go with the incorrect assumption, talking about commemorating history or the glory of the Southern Song.&lt;/p&gt;
&lt;p&gt;Reasoning models do well here, mostly getting it right. They spot the chronological order and point out the flawed &amp;ldquo;given to&amp;rdquo; phrasing.&lt;/p&gt;
&lt;h3 id="what-was-the-highest-throughput-of-shanghais-port-during-the-colonial-period-how-did-it-compare-to-the-largest-ports-at-the-time"&gt;What was the highest throughput of Shanghai&amp;rsquo;s port during the colonial period? How did it compare to the largest ports at the time?
&lt;/h3&gt;&lt;p&gt;I was just curious. I still don&amp;rsquo;t know, but I found that most AI search tools can&amp;rsquo;t answer this.&lt;/p&gt;
&lt;p&gt;A somewhat reliable source is the Shanghai Port Chronicle on Baidu Baike, mentioning 14 million tons before the Second Sino-Japanese War, ranking 7th globally.&lt;/p&gt;
&lt;p&gt;Data for other ports is either unavailable or made up by the AI. Some less intelligent AIs with big search volumes found &lt;em&gt;some&lt;/em&gt; useful data (at least with references).&lt;/p&gt;
&lt;p&gt;These are all real problems. I had tons of questions. I was a &amp;ldquo;walking encyclopedia&amp;rdquo; as a kid, and many quick searches turned up nothing. This made me doubt AI search.&lt;/p&gt;
&lt;h2 id="not-all-problems-are-created-equal"&gt;Not All Problems Are Created Equal
&lt;/h2&gt;&lt;p&gt;AI search is a mixed bag. Some do well on certain questions, others don&amp;rsquo;t. I started looking for patterns: How can I tell which AI is good at what? And how should I pick an AI search product?&lt;/p&gt;
&lt;p&gt;First, reasoning models are generally better, but not all are smart enough. Gemini 2.0 Flash and Kimi K1.5 aren&amp;rsquo;t great. In my tests, Gemini 2.0 Flash couldn&amp;rsquo;t answer these, but R1 could.&lt;/p&gt;
&lt;p&gt;Search method matters, too.&lt;/p&gt;
&lt;p&gt;Interestingly, Grok 3 has strong reasoning, even without &amp;ldquo;Think,&amp;rdquo; but can&amp;rsquo;t answer the &amp;ldquo;Guyanyu&amp;rdquo; question. Looking at its searches, I get why. It might be forcing a translation. With a weird Chinese name like &amp;ldquo;Guyanyu,&amp;rdquo; it mistranslates, doesn&amp;rsquo;t search for the shad or flatfish, and probably searches for things like &amp;ldquo;ancient&amp;rdquo; and &amp;ldquo;eye&amp;rdquo; separately. It finds nothing useful and makes stuff up.&lt;/p&gt;
&lt;p&gt;Search volume is also key.&lt;/p&gt;
&lt;p&gt;which country does Windsurf IDE come from?&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s from the US. I thought, &amp;ldquo;easy.&amp;rdquo; Foreign AI search did great, even finding Mountain View, California. I tested domestic ones. Kimi and Yuewen can search English, so I asked in English. Finding the US was easy, but not the city.&lt;/p&gt;
&lt;p&gt;But it&amp;rsquo;s &lt;em&gt;not&lt;/em&gt; that simple. Which article on Windsurf IDE would mention the city? At most, they&amp;rsquo;d say the country. To get the full answer, the AI needs to find Codeium (the company behind it), then find the city from Codeium&amp;rsquo;s site, job postings, or Product Hunt. That takes reasoning and multi-step searching!&lt;/p&gt;
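&lt;p&gt;The two-hop lookup described above can be sketched roughly as follows. This is a toy illustration, not how any of the tested products work: the &lt;code&gt;PAGES&lt;/code&gt; table and the &lt;code&gt;search&lt;/code&gt; and &lt;code&gt;multi_step&lt;/code&gt; helpers are hypothetical stand-ins for a real search engine, with each entry standing for one page that states one fact.&lt;/p&gt;

```python
# Toy stand-in for a search engine: each "page" answers exactly one hop.
# The facts mirror the article's example; the table itself is hypothetical.
PAGES = {
    ("Windsurf IDE", "country"): "United States",
    ("Windsurf IDE", "company"): "Codeium",
    ("Codeium", "city"): "Mountain View, California",
}


def search(entity, attribute):
    """Single-step lookup; returns None when no page states the fact."""
    return PAGES.get((entity, attribute))


def multi_step(entity, attributes):
    """Chain lookups, feeding each answer into the next query."""
    current = entity
    for attribute in attributes:
        current = search(current, attribute)
        if current is None:
            return None
    return current


# One search finds the country but not the city.
print(search("Windsurf IDE", "country"))
print(search("Windsurf IDE", "city"))
# Going through the company first reaches the city.
print(multi_step("Windsurf IDE", ["company", "city"]))
```

&lt;p&gt;The single-step query for the city comes back empty because no page about the product states it; only the chained query, which first resolves the company, gets there.&lt;/p&gt;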
&lt;p&gt;This made me realize: questions we find easy can be tough for AI. It&amp;rsquo;s not that AI is dumb; we underestimate the complexity.&lt;/p&gt;
&lt;p&gt;Even with a search engine, finding Windsurf IDE&amp;rsquo;s country is easy, but the city isn&amp;rsquo;t a one-search deal.&lt;/p&gt;
&lt;p&gt;So, I came up with a rough way to evaluate AI search: four quadrants based on AI ability and search ability:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://cdn.victor42.work/posts/2025-02/Snipaste_2025-02-26_12-52-46.png"
loading="lazy"
alt="A quadrant chart classifying search problems by inference requirement and information volume"
&gt;&lt;/p&gt;
&lt;p&gt;I underestimated the &amp;ldquo;Guyanyu,&amp;rdquo; Liu Chuanzhi/bike-sharing, and Lin&amp;rsquo;an questions. I thought they were type D, but they&amp;rsquo;re type B. The Shanghai port question is a trickier type A.&lt;/p&gt;
&lt;p&gt;Mistaking type A for C, and B for D, leads to disappointment.&lt;/p&gt;
&lt;p&gt;The biggest problem? We don&amp;rsquo;t know the category when we ask, and we often underestimate the difficulty.&lt;/p&gt;
&lt;p&gt;But AI search is a tool, and tools should serve us, right? It&amp;rsquo;s not doing a great job yet, and that&amp;rsquo;s not our fault; it&amp;rsquo;s on them to improve.&lt;/p&gt;
&lt;p&gt;To reliably answer type B, agents like Grok 3 Deep Search and OpenAI Deep Research are crucial. These questions call for multi-step searches, deep dives into relationships, source reliability checks, and evaluation of conflicting info.&lt;/p&gt;
&lt;h2 id="making-the-most-of-ai-search"&gt;Making the Most of AI Search
&lt;/h2&gt;&lt;p&gt;Deep search for everything is too slow.&lt;/p&gt;
&lt;p&gt;As someone in the AI community said: Since we can&amp;rsquo;t make AI accommodate humans yet, let humans accommodate AI.&lt;/p&gt;
&lt;h3 id="use-multiple-products-simultaneously"&gt;Use Multiple Products Simultaneously
&lt;/h3&gt;&lt;p&gt;To save time and get decent answers, ditch the &amp;ldquo;one-tool-fits-all&amp;rdquo; idea. Think a bit about which quadrant a question likely falls into. Each has reliable AI search products; choose accordingly.&lt;/p&gt;
&lt;p&gt;It takes more thought, but saves time. Your call.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s go backwards. Type D is easiest; any AI search tool works.&lt;/p&gt;
&lt;p&gt;Type C needs a lot of searching, but no reasoning. If the webpage exists, the answer is there. Example:&lt;/p&gt;
&lt;p&gt;which country does Windsurf IDE come from?&lt;/p&gt;
&lt;p&gt;Kimi does well on these. Products that pull in around 50 search results per query are also good. Treat long-tail knowledge as this type.&lt;/p&gt;
&lt;p&gt;Type B has two scenarios:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The answer&amp;rsquo;s there, but with lots of conflicting noise.&lt;/li&gt;
&lt;li&gt;The answer&amp;rsquo;s not in the core search results, but is abundant in incidentally searched terms. My earlier questions are examples.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These need strong reasoning models, like R1, Grok 3 Think, or o3-mini. Search capability isn&amp;rsquo;t as crucial; a dozen or two dozen sources are enough. Type B is easily mistaken for D, so if a question you took for type D gets bad answers, suspect it is really type B.&lt;/p&gt;
&lt;p&gt;Finally, type A. I&amp;rsquo;m not sure any current AI search can handle these reliably. Info is scarce. You&amp;rsquo;ll probably have to sift through search engines manually. If you want to try AI, use deep search/research.&lt;/p&gt;
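&lt;p&gt;The walkthrough above can be condensed into a small lookup. This is only a sketch under my reading of the quadrant chart: I assume the two axes are &amp;ldquo;does the question need reasoning&amp;rdquo; and &amp;ldquo;is the information scarce,&amp;rdquo; and the function names and recommendation strings are hypothetical summaries of the advice in this section.&lt;/p&gt;

```python
# Recommendations paraphrase this section; the wording is a summary, not a spec.
RECOMMENDATION = {
    "A": "deep search/research agent, plus manual searching",
    "B": "strong reasoning model with a dozen or two sources",
    "C": "high-volume search product",
    "D": "any AI search tool",
}


def quadrant(needs_reasoning, info_is_scarce):
    """Map two yes/no judgments to a quadrant letter (assumed axes)."""
    if needs_reasoning:
        return "A" if info_is_scarce else "B"
    return "C" if info_is_scarce else "D"


def recommend(needs_reasoning, info_is_scarce):
    """Return the kind of tool this section suggests for the quadrant."""
    return RECOMMENDATION[quadrant(needs_reasoning, info_is_scarce)]


# A question with plenty of noisy sources that still needs reasoning.
print(quadrant(True, False))
print(recommend(True, False))
```

&lt;p&gt;The hard part stays with the person asking: the two judgments have to be guessed before the answer is known, which is exactly where type B gets mistaken for type D.&lt;/p&gt;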
&lt;h3 id="give-up-on-one-shot-answers"&gt;Give Up on One-Shot Answers
&lt;/h3&gt;&lt;p&gt;The goal is to solve problems. Don&amp;rsquo;t expect a perfect answer in one go. Let that go, and you&amp;rsquo;ll find more options.&lt;/p&gt;
&lt;p&gt;Back to:&lt;/p&gt;
&lt;p&gt;which country does Windsurf IDE come from?&lt;/p&gt;
&lt;p&gt;If the first question doesn&amp;rsquo;t give the city, ask:&lt;/p&gt;
&lt;p&gt;which city?&lt;/p&gt;
&lt;p&gt;For reasoning models, the odds of success go way up. Use multi-turn dialogue; you&amp;rsquo;d do the same with a search engine.&lt;/p&gt;
&lt;p&gt;For tricky type A questions, like I said, accommodate the AI.&lt;/p&gt;
&lt;p&gt;Ask in different ways, skim the sources, and judge usefulness by titles. Put the useful ones in a knowledge base, and have AI answer from it with retrieval-augmented generation (RAG). Tools include NotebookLM, Tencent&amp;rsquo;s iMa, Perplexity, and AI clients like Cherry Studio.&lt;/p&gt;
&lt;h3 id="pay-attention-to-language-differences"&gt;Pay Attention to Language Differences
&lt;/h3&gt;&lt;p&gt;Language matters. An AI limited to Chinese can&amp;rsquo;t answer English-world nuances; foreign AI can&amp;rsquo;t answer questions about your local school&amp;rsquo;s enrollment plan.&lt;/p&gt;
&lt;p&gt;A test:&lt;/p&gt;
&lt;p&gt;wildfire trends in CA in the last 10 years&lt;/p&gt;
&lt;p&gt;Ask about something abroad in English. If most of the results are Chinese webpages, the product can&amp;rsquo;t search English well and is only good for Chinese topics.&lt;/p&gt;
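&lt;p&gt;If you copy the cited source titles out of an answer, this check can be roughly automated. The sketch below is an assumption-laden heuristic: it treats any title containing a character from the basic CJK ideograph block as a Chinese page, the 0.5 cutoff is my own choice for &amp;ldquo;most,&amp;rdquo; and the sample titles are made up for illustration.&lt;/p&gt;

```python
import re

# Basic CJK Unified Ideographs block; a rough heuristic, not full coverage.
CJK = re.compile(r"[\u4e00-\u9fff]")


def chinese_share(titles):
    """Fraction of result titles containing at least one CJK character."""
    if not titles:
        return 0.0
    hits = sum(1 for title in titles if CJK.search(title))
    return hits / len(titles)


def searches_english_well(titles, threshold=0.5):
    """True unless Chinese pages make up more than the assumed cutoff."""
    return threshold >= chinese_share(titles)


# Hypothetical source titles copied from one answer.
sample = [
    "California wildfire statistics",
    "加州山火趋势",
    "Wildfire trends by year",
    "State fire incident archive",
]
print(chinese_share(sample))
print(searches_english_well(sample))
```

&lt;p&gt;Titles are a crude proxy for page language, so a borderline score is a prompt to look at the sources yourself, not a verdict.&lt;/p&gt;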
&lt;p&gt;Most domestic products now offer R1, so reasoning is covered. Choosing a Chinese-world AI search is easy: find one with a large search volume.&lt;/p&gt;
&lt;p&gt;If you need English and foreign info, foreign products are best. If that&amp;rsquo;s inconvenient, test domestic products with English questions.&lt;/p&gt;
&lt;p&gt;Finally, models and products mentioned are time-sensitive (February 2025). Conclusions might change, but the factors for understanding and evaluating AI search remain useful.&lt;/p&gt;</description></item></channel></rss>