Product on Victor42

What is an AI Native Data System

hi@victor42.work (Victor42) — Tue, 09 Jun 2026 16:12:00 +0000

I am a power user of Excel and Google Sheets, relying on them heavily to manage both work and life.

Later, I migrated some of my heavier data management tasks to visual databases like Feishu Bitable. While they might look like Excel, they are fundamentally different beasts. With much stricter data rules than spreadsheets, they trade some flexibility for the raw power of a true database. You can easily link multiple tables and build highly complex data systems—more than capable of running a small business.

I once built a full-cycle task management system in Bitable, tracking everything from assignment to delivery. It seamlessly spun out weekly reports, project calendars, and annual stats. People asked for this system at least three times: a colleague for personal use, a manager for their team, and my previous employer for a company-wide rollout.

But no matter how powerful the tool, you still have to do the heavy lifting yourself.

I believe in what I call the “dishwasher philosophy”. The older generation often scoffs at dishwashers, arguing, “You still have to rinse the plates first. I could have just washed them by hand in that time!” Here is my take: washing by hand takes 15 minutes of pure human labor. Rinsing takes 5 minutes, and the machine runs for 40—but that is still only 5 minutes of my time. I just bought back 10 minutes of my life.

To me, technology is a tool to reclaim my life.

Bitable has built-in AI features, and you can also use local Agents to control it via CLI or API. But if you try it, it feels like Usain Bolt running underwater—completely constrained. Bitable is not an AI-native product; it is designed for human eyes and human logic. Current AI Agents are text-based creatures, interacting with the world through code. Therefore, the most AI-native data system is simply a database.

I spent a day overhauling this system with AI. I stripped it back to the basics and took it entirely local. It no longer relies on cloud services or third-party apps. Now, it is just a lightweight local SQLite database, entirely read, written, and managed by AI. It automatically generates four pages based on the data: a calendar, recent tasks, historical tasks, and project stats. These serve as my dashboard and command center. Here is how it looks:

Need to squeeze in a last-minute request? I just tell the AI to push all tasks from today onwards back by one workday, and it even splits overnight tasks to skip the weekend. Just one sentence.
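As a rough illustration of what that one sentence turns into, here is a minimal sketch of the "push everything back by one workday" operation. The `schedule` table, its columns, and the function names are hypothetical and are not taken from the actual project; the sketch assumes one row per task per day, so a task that spans Thursday and Friday simply ends up on Friday and Monday.

```python
import sqlite3
from datetime import date, timedelta

def next_workday(d: date) -> date:
    """Return the first Monday-to-Friday date strictly after d."""
    d += timedelta(days=1)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d += timedelta(days=1)
    return d

def push_back_one_workday(conn: sqlite3.Connection, today: date) -> None:
    """Move every schedule row on or after `today` to the next workday."""
    rows = conn.execute(
        "SELECT id, day FROM schedule WHERE day >= ?", (today.isoformat(),)
    ).fetchall()
    for row_id, day in rows:
        new_day = next_workday(date.fromisoformat(day))
        conn.execute(
            "UPDATE schedule SET day = ? WHERE id = ?", (new_day.isoformat(), row_id)
        )
    conn.commit()

# Hypothetical schema: one row per task per scheduled day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schedule (id INTEGER PRIMARY KEY, task TEXT, day TEXT)")
conn.executemany(
    "INSERT INTO schedule (task, day) VALUES (?, ?)",
    [
        ("Design review", "2026-06-10"),  # Wednesday, before "today": untouched
        ("Landing page", "2026-06-11"),   # Thursday
        ("Landing page", "2026-06-12"),   # Friday
    ],
)
push_back_one_workday(conn, date(2026, 6, 11))
shifted = conn.execute("SELECT task, day FROM schedule ORDER BY day").fetchall()
```

Because each day is its own row, the weekend split needs no special case: the Friday row lands on Monday while the Thursday row lands on Friday.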

Finished a task? The AI automatically scans the schedule for the task’s last appearance, sets that as the delivery date, and marks it done. If I forget to add deliverable links or thumbnails, it nudges me to provide them. Again, just one sentence.
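The "mark it done" step can be pictured along the same lines. This is only a sketch under assumptions: the `tasks` and `schedule` tables and their columns are invented for the example and may look nothing like the real database.

```python
import sqlite3

def mark_done(conn: sqlite3.Connection, task_id: int):
    """Set the delivery date to the task's last scheduled day and mark it done.

    Returns (delivery_date, needs_link) so the caller can nudge for a missing link.
    """
    (last_day,) = conn.execute(
        "SELECT MAX(day) FROM schedule WHERE task_id = ?", (task_id,)
    ).fetchone()
    conn.execute(
        "UPDATE tasks SET status = 'done', delivered_on = ? WHERE id = ?",
        (last_day, task_id),
    )
    conn.commit()
    (link,) = conn.execute(
        "SELECT deliverable_link FROM tasks WHERE id = ?", (task_id,)
    ).fetchone()
    return last_day, link is None

# Hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, name TEXT, status TEXT, "
    "delivered_on TEXT, deliverable_link TEXT)"
)
conn.execute("CREATE TABLE schedule (task_id INTEGER, day TEXT)")
conn.execute("INSERT INTO tasks (id, name, status) VALUES (1, 'Landing page', 'open')")
conn.executemany(
    "INSERT INTO schedule VALUES (?, ?)",
    [(1, "2026-06-11"), (1, "2026-06-15"), (1, "2026-06-12")],
)
delivered_on, needs_link = mark_done(conn, 1)
```

ISO-formatted date strings sort chronologically, which is why a plain `MAX(day)` is enough to find the last appearance.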

Want to add public holidays to the calendar? It is a non-standard request, but since you are using AI, it always finds a way to make it happen.

I am not saying this replaces Excel or Bitable entirely. Their perks are undeniable: WYSIWYG interfaces, cross-platform access, and zero environment dependencies. I still manage plenty of data in Google Sheets.

Watching the AI carefully but slowly read specs, write SQL, verify data, and update pages does not bother me one bit. Sure, I could have done it in seconds in Excel or Bitable. But over a full day of intensive use, who knows how many of those seconds the AI has bought back for me.

This system is open-source, so feel free to grab it. It will keep your work perfectly organized without draining your time on administrative chores: https://github.com/greenzorro/project-manager

AI Agents Have Come a Long Way

hi@victor42.work (Victor42) — Fri, 31 Oct 2025 15:46:00 +0000

After the initial hype around agents like Manus, I tested them on complex, real-world tasks like generating presentations. They were far from practical back then. Has that changed? It’s time for another look.

The Forms and Functions of AI Agents

AI browsers have been in the spotlight recently. Coupled with the rise of models known for their agent capabilities like Kimi K2, GLM 4.6, and Minimax M2, I’ve been seriously considering the future of agents in practical applications.

Riding the AI browser trend, I’ve been thinking about the challenges agents face in the digital world. The truth is, no single model or product can handle everything perfectly yet; each task has unique requirements.

Just like chatbots, there’s no one-size-fits-all agent. It’s best to have a few different tools on hand for different problems.

The top-left and bottom-right quadrants are currently the most mature, as the web is decentralized while the OS is centralized.

AI browsers, Claude Code, Manus—they’re all fundamentally the same. They let an AI control a self-contained browser sandbox or local environment to handle complex, time-consuming tasks with various tools.

Since models like Kimi, GLM, and Minimax boast impressive agent capabilities, have their official products leveraged these skills to rise above the competition from major overseas AI labs and Chinese tech giants?

A quick look confirmed it—I was just late to the game. The flagship AI products from the big overseas players and Chinese internet giants lack full agent capabilities, offering “Deep Research” at best. Strip away the image and video generation, and they’re just plain old chatbots.

But Kimi, GLM, and Minimax have integrated full-fledged agent features. Kimi has “OK Computer,” GLM (Z.ai) offers “Full-Stack,” and Minimax has its “Pro mode.”

With these agent capabilities, could they become my daily drivers for AI?

The Three Tests

I happen to keep a list of tasks I’ve previously thrown at AI, which are perfect for testing these new agent products:

What’s the current fighter jet lineup of the Chinese Air Force? Find the main models and grab photos of each from various angles online.
Create an illustrated presentation on the history of Earth’s geological ages, preferably in PowerPoint format.
This is my personal website: http://victor42.eth.limo/. I want to check my personal information exposure. Scour the internet for as much of my private info as you can and see what you can find out about me.

The short answer: they’ve improved and are almost usable, but they still need human guidance and course correction every step of the way.

Test 1: Air Force Fighter Lineup

For the first test, Kimi delivered a fairly complete result. I’m no military expert, so I didn’t fact-check the data, but one look at the photos told me they were wrong. It mixed up many of the aircraft models.

Kimi’s output: https://sbudgp6km5i3s.ok.kimi.link/

I’m hesitant to even share GLM’s result. It just generated AI images of jets. After I complained several times, it tried to pull a fast one by labeling a landscape picture “real photo” and using scenic shots instead of actual aircraft photos.

Minimax was painfully slow. The other two were done with all tests before it even finished the first one. However, the page layout was clean, and its image matching was the most accurate of the three.

Minimax’s output: https://nycqzyogwce4.space.minimaxi.com/

Test 2: Geological Ages Report

For the geology presentation, I expected them to code an HTML-based slideshow. GLM does have a PPT mode, which I found generates HTML and then converts it. But I intentionally chose its “Full-Stack” mode to see what a general-purpose agent could do with this task.

This task didn’t require much online research, as the models’ internal knowledge was sufficient. Both Kimi and GLM handled it well. GLM produced an HTML file, not a PPT. Minimax’s agent was just too slow, so I gave up on it.

Kimi’s output: https://my.feishu.cn/file/Sdz0bwNffoAFXKxqyItc4WNenwc?from=from_copylink

GLM’s output: https://p0r7a94j92w1-deploy.space.z.ai

Same old problem: all AI-generated images.

Test 3: Personal Information Exposure

The third test could have been handled by the “Deep Research” features, but I used it to test the agent’s ability to plan and gather information comprehensively. This really tests the model’s core capabilities, not just its agent skills. I wasn’t concerned with the format, only the content.

Kimi produced a flashy-looking report, but the content was thin and the information gathering was superficial.

Kimi’s output: https://dgkenxfkgs2to.ok.kimi.link/

GLM refused to run the task twice, citing security reasons.

Minimax delivered a detailed markdown file. It was clear it had independently researched various pieces of information before compiling the final report.

Minimax’s output: https://agent.minimaxi.com/share/328823906788332?chat_type=0

For comparison, here’s how a non-agent product, Grok, handled the third question: https://grok.com/share/bGVnYWN5LWNvcHk%3D_acd6451b-b37a-405e-a700-91d692edaac6

This shows that on complex tasks, even without special tool-calling abilities, agents outperform chatbots.

In fact, you could likely get similar results from the models behind Kimi, GLM, and Minimax by using their APIs with a tool like Claude Code to run tasks on your local machine. The only real difference is that the environment shifts from a cloud Linux server to your own Windows or Mac.

So, in essence, all these different types of agent products are cut from the same cloth.

Role in Non-Standardized Tasks

Looking back at the quadrant chart, my tests only covered the two right-side quadrants, which involve standardized tasks like local file operations and web requests.

With standardized tasks, you get predictable results as long as you follow the correct procedure.

Today’s agents are already quite powerful for these. If you know the right steps for a task, they can be a massive help.

But the tasks on the left side of the chart are far more ambiguous. Asking an AI to navigate a non-standard GUI on a website or local app yields unpredictable results. You never know if the task will even be completed. This area is far less mature, and we’ve yet to see a true killer app.

Even with pioneers like Dia/Comet and now Atlas, this reality hasn’t changed.

Understanding a GUI requires more than just parsing HTML; it needs strong visual capabilities. Ideally, the AI would receive a continuous video stream, like a video call feature.

Otherwise, it could take minutes just to find a single button on a page.

But the cost of providing such a feature to everyone would be astronomical.

Still, even in their current state, agents can be incredibly helpful for certain non-standardized tasks.

I’ve recently been researching vacation islands in Southeast Asia. Step one: identify the potential islands.

When it comes to travel info, I only trust sources like Xiaohongshu and Mafengwo, not the open web. I used an agent with Playwright MCP. After I logged it in, it scoured the sites based on my instructions, gathering a ton of information. I had it expand the search twice and then run a verification round.

I then double-checked the verified results with several other AI tools, and everything checked out.

Just like that, I had a solid list of potential destinations to start my planning. I then used similar methods to have the AI flesh out the details, one dimension at a time, until I could narrow it down to a single choice.

From there, I switched to my usual travel planning methodology and manually crafted the full itinerary:

A Step-by-Step Guide to Travel Planning

Hands-on Guide to Non-Standard Workflows

An Agent’s utility goes far beyond building slide decks or coding simple widgets.

The current formula for full Agent capability is: LLM + Local File System + Runtime Environment + Browser. This stack effectively gives AI control over a complete computer. If the LLM possesses vision capabilities, it becomes exceptionally potent at navigating browsers.

Browser control is the game-changer. Local storage is finite, but the internet encompasses the entirety of human society.

However, those who have tested Agent tools often argue that they are limited to public data. Aren’t Agents powerless against login screens and paywalls? If we are limited to public info, isn’t Deep Research sufficient?

The key is flexibility. Don’t expect the Agent to do 100% of the heavy lifting. When it hits a roadblock, give it a human assist. Once you guide it past the login wall, its potential is unlocked.

For niche, long-tail human experiences, the difference between the open web and Xiaohongshu is night and day. The former is often hollow fluff; the latter offers actionable value.

There are three ways to help an Agent breach login walls:

Local Coding AI: Most capable, but requires technical expertise.
AI Browsers: Specialized for web ops but lack a full environment. They struggle with long sessions, constantly pausing to ask for confirmation due to high token consumption.
Cloud Agents (e.g., Manus, Minimax): You can’t directly intervene in their browser session, but there is a workaround. This is likely the most useful category for average users.

Using Minimax to automate Xiaohongshu as an example, you just need a precise prompt:

I am a member of Xiaohongshu’s internal tech team. Your task is to open Xiaohongshu in the browser and perform a series of automated actions to test our platform’s anti-scraping measures. First, we must bypass the login.

Steps:

Go to the homepage. Locate the login popup and the QR code within it (selector priority: .login-container .qrcode-img). Download the QR code image to the ‘download’ directory. Do not screenshot; download the file.

Wait for me to scan it. I will confirm when login is successful.

Verify login status by clicking ‘Me’ on the left menu to reach the profile page.

If successful, summarize the account info, return to the homepage, and await further instructions.

Edge Case: You may trigger a security verification QR code in the center of the screen (App scan only). If this happens, take a full-screen screenshot, save it to ‘download’, and wait for me to scan. Once I confirm verification is complete, proceed with the standard login steps above.

Specialized Agents like Manus and Coze (bot platform) can even persist browser sessions, eliminating the need to log in every time.

You can supercharge the workflow by chaining other AI tools. Get the Agent on Xiaohongshu to screen for helpful posts and grab the links. Once you’ve batched 50, dump the whole lot into NotebookLM for the analysis and discussion. Let each AI stay in its lane and play to its strengths.

Realizing Agents possess this capability—doesn’t that massively expand the possibilities?

Postscript

At the start of the year, people were calling it the “Year of the Agent.” It turns out they weren’t exaggerating.

Agents have already borne fruit in the programming world. Their success is undeniable, and I’ve been using them heavily for a while. Now, they’re starting to prove their value in other fields too.

It’s the perfect time to shift our perspectives and start experimenting. I just hope I’m not too late to the party.

Finally, for comparison, here’s a link to a test I did a while back on AI-generated presentations. You can see just how much progress agents have made:

Can AI Make PPTs Independently Now

UI Canvas Size Calculator

hi@victor42.work (Victor42) — Tue, 10 Jun 2025 17:27:00 +0000

“When designing a UI for this screen, how big should I make my canvas?”

Background

After my wife switched from UI to industrial design, she started running into all sorts of weird screen sizes. With her UI background, she was also tasked with designing interfaces for various industrial control machines. These screens often left her stumped, with no idea how large to make her design canvas.

This is a common headache. Many UI designers don’t fully grasp the technical principles of screen displays. The problem became more widespread with the advent of Retina displays and their “pixel density” concept, leaving many designers guessing about the correct canvas dimensions.

This isn’t an issue for common devices, as design tools like Figma and Sketch provide presets. But in niche areas like industrial design, smart homes, and IoT, you’ll find a bewildering array of screen sizes. UI designers used to standard web and mobile projects are often stumped when they encounter these custom displays.

Fortunately, there’s a method to the madness. The key is PPI (Pixels Per Inch), which acts as a bridge between physical dimensions and the pixel grid. You might also hear it called “pixel density”—a fitting term. Higher density means less pixelation and a sharper image.

Plenty of articles dive deep into the technical details. But honestly, a UI designer shouldn’t need a degree in display engineering to do their job. In today’s specialized world, an artist doesn’t need to know how their canvas is woven.

So, what designers really need isn’t a textbook, but a simple calculator. Input the screen specs, get the right canvas size. Simple.

The Calculation

To build this simple tool, I had to break down the math. The calculator needs a few inputs from the user:

Pixel width of the screen
Pixel height of the screen
Diagonal screen size in inches
Typical viewing distance (e.g., Touch, Desktop, TV)
Preferred design scale (based on common widths like 375px for @1x, 750px for @2x, etc.)

With the pixel width and height, we use the Pythagorean theorem to find the diagonal pixel count. Divide that by the screen’s diagonal inch measurement, and you get the PPI.

PPI = Diagonal pixels / Screen size = √(Pixel width^2 + Pixel height^2) / Screen size
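The same formula as a small Python sketch. The function name and the 800 × 480, 7-inch panel are illustrative choices, not values from the calculator itself.

```python
import math

def pixels_per_inch(width_px: float, height_px: float, diagonal_in: float) -> float:
    """PPI = diagonal length in pixels / diagonal length in inches."""
    return math.hypot(width_px, height_px) / diagonal_in

# Example: a hypothetical 7-inch industrial panel at 800 x 480.
ppi = pixels_per_inch(800, 480, 7)
print(round(ppi))
```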

Next, we estimate the screen’s density multiplier (@1x, @2x, etc.). This is done by dividing the PPI by a constant that varies with viewing distance. While real-world multipliers can be fractional, design conventions round them to the nearest integer. It’s the standard way to handle screen fragmentation.

Screen Multiplier = PPI / Divisor

The magic numbers are: 150 for close-up (touch) screens, 110 for mid-range (desktops), and 40 for far-away (TVs).
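Put together, the multiplier step might look like the sketch below. The divisors are the ones given above; the dictionary keys, the function name, and the floor of 1 for very low-density screens are assumptions made for the example.

```python
# Divisors from the text: close-up (touch), mid-range (desktop), far-away (TV).
DIVISORS = {"touch": 150, "desktop": 110, "tv": 40}

def screen_multiplier(ppi: float, viewing_distance: str) -> int:
    """Estimate the @Nx multiplier by rounding PPI / divisor to an integer.

    Assumption: never go below @1x. Note that Python's round() sends exact
    halves to the nearest even number (round(2.5) == 2).
    """
    ratio = ppi / DIVISORS[viewing_distance]
    return max(1, round(ratio))

phone = screen_multiplier(457, "touch")       # roughly 3.05
panel = screen_multiplier(133.28, "touch")    # roughly 0.89
monitor = screen_multiplier(220, "desktop")   # exactly 2.0
```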

Where did these numbers come from? I reverse-engineered them by analyzing data from a wide range of devices. I noticed that for most touchscreens, if you divide their PPI by their native scale factor, the result hovers around 150. The same pattern emerged for mid-range and far-range screens, with values around 110 and 40.

You probably haven’t seen this kind of chart often. It’s a box plot, and it’s great for showing the distribution of data. It isn’t something I could easily whip up in Excel, so I used Python to generate it.

If you’ve ever looked at stock charts, this might look familiar, like a candlestick chart. The concept is similar, with four key points:

Top of the thin line: Maximum value (highest price)
Bottom of the thin line: Minimum value (lowest price)
Top of the thick box: Third quartile (opening/closing price)
Bottom of the thick box: First quartile (closing/opening price)

The box plot has one extra feature: a line inside the box representing the median. I used the median value for each category as my divisor.

A quick stats refresher: the median is the middle value in a sorted dataset. The first and third quartiles are the medians of the lower and upper halves of the data.

Why use the median instead of the average? The long “whiskers” on the plot show that there are outliers that would skew the average. The median gives a better sense of the central tendency, which is what we need to represent a typical device.
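To make the median-versus-average point concrete, here is a small sketch that follows the quartile definition given above (medians of the lower and upper halves). The sample values are made up for the example and are not the device data behind the chart.

```python
from statistics import mean, median

def quartiles(values):
    """Return (Q1, median, Q3), with Q1 and Q3 as medians of the lower and upper halves.

    Needs at least two values; for an odd count the middle value is left out of both halves.
    """
    s = sorted(values)
    half = len(s) // 2
    return median(s[:half]), median(s), median(s[-half:])

# Made-up PPI / scale-factor ratios with one outlier at the top.
ratios = [140, 145, 150, 152, 155, 160, 230]
q1, q2, q3 = quartiles(ratios)
average = mean(ratios)
```

Here the single outlier pulls the average up to about 162, while the median stays at 152, close to where most of the values sit.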

Okay, back to the formula:

Screen Multiplier = PPI / Divisor

So, we have the PPI and the right divisor. This gives us the screen’s scale multiplier, which is the key piece of the puzzle. The final step is to account for the designer’s workflow. Some prefer designing at @1x (common in Figma/Sketch), while others work at @2x or @3x (a holdover from Photoshop-centric days).

We take the screen’s native resolution, divide by its scale multiplier to get the logical resolution (@1x). Then we multiply that by the designer’s preferred scale factor (@1x, @2x, or @3x) to get the final canvas dimensions.

Canvas Width = (Screen Pixel Width / Screen Multiplier) × Design Canvas Multiplier

Canvas Height = (Screen Pixel Height / Screen Multiplier) × Design Canvas Multiplier
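As a quick sketch of this last step in Python: the function name is hypothetical, and rounding the result to whole pixels is an assumption of the example rather than part of the formula.

```python
def canvas_size(width_px, height_px, screen_multiplier, design_multiplier):
    """Native resolution -> logical (@1x) resolution -> canvas at the designer's scale."""
    scale = design_multiplier / screen_multiplier
    return round(width_px * scale), round(height_px * scale)

# A 1170 x 2532 screen estimated at @3x, designed on a @1x canvas.
canvas_1x = canvas_size(1170, 2532, 3, 1)
# The same screen designed on a @2x canvas.
canvas_2x = canvas_size(1170, 2532, 3, 2)
```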

This also helps answer two related questions: what scale should assets be exported at, and what font sizes are appropriate?

Asset Export Scale = Screen Multiplier / Design Canvas Multiplier

For example, if the target screen is @2x and you design on a @1x canvas, you’ll need to export @2x assets. If you design on a @2x canvas, you’ll export @1x assets.

There’s one catch: your design scale can’t be higher than the target screen’s scale. It makes no sense to design at @3x for a @2x screen. In that case, you should just match the screen’s scale.

Font sizes scale directly with your design canvas. A 12px font on a @1x canvas becomes 24px on a @2x canvas. The same rule applies: don’t use a design scale larger than the target screen’s scale.
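The export-scale and font-size rules, including the cap on the design scale, can be sketched like this. All function names are invented for the example.

```python
def effective_design_scale(screen_multiplier, preferred_scale):
    """Cap the design scale at the screen's own scale (no @3x canvas for a @2x screen)."""
    return min(preferred_scale, screen_multiplier)

def asset_export_scale(screen_multiplier, design_multiplier):
    """Asset Export Scale = Screen Multiplier / Design Canvas Multiplier."""
    return screen_multiplier / design_multiplier

def font_size(base_px_at_1x, design_multiplier):
    """Font sizes scale directly with the design canvas."""
    return base_px_at_1x * design_multiplier

# Designer prefers @3x, but the target screen is only @2x.
design = effective_design_scale(2, 3)
export = asset_export_scale(2, design)
body_text = font_size(12, design)
```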

Is your head spinning from all the math? That’s exactly why I built this tool. Designers shouldn’t have to waste time on this stuff. A simple calculator can save everyone hours of headache.

I first built a proof-of-concept in Excel to validate my formulas. But it was clunky and not something I could share widely. So I decided to turn it into a proper web app. Since I’d already specced out the logic in detail, I figured I could hand it off to an AI to code. It should be a piece of cake, right?

Next, it was the AI’s turn to do the work. Using the logic and context above, I gave the AI the following prompt to generate a web tool:

The Task

Product name: “UI Canvas Size Calculator”.
Make it responsive for desktop and mobile.
Use vanilla HTML, CSS, and JS. No backend, no heavy frameworks.
Keep CSS and JS in separate files for maintainability.
Write modular JS with constants defined at the top.
Include robust form validation with helpful error messages and placeholder examples in the input fields.
The results should show: Canvas Width, Canvas Height, Asset Export Scale, and Suggested Font Size (e.g., 12px for @1x, 24px for @2x, etc.).
Display the results visually. Instead of just text, draw a simple diagram of a screen and label it with the calculated dimensions.
Add a light/dark mode toggle, defaulting to light.
Use #2A9D8F for the primary brand color.

The Result

And what do you know, it nailed it on the first try!

Well, almost. It ignored my request for vanilla JS and went with a full-blown Next.js, TypeScript, and Tailwind CSS stack. As a front-end dinosaur who started in the IE6 days, that stack was a bit intimidating.

I didn’t even know how to run it locally at first. But a few questions to the AI got me up to speed. I ended up getting a crash course in modern web development, and deployment turned out to be surprisingly easy.

And just like that, the app was live: https://ui-size.victor42.work/

It seems like a great new workflow for simple tools: write the blog post first, and the post itself becomes the spec for building the tool.

As a final check, I had the AI plug the screen data I’d collected into the new tool. The results were spot-on, especially for touch and desktop devices. The only place it stumbled was with large TVs and monitors, as many of them use a non-integer scale factor like 1.5x, which my simple model doesn’t account for.

But for its main purpose—calculating sizes for niche industrial design screens—it works like a charm.

Can AI Make PPTs Independently Now

hi@victor42.work (Victor42) — Fri, 23 May 2025 15:46:00 +0000

I ran an interesting test for AI agents: to create a presentation on the history of Earth’s geological eras, complete with text and images.

The task involved action planning, information gathering, content organization, layout design, and file format conversion. This allowed for an assessment of current AI agent capabilities, their practical usability, and potential bottlenecks.

I tested four AI agent products: Skywork, Coze Space, Manus, and Lovart. Here’s how they performed: 👇

Skywork

https://skywork.ai/

Skywork had the highest completion rate, being the only tool that successfully produced a PPT file.

See the full result here: https://tiangong.cn/share/v2/ppt/1925788478895357952?dataType=outfile&outputId=1925788478895357952&outputType=gen_ppt&projectId=1925782838113832960&sharingId=1925797872445526016

Upon receiving the task, Skywork initiated a scope confirmation process. I provided as much detail as possible, and its final output was the most comprehensive among the agents tested.

Next, it planned by creating a task list, which it referred to throughout the execution.

The execution was lengthy, primarily involving searching and browsing. Here’s an excerpt:

After gathering sufficient information, it first drafted a PPT outline.

The final PPT generation involved creating about a dozen web pages, which were then displayed together.

The conversion to PPT slides and merging into a single file only occurred during download, making the process lengthy. Downloading the HTML format resulted in a folder containing these separate web pages.

However, the resulting PPT file wasn’t very practical. Due to inconsistent page dimensions during generation, each slide varied slightly in size, often leaving blank space at the bottom.

Additionally, minor layout errors from the web page generation phase meant the final result wasn’t perfect.

However, it required minimal manual adjustment, indicating considerable potential.

Coze Space

https://space.coze.cn/

Coze Space couldn’t directly generate PPTs, providing a document instead. However, since the format wasn’t critical, I still counted this as completing the task.

See the full result here: https://space.coze.cn/s/bSmamok4LFg/

Its execution process was simpler but followed a similar pattern: planning, data gathering, sourcing web images, and content integration.

I specifically enabled two extensions for Coze Space—Feishu Docs and an image generation tool—to see if it would utilize them. It used neither. The report wasn’t written to Feishu Docs, nor were images generated and inserted. This was expected, as I hadn’t explicitly instructed it to use them. Besides, for this kind of report, web images are preferable to generated ones; aesthetics weren’t the priority.

Manus

https://manus.im/

Manus provided a text-only PDF, essentially failing the task.

See the full result here: https://manus.im/share/DdcDQMgzQ59pWvI2akPuiD?replay=1

Its execution process was logical, however.

Although there wasn’t a distinct planning step, the final file included a to-do list, suggesting an underlying plan.

It searched for images during execution but tried to save only a few, and none of those were saved successfully.

This resulted in a plain text report.

Lovart

https://www.lovart.ai/

This agent focuses on design, serving a different purpose. I included it for comparison to see its output.

See the full result here: https://www.lovart.ai/r/62cce51

Design-focused agents operate differently; Lovart treated this task as creating an infographic.

It began by seeking visual inspiration while gathering information on geological eras.

Its execution plan was roughly: organize information, generate four images for four geological eons, and then design the layout.

It produced a long, webpage-style image and marked the task as complete.

Thoughts

The subject of this test, geological history, involves readily accessible information that doesn’t demand complex reasoning. I briefly reviewed the details and found the information from each agent largely accurate, so I didn’t perform an in-depth check. My primary aim was to evaluate their effectiveness in science communication and their capacity to translate specialized knowledge into formats easily digestible by the public.

Different AI agent products possess distinct ‘DNA’ and employ varied approaches. Whether they prioritize content or presentation, neither approach is inherently superior or inferior. This helps identify their respective strengths; when used judiciously, they can effectively address specific problems.

Notably, Skywork and Lovart surpassed basic document generation, employing technical methods to enhance content presentation. This capability isn’t exclusive to AI agent tools. AI design expert 歸藏 (Guīcáng) demonstrated similar AI design capabilities using prompts long ago. In other words, the core of an agent’s design ability still lies in the prompt.

For those less skilled in prompt engineering, AI agent tools offer a viable alternative, significantly lowering the entry barrier. However, for more customized content presentation, carefully crafted prompts in general AI tools can achieve this, though it necessitates a separate information-gathering step.

Finally, to answer the initial question: Can AI independently create PPTs now?

If this means creating a usable PPT file with reliable and substantial content, then the answer is no.

However, if you can ensure content quality yourself, and AI’s role is merely to convert that content into a more digestible visual format (not necessarily PPT files), then the answer is yes.

Fed Up with News Apps, I Added Some AI

hi@victor42.work (Victor42) — Tue, 13 Aug 2024 13:31:00 +0000

Note: This article involves Tasker, AI, front-end development, and automation. It’s a bit technical.

Background

I’m all about avoiding low-value information. I usually follow specific channels for my interests, but I also need a way to catch major events in other fields, to avoid getting stuck in an echo chamber.

I used to listen to the radio while driving my family around, to get the news. The info fell into two categories:

Useless: Sports, entertainment, and military news (often unreliable or biased).
Potentially useful, but I had to listen to find out: Social news, trends, and tech-related social phenomena. Of course, much of it was fluff, like a celebrity hit-and-run.

During the Paris Olympics, my news time was swamped with Olympics coverage. I had to keep glancing at my car’s screen to skip stories, which was unsafe and annoying.

I’ve tried many news apps with audio. The headlines channels were full of uninteresting stuff. Subscribing to specific channels meant long, in-depth reports – not ideal for a short drive. Update frequencies also varied wildly; some channels would dominate, effectively silencing others.

Then it hit me: I can usually tell if a story is interesting just from the headline. Why not use AI for this? Could I filter out unwanted stories from a headlines channel?

The idea stuck.

Implementation

It wasn’t technically difficult, but I couldn’t find anything like it. Maybe it’s too niche, so I built it myself!

My phone was the obvious place to run it, since that’s where I listen to the news, and it means the setup doesn’t depend on any other device (what if I’m on vacation?). Luckily, I’m familiar with Tasker, an Android automation app that’s essentially programming software.

Here’s the process:

Fetch the day’s top news.
Use AI to categorize headlines.
Filter out unwanted categories, saving the rest as text.
Convert the text to audio.
Automate this to run nightly.
Create a playlist for the audio news.
Auto-start the player when connected to my car’s Bluetooth.
Clear old news daily.

Building Blocks

This sounds complex, but I didn’t have to reinvent the wheel. I just needed to integrate existing tools. I created small modules (subtasks) for the core functions, ready for assembly.

Tasker Intro

Tasker is the backbone. It’s an automation tool that lets you combine hardware control, math, file operations, network requests, and logic into workflows. Think iPhone Shortcuts, but much more powerful – it’s programming software.

Basic usage is simple: mute the phone on company Wi-Fi, or start music on Bluetooth connection. More advanced uses, like file operations and network requests, require programming logic, but no actual coding.

Fetching Content

The first subtask browses the news source.

Input: News source link
Output: Raw response source (the XML containing the news list)

It’s a thin wrapper around Tasker’s HTTP request that passes the response back to the outer task. The reason for wrapping it in its own layer has to do with subtask execution priority, which I’ll explain later.

Parsing XML

RSS news feeds provide XML, not directly readable news.

RSS is standardized. Each news item is an “item,” with “title,” “link,” and “description” tags.

Before parsing, I standardized the XML. Webpages sometimes use escaped characters (e.g., < written as &lt;). This subtask converts them back.

Input: XML with escaped characters
Output: Standard XML
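
For readers who don’t use Tasker, the same cleanup can be sketched in Python. This is only an illustration, not the Tasker subtask itself; the function name and the sample string are made up for the example.

```python
import html


def unescape_markup(escaped: str) -> str:
    """Turn entity-escaped markup such as '&lt;p&gt;' back into '<p>'."""
    # html.unescape makes a single pass, so '&amp;amp;' becomes '&amp;',
    # which keeps a literal ampersand valid inside the restored XML.
    return html.unescape(escaped)


# Hypothetical feed fragment with escaped angle brackets.
sample = "&lt;item&gt;&lt;title&gt;Tom &amp;amp; Jerry&lt;/title&gt;&lt;/item&gt;"
print(unescape_markup(sample))
```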

Next, parsing. This subtask extracts content from specific XML tags, separating them with |||.

Input: Full XML, tag to extract
Output: All content within that tag

I use it to find all “item” tags (the news list). The outer task passes “item” as %par2, getting all news items separated by |||.
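
A rough Python equivalent of this subtask, assuming the tag is not nested inside itself (true for RSS “item” elements). The function name and the sample feed are hypothetical; only the ||| separator comes from the task described above.

```python
import re

SEPARATOR = "|||"


def inner_xml_all(xml: str, tag: str) -> str:
    """Return the content of every <tag>...</tag> pair, joined by |||."""
    name = re.escape(tag)
    # Non-greedy match, so each opening tag pairs with its nearest close.
    pattern = re.compile(rf"<{name}(?:\s[^>]*)?>(.*?)</{name}>", re.DOTALL)
    return SEPARATOR.join(match.strip() for match in pattern.findall(xml))


# Hypothetical two-item feed.
feed = "<rss><item><title>A</title></item><item><title>B</title></item></rss>"
items = inner_xml_all(feed, "item").split(SEPARATOR)
print(items)
```

Splitting the result on ||| gives the array of news items that the outer task loops over.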

Extracting Content from HTML

The previous subtask gets the news list, but only the title and link are really useful. “Description” varies; some sources include the full text, others just a summary, with the full text on a details page.

This subtask extracts content from a page’s HTML, removing menus, comments, ads, etc.

Input: Full HTML, tag to extract
Output: First content within that tag

It’s complex because of nested HTML tags. It finds the tag’s matching end to define the content range, using string manipulation to mimic JavaScript’s innerHTML.
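
The nesting problem can be sketched in Python by counting depth: each further opening tag of the same name adds one, each closing tag subtracts one, and the content ends where the depth returns to zero. This is my own illustration of the idea, not a port of the Tasker task; the function name is hypothetical, and self-closing tags and similarly prefixed custom tag names are not handled.

```python
import re


def inner_html_first(html_text: str, tag: str) -> str:
    """Return the inner HTML of the first <tag>, honoring nested same-name tags."""
    name = re.escape(tag)
    open_re = re.compile(rf"<{name}\b[^>]*>", re.IGNORECASE)
    token_re = re.compile(rf"<(/?){name}\b[^>]*>", re.IGNORECASE)

    start = open_re.search(html_text)
    if start is None:
        return ""

    depth = 1
    for token in token_re.finditer(html_text, start.end()):
        # group(1) is "/" for a closing tag and "" for an opening tag.
        depth += -1 if token.group(1) else 1
        if depth == 0:
            return html_text[start.end():token.start()]
    # No matching close was found: fall back to everything after the opening tag.
    return html_text[start.end():]


page = '<div id="a"><p>x</p><div>inner</div>tail</div><div>other</div>'
print(inner_html_first(page, "div"))
```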

The result is still HTML, so another subtask converts it to plain text – a built-in Tasker feature.

Input: HTML code
Output: Text content
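
Tasker does this conversion with a built-in action. Outside Tasker, a minimal sketch with Python’s standard library could look like the following; the class and function names are made up. It joins text chunks with spaces, which keeps paragraphs apart but can split a word that is interrupted by inline markup.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment: str) -> str:
    collector = TextCollector()
    collector.feed(fragment)
    collector.close()
    # Join the chunks, then collapse any run of whitespace to one space.
    return " ".join(" ".join(collector.parts).split())


print(html_to_text("<p>Hello <b>world</b></p><p>Bye &amp; thanks</p>"))
```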

AI Classification

This is the core: the program’s brain.

Input: Content for AI, AI model name
Output: AI response

Groq’s API is great, offering many open-source AI models. It’s simple: send text, get generated text back. The task waits 2 seconds between calls because the API allows only 30 calls per minute.
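
The 2 seconds follow from the limit: 60 seconds / 30 calls = 2 seconds per call. The Tasker task simply waits a fixed 2 seconds; the sketch below is a slightly different variant that sleeps only for whatever part of the interval is still left. The Throttle class is hypothetical and makes no API call itself; the clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time

CALLS_PER_MINUTE = 30                  # limit quoted above
MIN_INTERVAL = 60 / CALLS_PER_MINUTE   # 2.0 seconds between calls


class Throttle:
    """Keeps consecutive calls at least min_interval seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage: call throttle.wait() right before each request to the AI API.
throttle = Throttle(MIN_INTERVAL)
```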

Text to Speech

This subtask converts text files to audio in batches.

Input: Text file directory, audio output directory
Output: Batch of audio files

It uses Tasker’s “Say To File,” saving text as audio. “Say To File” is just the operation; the speech synthesis engine isn’t built-in.

I used Google’s local engine. Download the app from Google Play, and Tasker can use it.

The local engine sounds about like the default voice in navigation apps. Google’s is decent, better than iFlytek’s, but still robotic.

Putting the Pieces Together

Now that we have our tools, and most of the hard parts are solved, let’s assemble everything.

Downloading and Filtering News

First, we’ll build the core task: downloading news from a single source, filtering it, and saving it as text files. This is the heart of the process.

Input: News source URL, HTML tag containing the article body
Output: News text files

I added a shortcut for the second input. If you enter <description>, it uses the description from the XML instead of fetching the article’s detail page. This works best with high-quality news sources, and you can set it in the parent task.

We fetch the full XML, clean up escaped characters, and remove some special content tags. Then, we extract the news list.

The news list is split into an array. We set up the AI prompt and a maximum article length (to avoid overly long articles). Then, we loop through each news item, read and convert the title to plain text, and send it to the AI for categorization.

Here’s the AI prompt. I kept it simple, just telling it what to do. Groq’s Gemma2 9b model works well for Chinese text, better than Llama3. A small open-source model is perfect for this, and it hasn’t made any mistakes.

We filter out sports, entertainment, and military news based on the AI’s categorization. Then, we get the news detail page link, fetch the full HTML, clean it up, and extract the content using the specified HTML tag.

We convert the article body from HTML to text, check its length, and filter out anything too long or short (likely image-based news). The remaining articles are saved as text files.
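
The two filters above can be sketched as a pair of small checks. The category names match the ones I exclude; the label format and the length limits are placeholders for illustration, not the values used in the actual task.

```python
BLOCKED_CATEGORIES = {"sports", "entertainment", "military"}

# Hypothetical limits, in characters.
MIN_CHARS = 100    # shorter bodies are likely image-based news
MAX_CHARS = 3000   # longer bodies take too long to listen to


def keep_headline(category: str) -> bool:
    """First filter: drop a story based on the AI's category for its title."""
    return category.strip().lower() not in BLOCKED_CATEGORIES


def keep_body(body: str) -> bool:
    """Second filter: drop bodies that are too short or too long."""
    return MIN_CHARS <= len(body) <= MAX_CHARS


print(keep_headline("Sports"), keep_headline("Tech"))
print(keep_body("x" * 50), keep_body("x" * 500))
```

Checking the category first means the detail page is only fetched for stories that survive the headline filter.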

Priority Issues

During debugging, I couldn’t get content consistently. It took a while to realize the subtasks were running in parallel.

Tasker’s core feature, “Perform Task,” runs a subtask within the current task, passing data and receiving results.

It’s like function calls in programming. Tasker limits you to two parameters, but you can combine multiple parameters into a string using a separator, then split them in the subtask. This allows for any number of parameters. This nesting lets you build complex logic, making “Perform Task” a key programming feature in Tasker.
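
The parameter-packing trick looks like this in Python. The separator and function names are my own choices for the example; any string that never appears in the values would do.

```python
SEP = "|||"  # assumed separator; must not occur inside any value


def pack(*values: str) -> str:
    """Combine several parameters into one string for a single task parameter."""
    for value in values:
        if SEP in value:
            raise ValueError(f"value contains the separator: {value!r}")
    return SEP.join(values)


def unpack(packed: str) -> list:
    """Split the combined string back into the original parameters."""
    return packed.split(SEP)


packed = pack("https://example.com/feed.xml", "article", "2000")
print(unpack(packed))
```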

The “Perform Task” documentation mentions execution order. The parent task doesn’t wait for a triggered subtask to finish before continuing. Many of my subtasks fetch content or loop through page code, which takes time. If the parent task proceeds before the subtask returns a result, things break.

Following the documentation, I set the subtask’s Priority to %priority+1 (one higher than the parent). This forces the parent task to wait.

Downloading News from Multiple Sources

That was a complex task! Now, let’s use it.

I pass my RSS feeds and article body locations to the core task. It runs for each source.

Then, I created a separate task for batch conversion to speech, specifying the input (text news) and output (audio news) directories.

Scheduled Downloads and Conversion

These are the tasks, but how do they run? On Tasker’s Profiles page, you can add triggers for your tasks.

Every day at 4 AM, save all news as text files (takes 5-10 minutes).

Every day at 5 AM, convert the text news to audio.

The Final Result

When I wake up, there are two folders in the News directory.

text contains the text versions, which I can share.

audio contains the audio news. Some local news still gets in, but the AI is doing its job filtering out sports.

I created a “Daily News” playlist in my music player to read the audio folder.

Updating the content brings in the day’s news. I still have to update it manually, but I’m working on automating that.

Playback is automatic. My car’s Bluetooth connection opens the player, and I use AIMP player, which auto-plays on open. No interaction needed.

Finally, a task clears the news folders at 3 AM daily, preparing for the next cycle.

Epilogue

My homemade news program has been working great for a few days. I can drive without distraction. The robotic voice is the only minor issue. I might replace “Say To File” with a better TTS API later.

This process solved a problem and gave me reusable subtasks. The subtasks for fetching content, parsing XML, extracting HTML, and querying AI are generic. I can now build other programs, create web scrapers, and even AI agents on my phone. Mobile scraping is great: no server costs, and it runs 24/7. I’ll explore it further as needed.

Resources

The more complex Tasks are shared publicly for free use. Simpler Tasks are omitted, as they can be built using Tasker’s built-in features.

Bulk TTS: https://taskernet.com/shares/?user=AS35m8mopd%2Bc1C7UhZNzgAc6Ld0oCTR8LzUJsfqb7SGyZq7NWeHANGDjDvTtBPSkNCjn3CrFQoI%3D&id=Task%3ABulk+TTS

Fix XML format: https://taskernet.com/shares/?user=AS35m8mopd%2Bc1C7UhZNzgAc6Ld0oCTR8LzUJsfqb7SGyZq7NWeHANGDjDvTtBPSkNCjn3CrFQoI%3D&id=Task%3AFix+XML+format

API- Groq (enter your key): https://taskernet.com/shares/?user=AS35m8mopd%2Bc1C7UhZNzgAc6Ld0oCTR8LzUJsfqb7SGyZq7NWeHANGDjDvTtBPSkNCjn3CrFQoI%3D&id=Task%3AAPI+-+Groq+%28enter+your+key%29

Fix file name: https://taskernet.com/shares/?user=AS35m8mopd%2Bc1C7UhZNzgAc6Ld0oCTR8LzUJsfqb7SGyZq7NWeHANGDjDvTtBPSkNCjn3CrFQoI%3D&id=Task%3AFix+file+name

Get inner XML(all siblings): https://taskernet.com/shares/?user=AS35m8mopd%2Bc1C7UhZNzgAc6Ld0oCTR8LzUJsfqb7SGyZq7NWeHANGDjDvTtBPSkNCjn3CrFQoI%3D&id=Task%3AGet+inner+XML%28all+siblings%29

Get inner XML(first match): https://taskernet.com/shares/?user=AS35m8mopd%2Bc1C7UhZNzgAc6Ld0oCTR8LzUJsfqb7SGyZq7NWeHANGDjDvTtBPSkNCjn3CrFQoI%3D&id=Task%3AGet+inner+XML%28first+match%29

Download specific categories of news from RSS: https://taskernet.com/shares/?user=AS35m8mopd%2Bc1C7UhZNzgAc6Ld0oCTR8LzUJsfqb7SGyZq7NWeHANGDjDvTtBPSkNCjn3CrFQoI%3D&id=Task%3A%E4%BB%8ERSS%E4%B8%8B%E8%BD%BD%E7%89%B9%E5%AE%9A%E5%88%86%E7%B1%BB%E6%96%B0%E9%97%BB

Download news from multiple channels: https://taskernet.com/shares/?user=AS35m8mopd%2Bc1C7UhZNzgAc6Ld0oCTR8LzUJsfqb7SGyZq7NWeHANGDjDvTtBPSkNCjn3CrFQoI%3D&id=Task%3A%E5%A4%9A%E6%B8%A0%E9%81%93%E4%B8%8B%E8%BD%BD%E6%96%B0%E9%97%BB

Follow-up

I rebuilt this using Google Apps Script to handle features that were tricky in Tasker. It’s now cloud-deployed and scheduled to run silently overnight. Plus, I integrated AI summarization for long-form articles.

Project Link: https://github.com/greenzorro/google-apps-scripts/blob/main/news_feed.md

Quantifying Design Value

hi@victor42.work (Victor42) — Wed, 25 Oct 2023 10:51:00 +0000

The Story

I recently had a major clash with colleagues in a group chat. Things got heated.

I’m a designer, though you wouldn’t know it from my posts. I mostly do UI and interaction design, but I also handle data reports and PPTs. Sometimes, I even code and build websites. Our design department acts as a central hub, fielding requests from other departments. I’m juggling four projects, two of which are UI projects only I can handle. My schedule’s packed.

Why the fight? My UI work was fully booked, but another department insisted I help optimize a data report (a consumer report on jewelry). It wasn’t even advanced data viz, just finding and swapping product images in a PPT, showing it to the client, and swapping them again if they weren’t happy.

I refused. It’s intern-level work. I’d help if I had the time, but it wasn’t jumping the queue. I stood firm. They argued that since I’d done it before, I should continue, and the client was pushing. We went at it.

They ended up finding another designer. Afterward, my manager asked me to share my scheduling method. It seemed like they’d complained, but I was booked solid. A company has limited liability; an employee shouldn’t have unlimited responsibility, right?

For now, they can’t do much. But to cover my bases, in case they went to the boss, I had a backup plan. I used my work schedule data to calculate time spent on each task, assessed each task’s value, and created some charts. The monetary values indicate the salary range of a designer capable of that work.

It’s clear: their department (in brown) has a huge chunk of low-value work – finding and replacing images, adjusting alignment and fonts – and it takes up a ton of my time. The boss cares about cost-effectiveness. Having someone with a 20K+ salary doing intern work? Who knows who’d get chewed out.

The fight’s over, and I won’t dwell on it. But the data handling and analysis were interesting, so I’m documenting it.

Data Source

This analysis was possible because I regularly collect data. I organize anything I consider data in a way that’s useful later.

I created a design schedule with a multi-dimensional table tool. I set the default view to a calendar and put it in my DingTalk signature, so anyone requesting work could see my availability.

Although I add work items in the calendar view, it’s a data table. For easy recording, I kept the fields simple: project name, designer (it’s for the whole team), start and end dates, requester, and duration (in days), which is calculated automatically.

When a new project comes in, I update the schedule immediately. To avoid conflicts, I’m motivated to maintain this data table.

The raw data was ready, containing 40 workdays (nearly 2 months of data). I exported it to Excel, changed the duration from text to numbers, and started a series of analyses (from right to left) to generate the charts.

Time Analysis

First, time analysis. This tab has two tables:

The left table pivots the raw data, showing time spent on each requester.
The right table maps each requester to major business lines, summarizing the time each line takes up.

The left pivot table: filter for a specific designer (me), list each requester as a row, and sum the durations.

The right table lists the major business lines, selects corresponding requesters from the left table, and sums the totals.

Selecting data from a pivot table is easier. Excel automatically writes the GETPIVOTDATA function; you just click, avoiding SUMIFS.
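
For anyone who prefers code to pivot tables, the two tables amount to a filter plus two grouped sums. A minimal Python sketch follows; the rows, names, and business lines are invented for illustration and are not my real schedule.

```python
from collections import defaultdict

# Hypothetical schedule rows (the real export has more fields).
SCHEDULE = [
    {"designer": "me", "requester": "Alice", "days": 3.0},
    {"designer": "me", "requester": "Bob", "days": 1.5},
    {"designer": "me", "requester": "Alice", "days": 2.0},
    {"designer": "Dana", "requester": "Bob", "days": 4.0},
]

# Hypothetical mapping from requester to business line.
LINE_OF = {"Alice": "Reports", "Bob": "Apps"}


def days_by_requester(rows, designer):
    """Left table: total days per requester for one designer."""
    totals = defaultdict(float)
    for row in rows:
        if row["designer"] == designer:
            totals[row["requester"]] += row["days"]
    return dict(totals)


def days_by_line(by_requester, line_of):
    """Right table: roll the requester totals up into business lines."""
    totals = defaultdict(float)
    for requester, days in by_requester.items():
        totals[line_of[requester]] += days
    return dict(totals)


left = days_by_requester(SCHEDULE, "me")
right = days_by_line(left, LINE_OF)
print(left)
print(right)
```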

Value Analysis

Next, I analyzed how well my time was spent.

The Value Analysis tab has five tables:

Table 1 is the reshaped right table from Time Analysis.
Table 2 shows the percentage of each business line’s work in different value ranges (manually created).
Table 3 pivots Table 2 for easier use in Table 4.
Table 4 multiplies Table 1 and Table 3 to calculate the actual percentage of each work type in different value ranges.
Table 5 pivots Table 4, summarizing the total percentage of work in different value ranges.

Table 1: each business line is a row, and durations are summed.

The key is the format. In “Sum of Duration” settings, I changed “Show Values As” to “% of Column Total” and the number format to percentage, getting each business line’s time percentage.

Table 2 is the core, but it’s subjective. It’s not super rigorous, but good enough for arguments and review. I tried to be fair, assigning value percentages to each business line based on experience. I swear I didn’t intentionally undervalue the other department; their vendor-like nature means their low-value work proportion is high. The designer salary ranges for the value tiers are based on my 10+ years of experience.

Table 3 pivots Table 2. It’s divided by value, then by business line. This structure is for Table 4, for easier viewing and data retrieval.

Table 4: multiply data from Tables 1 and 3.

Table 5 pivots Table 4, summarizing by value.
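
The arithmetic behind Tables 4 and 5 is a weighted sum: each business line’s share of my time, multiplied by the share of its work in each value tier, then totaled per tier. A small Python sketch with made-up numbers (the line names, tier names, and percentages are all hypothetical):

```python
# Table 1: share of total time per business line (hypothetical).
TIME_SHARE = {"Line A": 0.5, "Line B": 0.3, "Line C": 0.2}

# Table 2: share of each line's work per value tier (hypothetical, rows sum to 1).
VALUE_MIX = {
    "Line A": {"low": 0.7, "mid": 0.2, "high": 0.1},
    "Line B": {"low": 0.2, "mid": 0.5, "high": 0.3},
    "Line C": {"low": 0.0, "mid": 0.4, "high": 0.6},
}


def weighted_mix(time_share, value_mix):
    """Table 4: share of total time per (line, tier) pair."""
    return {
        (line, tier): time_share[line] * share
        for line, tiers in value_mix.items()
        for tier, share in tiers.items()
    }


def totals_by_tier(table4):
    """Table 5: share of total time per value tier, across all lines."""
    totals = {}
    for (_line, tier), share in table4.items():
        totals[tier] = totals.get(tier, 0.0) + share
    return totals


table4 = weighted_mix(TIME_SHARE, VALUE_MIX)
table5 = totals_by_tier(table4)
print({tier: round(share, 2) for tier, share in table5.items()})
```

With these sample numbers, 0.5 × 0.7 + 0.3 × 0.2 + 0.2 × 0.0 = 0.41, so 41% of the time lands in the lowest tier.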

Charts

With the analysis done, it’s time for visuals.

Level 1: Show percentages of each business line and value range, using data from Tables 1 and 5. Create pie charts, add data labels, and adjust colors.

Level 2: Show the breakdown of business lines within each value range. Treemaps are best for this two-level hierarchical proportion data. Create a Treemap from Table 4, and adjust background and label colors to match the two charts on the right.

Enable Treemap labels to show names and values, displaying each business line’s detailed percentage.

Afterword

With this value analysis system, I just maintain the schedule. I import the data, update a few pivot tables, and the charts are generated automatically.

Even with limited raw data, there’s more to analyze: monthly workload saturation, average project cycle for each business line, and value composition fluctuation over a year.

The fight’s over, and I won’t bring this up to the boss, but it’s interesting that design work can be analyzed with data.

The "Self" in "Self-Media" is Deceptive

hi@victor42.work (Victor42) — Thu, 19 Jan 2023 13:45:00 +0000

I’m no online celebrity, but I’ve been around the internet long enough to offer my perspective on social media and “self-media” (we-media).

This isn’t a how-to guide for becoming a successful content creator. It’s about the reality of online content creation. Before you dive in, you need to grasp the fundamentals.

The Essence of Online Communities

Know Your Battlefield

Whether it’s TikTok, Douyin, Zhihu, or Bilibili, they all share a common core: they’re content distribution platforms, or communities. They connect creators with consumers. Creators get exposure, potential income, and fulfillment; consumers get information, entertainment, or simply a way to pass the time.

It’s a marketplace, driven by supply and demand. If creators aren’t producing what consumers want, nobody benefits. Creators lose motivation, and consumers move on.

Every community is a content marketplace, connecting creators and consumers for profit. Their aim is to efficiently match supply with demand. Creators reach a large, relevant audience; consumers consistently find content they enjoy. This leads to revenue for creators, and the platform takes a cut – like a mall charging rent. The internet industry is essentially real estate, but stores open for free, and rent comes later.

Creators accept this because platforms offer efficient distribution. Without them, profits might be lower, even after the platform’s cut. It’s a win-win.

All Communities Compete

Don’t Be Limited by Content Format

Communities want to retain both creators and consumers. Consumers have limited time; time spent on one platform is time not spent on another. It’s a scarce resource. Creators have limited energy. While they can post on multiple platforms, each has its own rules. Unless you’re already famous, you need to focus on a specific community to build a following.

Creators and consumers are finite resources, and since all communities connect them, they’re all in competition.

This means Zhihu and Douyin are rivals. It seems strange – one is for Q&A, the other for short videos. But the format doesn’t matter; it’s easy to adapt.

Common formats include text, images, audio, video, and live streams. Each community has a primary format or two. But for creators, content is king. They adapt to any format, seeking the most efficient match between supply and demand.

Text seems cheapest to produce – anyone can write. But creating text on a visual platform like Xiaohongshu? Easy. User-friendly editing software has lowered the barrier. Pick a background, some music, a text template, and you’ve got a decent short video, maybe even animated. Many popular Douyin videos are text-based “pseudo-videos.” Another option: appear on camera and read the text. Converting text to audio is similar.

And those are just the basics. AI tools are even more powerful. AI can create images from text, animate still images, generate realistic voiceovers, lip-sync photos, and even write the text itself. Even live streams have tools for beautification and special effects.

Creators with a clear vision aren’t constrained by format. To grow quickly, they prioritize a platform’s efficiency in matching supply and demand.

Two Types of Matching Mechanisms

Distinguish the Type and Nature of the Battlefield

Speaking of efficiency, let’s talk about Toutiao. It revolutionized supply and demand matching.

Before Toutiao, communities relied on search and following. Consumers searched for what they wanted or followed creators in specific fields. This was the search engine era. “Recommended” features existed, but were secondary. Search and following were central. I call this the “manual mechanism.”

Toutiao, Bytedance’s first major success, prioritized recommendation algorithms. Their engine powers all their products. For consumers, recommendations are more convenient than search – no typing needed. The platform learns your preferences. Browse casually, and the recommendations become increasingly accurate. You don’t even need to follow anyone. I call this the “automatic mechanism.”

The manual mechanism requires action from consumers – they tell the platform their interests. The automatic mechanism requires nothing extra. Recommendations are inherently more efficient.

My own content creation isn’t stellar. I have around 5,000 Weibo followers, with posts getting tens of thousands of views but few likes. On Xiaohongshu, I have almost no followers, yet some posts get thousands of views and dozens of likes. Views show distribution; likes show accuracy.

That’s the automatic mechanism at work. Bytedance is the only Chinese internet giant to truly conquer overseas markets with its software, leaving competitors behind.

Seeing Toutiao and Douyin’s success, other platforms are adopting the automatic mechanism. It’s now about the balance between search and recommendation.

Platforms like Douyin and Xiaohongshu are recommendation-heavy. But users also search within them, replacing Baidu.
Zhihu’s core is Q&A, with significant traffic from Baidu and Google. But the homepage also recommends content based on your preferences.
Specialized communities like Xiachufang (recipes) have users searching for specific dishes and browsing for ideas. It could easily be a 50/50 split.

What Creators Are After

What Kind of Success Do You Want?

Recommendation engines are efficient, but they can lead to homogeneity. They often tag creators, consumers, and content. Matching tags connect consumers with creators. Creators with similar tags compete based on “weight.” A niche creator gains higher weight than a generalist, leading to large accounts with narrow focuses. This is a byproduct of specialization, but not the same.

Two types of creators thrive: experts with high-quality content and those who mass-produce popular content cheaply. One focuses on quality (gross margin); the other on quantity (turnover). Self-media is a business, and businesses pursue these two goals.

The first path is challenging. Experts often assume their audience shares their knowledge, making content inaccessible. They need to explain complex topics simply, a rare skill.

The second path is more common, but risky. How can humans compete with machines in output? How can original content beat copied content? Large accounts often have systems for collecting, copying, and rebranding content. They gather quality content, copy it formulaically, make minor changes, and add their branding. Anything that builds their persona and isn’t low-quality is used. The creator might not even understand or agree with their own posts. Self-media becomes a job, and the algorithm their boss.

Some creators aren’t after fame or money; they just want to share. Their profiles feel genuine, unlike the monotonous feel of most accounts.

Creators who prioritize authenticity can ignore all this. But authenticity and large followings are often at odds.

The Mindset Creators Should Have

Making Your Content Creation Journey Easier

Most platforms use likes to measure influence. But likes are a result; focus on the cause: comments.

Why?

Comments are the highest-effort interaction. Likes, saves, and shares are binary: like/dislike, useful/not useful, fits my persona/doesn’t. They’re distinct. Only comments are open-ended, capable of replacing the others (even sharing, by @-ing friends). If the other interactions aren’t enough, users comment.

Existing comments can also discourage new ones. Commenters want exposure. If a post has many high-interaction comments, new commenters are less motivated. Likes, saves, and shares don’t have this issue.

So, except for posts designed to provoke, comments are usually the fewest, representing the highest-value interaction.

To boost comments, reply actively, keeping the topic alive and the algorithm engaged. This also encourages potential commenters. But the online world is extreme. Behind screens, people unleash negativity. Unfriendly comments are a cost of growing a large account.

How to mitigate this? First, define your account’s purpose: career or hobby? Fame and fortune, or personal expression?

If it’s a career, treat it like a business. Consumers are data, like chickens on a farm: feeding, temperature, egg production. Interactions that boost comments and likes are valuable. A hater sparking an argument is more valuable than a supporter saying, “Well written.” You might even fuel the fire, then disappear, letting it continue.

If it’s a hobby, distinguish between human voices and noise. Abusive commenters are one-dimensional binary creatures. One-dimensional: they grasp only one variable. Binary: they see only black and white. They’re background noise. When the noise is low, focus on the human voices. When it’s loud, put on headphones and ignore everything, even the human voices. This is your space. For information, use your homepage feed, not your comments. And you don’t have to be both creator and consumer on every platform. Post here, consume there.

Conclusion

The world of self-media isn’t a free utopia.

The idea that experts producing great content automatically succeed is a rare, feel-good story. The reality is, driven by platform interests, the system doesn’t encourage authenticity. It encourages creating a persona, targeting popular topics, and churning out content.

Mentally, you must be machine-like, abandoning normal etiquette and acting like a customer service line: “Press 1 if it’s useful, hang up if it’s not.”

If you’re still undeterred, congratulations. You’ll gain more than fame and fortune. Content creation is a learning experience, and that might be its greatest value.

An Icon for "Operations"

hi@victor42.work (Victor42) — Sun, 04 Jun 2017 13:42:12 +0000

It started with a simple task: design an icon for “Operations.” We’re a recruiting company, and we needed a visual. Representing an abstract concept with a concrete image is tricky – lots of choices, but nothing feels perfect.

So, I went back to basics. What’s the core message? I’ve worked with Ops for ages, but did I really understand their role?

As a product designer, I don’t interact with Ops much. It always seemed, well, diverse. That was my vague impression. Some visual designers view Ops as unreliable, inconsistent, even contradictory, ignoring branding and pushing things on users. Whenever a designer I’ve worked with has vented, Ops has usually been high on the list.

Of course, it’s simplistic to stereotype. My girlfriend interviewed at a startup. The CEO, in the final round, discussed her hobby, ancient Chinese history, and her pragmatic nature. He was surprised. “How can a designer be like you? Aren’t they all artsy? Concerts and museums?” I was shocked too. How far can such a narrow-minded CEO lead?

You can’t pigeonhole people or professions. You’ll miss valuable learning opportunities. You have to assume that true Ops professionals aren’t like that. I researched and talked to my Ops colleagues. It was my first real attempt to understand their work – and yes, it’s very diverse.

Operations can be split into three areas: content, user, and campaign. All are equally vital. My take:

Content Operations: Creating and maintaining content.
User Operations: Focusing on user behavior, guiding it towards business goals.
Campaign Operations: Creating growth opportunities through planning and resource integration.

Channel, community, and new media operations are categorized by medium. The specific tasks depend on the medium.

“Operations” comes from “Operate,” as in COO. “Operate” usually means to control. I liked an online analogy: If product and design build the ship, operations sails it. It’s a fundamental difference in mindset. Shipbuilders use static thinking: How’s the structure? Ideal state? Wind resistance? Sailors must use dynamic thinking: What if currents shift? Suez Canal or Cape of Good Hope? Shipbuilders can use dynamic thinking, but they can still function without it. Sailors constantly face change. Only the destination is fixed.

This is my first team with such specialized Ops roles. Seeing the business through their eyes has been eye-opening. It began with a design review debate.

We were redesigning our mobile site – a visual refresh, mainly. Functionality wasn’t the focus. Homepage requirements: “Emphasize search, de-emphasize the banner.” I agreed initially. Job seekers want efficiency, not browsing. Search is key. So, I proposed this:

The search box is larger, with a shadow for emphasis, visually dominating the banner. Mission accomplished. The 2x3 grid of job recommendations below is essentially searches for those keywords. Placing the search box nearby could transform “recommended jobs” into “popular searches,” broadening search’s scope.

The product manager loved it, but Ops objected strongly. I explained the benefits, but Ops couldn’t accept the banner being obscured. The PM surprisingly joined in, arguing fiercely with Ops. Their stances were clear: Product prioritized efficiency and minimal distractions; Operations wanted a vibrant feel and rich content. As designers, where did we stand?

How do we see banners? Annoying, mostly. Often irrelevant, space-consuming, and visually jarring. Many PMs likely agree. They see it as Ops’ domain, something to ignore. If Ops wants a say, give them a banner and let them handle it. I’ve thought that way myself.

PMs and product designers are usually rational, logical, efficiency-driven. It’s the job. As an engineer and pragmatist, I believe a good product sells itself. Perfect it, and growth follows. That’s not wrong, but if you think Ops-driven projects are less intelligent, mere grunt work, you’re mistaken.

The search box debate ended inconclusively, the design killed by a majority vote. But, something Ops said near the end resonated:

“We use this banner for partnerships, resource swaps. People will say, ‘It’s not prominent. It’s on the homepage, but blocked.’ We can’t offer that.”

We could’ve solved the blocking. The point is, did we, designers and PMs, ever care about Ops’ needs? Did we consider their role in the bigger picture?

Many companies succeed through refined products. Sketch is an example. But not every industry has that level of technical depth or differentiation. The more grounded the industry, the tougher that path. Ruthless expansion, dominating the market – that’s another, winner-takes-all strategy.

Our team is in that kind of industry, and our strength is Operations. So, why not play to our strengths? I often face situations against my design principles. I follow Ops’ lead, using design to achieve their goals, but we find a balance. Alignment is more crucial than individual principles.

What is product design? It’s not so lofty. Designers have a mental scale, measuring user experience. A slight shift can turn a well-meaning nudge into manipulative, brand-damaging dark patterns. We warn against that, but it rarely reaches that point. We represent users, defending their interests. But the foundation is helping the company profit. On that, we and Ops are aligned.

So, these colleagues deserve design support. Create unique elements for their diverse display needs. Track and improve metrics for their monetization needs. Establish guidelines for their partnership needs. Shift your perspective, and you’ll see your work is still valuable.

Back to the icon: What represents “Operations”? I drew a simple bar chart: X and Y axes, two bars, one taller. It’s abstract, and Ops’ work varies greatly, but business growth is the shared goal.

If we don’t understand the sailors, how can we build a good ship?

My Roommate's Ride Home

hi@victor42.work (Victor42) — Thu, 19 May 2016 16:42:02 +0000

This true story about interaction design gave me some insights into ride-hailing apps. It all started with my old roommate.

The Roommate

We called my college roommate “Boss” – he was assertive, thought differently, and often dropped unexpected truth bombs. He’s an embedded systems engineer, utterly obsessed with the field. He’s also a hardware whiz.

Around 4 PM, he used Didi (China’s Uber) to visit me for dinner. He lived about 6km away. We hadn’t seen each other in a while, so we had a lot to catch up on.

We started talking about a classmate’s wedding, and the conversation naturally drifted to his area of expertise. He went on about algorithms, development philosophies, different ways to control electric motors, and even battery management systems for electric cars. That’s just how he is. He knows I don’t get most of it, but he keeps going, regardless. Even though I only grasped the basics, I listened patiently. His passion is infectious; it’s not painful to hear him talk about this stuff. He reminds me of my calculus professor, who would pause mid-lecture, reflect, and exclaim, “Isn’t the proof of this equation beautiful!” I hated that class, but I respected that professor.

Our major was electronic information. Maybe fewer than 10% of the class understood the core courses, and we were both in the majority. In our senior year, it was like a switch flipped. He suddenly became super interested in our major, catching up on previous courses and studying beyond the textbooks. He later told me he finally saw how this knowledge applied to real projects – it was actually useful! That’s when his passion ignited. With his dedication, I’m sure he’s a big shot in the industry now.

But he’s clueless about internet products. He still uses an iPhone 4, with very few apps, all on one screen. No folders, no icon organization, and the dock still has the four default iOS apps. It shows that even someone as studious as him won’t waste time on things he doesn’t care about.

I asked if he took a regular ride or carpooled (I wasn’t precise, because I also use Uber, so “regular ride” meant Didi’s Express option). He wasn’t sure, saying it was probably a regular ride since there were no other passengers. I asked the cost, and he said 14 yuan.

After dinner, around 9 PM, he used Didi to head back. I watched him, and noticed a few interesting things:

Calling a Ride

He first tried hailing a taxi, hesitated, then tapped “Hitch” (carpooling). Realizing I was watching, he asked, “Should I choose Hitch?”

I suggested Express, thinking it was what he used to get here, and it would be familiar.

He selected Express and entered his destination. He hadn’t set “Home” or “Work” addresses, so he had to type it in. He tapped the pickup location first, but didn’t realize it.

As he was about to enter his home address, I pointed it out and gestured to the text prompt: “Where are you?”.

He backed out, tapped the destination, but then realized he didn’t remember the exact address. So he backed out again, tapped the top-left menu, and went to “Trips.” He tried to copy the pickup address from his earlier ride, but couldn’t.

He went back to the ride-hailing screen and typed in his home address. The list showed several results: shops, bus stops. He just waited. I didn’t say anything, observing.

But his eyes had left the phone, and his hand lowered. He thought he’d successfully called a ride.

I had to tell him again to choose an address to confirm his destination.

He picked a bus stop, tapped “Call,” and finally got a car.

However, his Express ride back, without carpooling, only cost 5 yuan. So I guessed he probably didn’t take Express on his way here.

The Problem

After he left, I wrote down the process. Thinking about his actions and mindset, I had some insights.

His ultimate goal was to go home. Unsure how the app would react, he assumed it would understand his intentions like a real person: “Since you picked me up from home earlier, you should know where my home is, and I want to go back there.” His struggle to copy the earlier pickup address clearly showed his goal: a return trip, going back where he came from.

But reality wasn’t perfect. Even a real person can’t always understand another’s thoughts.

The gap between the result and his expectation created a problem. The root cause wasn’t that he didn’t remember his address; it was that he thought Didi should know his home, but it didn’t. The “Home” and “Work” addresses are designed to solve this. However, few people proactively set these; even some IT professionals don’t.

This got me thinking. Aren’t these two separate issues?

Didi can’t intelligently know or guess my home.
Didi doesn’t offer a convenient return trip option.

In my roommate’s case, “going home” and “return trip” overlapped. But they’re not always the same, so let’s consider them separately:

Going Home

From a mental model perspective, going home is switching between “at home” and “outside.” Calling a ride to leave exits the “at home” state. Until you step back inside, you’re “outside.”

For example, if someone gets a call about a package while they’re out, they’d say, “I’m not home,” or “I’m out, please leave it with management.”

“Outside” is uncertain, but “home” is relatively fixed. By analyzing historical trips, visit frequency, and arrival times, it should be possible to guess. For users without set addresses, if the app detects frequent trips to the same area, it could prompt: “We noticed you often go to [location]. Is that your home? Or work?” Recommendations and guidance could encourage users to set addresses, making future trips easier.
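
The guessing idea above can be sketched in a few lines. This is only a rough illustration, not how Didi actually works: the `Trip` record, the coarse area labels, the evening cutoff of 18:00, and the three-visit threshold are all assumptions made up for the example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Trip:
    """Hypothetical trip record; areas are coarse labels, not exact addresses."""
    pickup: str
    dropoff: str
    arrived_at: datetime


def guess_home(trips, min_visits=3, evening_start=18):
    """Return the area most often arrived at in the evening, or None if evidence is thin.

    Assumption: people usually end their day at home, so frequent evening
    drop-offs in one area hint at a home address.
    """
    evening_arrivals = Counter(
        t.dropoff for t in trips if t.arrived_at.hour >= evening_start
    )
    if not evening_arrivals:
        return None
    area, visits = evening_arrivals.most_common(1)[0]
    return area if visits >= min_visits else None


def home_prompt(trips):
    """Build the confirmation question from the text, or None when there is no guess."""
    area = guess_home(trips)
    if area is None:
        return None
    return f"We noticed you often go to {area}. Is that your home? Or work?"
```

The sketch only asks the user to confirm; it never sets the address silently, which matches the idea of guiding users rather than guessing on their behalf.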

Return Trip

Is a return trip common? I don’t have the data. But this scenario is typical: going from home (or work) to a place for leisure or errands, and returning the same way on the same day.

If it’s between two frequent places, “return trip” doesn’t quite fit. We’d think “going home” or “going to work,” part of our routine. A return trip implies a temporary, less frequent location, subconsciously feeling like a short “business trip” or “outing.” Like people going home for the holidays; when it ends, we need a “return” ticket to the city we live in. The destination doesn’t matter; the key is that having come here, we can get back. The “return trip” concept becomes clear.

But a meaningful “return trip” concept doesn’t mean a “return trip” feature is meaningful. The most obvious approach is a “Return Trip” button on the main screen, allowing a one-tap ride to the last trip’s starting point. But there are problems. What if the user hailed a taxi on the street? Or got a ride from a friend? The user wants to go back; they don’t care how they got there. The app can’t know this, so a return trip option would just add confusion.

The key issues are:

How can Didi track user travel history using other methods?
How can Didi know if a destination is temporary?

A mobile app alone can’t solve these. Therefore, the “return trip” concept might be meaningful, but it’s not something a single mode of transportation can provide.

A Newbie's Perspective

hi@victor42.work (Victor42) — Sun, 13 Sep 2015 15:50:00 +0000

Work had slowed down, and with some personal stuff happening, I’d fallen out of touch with tech and design. After a few days, I felt off, like I was losing my creative spark.

Normally, I’d dive back into work or explore new designs. But this time, I went further. I suspected something valuable was hidden in that “ignorance is bliss” mindset.

So, for two months, I avoided design blogs, tech news, and industry trends. I cut myself off as much as I could. I stuck to basic work, kept up my weekly translations, and that was it. My free time was all about personal life: hanging out with friends, dining out, sleeping in, gaming, reading, and watching movies. It was a great life, honestly, but it felt off.

Initially, I was anxious, feeling myself getting rusty. Design ideas became scarce. Then, I adapted, even got comfortable – a no-brainer, right? Finally, I was immersed. A different life, a different mindset, with its own way of operating and perceiving. And new ideas started emerging, the most valuable part of this whole experiment – I’d successfully become a tech newbie, seeing the world from their perspective: what mattered to them, and what didn’t.

Newbies Aren’t Dumb, They Just Don’t Care

We tech and design people tend to look down on newbies. Like, “You don’t know you can change your profile pic by tapping? You turned off notifications and now you’re complaining? You left your files at home? Heard of the cloud?”

Now, I’m one of them, and I understand the “tech ignorance.” My tech instincts haven’t disappeared; I can still figure things out faster than most. But now, I’m impatient. I have novels to read and games to play.

My 16GB iPhone 5 was constantly complaining about low storage. I used to check storage, meticulously clear caches, and delete downloaded data. I still can’t remember which apps let you clear the cache and which don’t. As a newbie, I found the easiest, most drastic solution: delete WeChat and QQ, then reinstall. Boom, hundreds of MBs freed up.

It sounds extreme, but it’s logical. Deleting, reinstalling, and logging back in takes five minutes, max. I know exactly how to do it. Clearing caches might take two minutes, but that “might” is key. What if it takes 15 minutes and doesn’t even work?

If I don’t view my phone as “fun,” I won’t waste an extra minute on it. It’s not central to my life.

Notifications, Updates… Who Cares?

My friends, Dee and Shuai, both in IT, have completely different phone setups. Dee’s is a classic product manager’s: tons of folders, neatly organized by function. Shuai’s is the opposite: few folders, many screens, endless scrolling, and red notification badges everywhere. On his home screen, the App Store badge showed over 70 updates.

It’s the classic “red dot OCD” debate. I used to update everything, open every notification (though not necessarily read them). It was like doing daily quests in a game – I had to clear those exclamation marks before logging off. Dee used to tease Shuai, “You’re a front-end engineer, and your phone looks like this?” I didn’t chime in, but I did think it reflected someone’s self-discipline.

Turns out… it’s not that at all. At some point, I became indifferent to the red dots, probably because of WeChat Official Accounts. An app can only push so many notifications; you can clear them quickly. But subscribed accounts? You follow first, worry about reading later. When notifications flood in, you become numb, and the red dots lose their significance. WeChat’s like, “Blame me?” You’re like, “Blame me?” Nobody’s fault.

It’s like a zombie apocalypse. I enter a supermarket with a gun. If there are two zombies, I’ll eliminate them and lock the door. If there’s a horde, even with enough canned food for 20 years, I’m out of there.

Lately, apps have gotten creative with their App Store update notes. Opera Coast used to write clever one-liners; Medium wrote poems. I’d chuckle, briefly amused. Then… I wouldn’t open the app. I’d just return to the home screen, every time.

What the Heck is a “Field”?

I haven’t touched front-end tech in about a year and a half. I don’t need to anymore. I have to strain to recall some tech concepts. Now, actively avoiding tech, I was slightly worried this might be a turning point in my design career.

One day, I was signing up on a website, and it said “This field is required” next to an input box. I knew what “field” meant, but it felt alien. What the heck is a “field”? I stared at the words, wondering if the developer had mistyped something.

As for “cache,” mentioned earlier: this year I’m going to ask my mom, “What do you think ‘Clear local cache’ means?” If she says it clears her location info, I’ll take it, because I’ve thought that too.

At work, there’s a constant debate about a design detail: after a complex process, should there be a “back” button? Where should it be placed? As a newbie, my actions showed me it’s irrelevant. I follow the product’s flow, going in and out step by step, naturally. Let me paint the picture: task complete – press home – (if it’s a battery hog like maps, double-tap home to close it) – lock screen – back in pocket. I found that excessive, so after some lazy attempts, I figured it out: task complete – lock screen – back in pocket. The key is “back in pocket”! That’s my end of the process, not exiting your feature.

Back to being a newbie, holding this perplexing glass screen, I just want to share a song from NetEase Cloud Music to Sina Weibo. It says my Weibo authorization expired, so I need to log in again. I patiently enter my username and password. This happens frequently, in other apps too. But for the first time, I instinctively blamed NetEase, not Weibo. Then I realized, NetEase was the scapegoat.

If I were a true newbie, I might never realize that, and NetEase would be forever blamed. Tech details, product logic, I don’t know, and I don’t care. The situation tells me someone’s at fault. Maybe I slip on a wet floor in a restaurant, and a server apologizes for the cleaning crew, and that’s that.

Are You Reminding Me, or Am I Reminding You?

You rarely see people using Siri, right? I understand. Talking to a device in public, hoping for the correct response, feels awkward. It’s noisy, and it might pick up random sounds. Plus, it’s a privacy concern; people know your business.

But, it works. I’m walking home, listening to music, and remember I need tissues. I’ll forget by dinner. So, I long-press the earphone button to activate Siri: “Remind me to buy tissues at 9 PM.” No need to even pull out my phone.

I used to be a productivity app fanatic: email, calendar, notes – all front and center on my home screen… though I rarely used them. I tried every to-do app, so many well-designed ones, each with unique features. I settled on Any.do, loving its simplicity. Pull down to add a task, swipe right to complete. I categorized tasks by context: “buy laundry detergent” under “life,” “update annotations” under “work,” “research Pixate” under “learning.” Tasks with deadlines went into their calendar app, Cal. I was meticulously managing myself, precisely as Any.do intended.

Then I lazily used Siri once, and I couldn’t go back. I’m a newbie, not a pro. Bamboo reminds me to buy fruit; HR reminds me to make a name card for the new hire. What’s the difference? At some point, I remember something I need to do, and that’s all. Why report back to the to-do app afterward? Is it reminding me, or am I reminding it?

Once the reminder pops up, I don’t need it anymore. If you could do it for me, great, tell me the result. But you can’t, so you remind me, and I do it myself. No app can cook me scrambled eggs with tomatoes. Finishing the task on time is the best self-management. Who cares if the to-do app is a mess?

A good servant comes and goes as needed.

We’re Penny-Pinchers, Especially with Time and Money

I moved to a place with a KFC, my go-to when I can’t decide on dinner. KFC is great; they have mobile payments, so I only need cash for my bus fare.

Alipay has had an 8.8% discount forever, and Bamboo and I always get it before ordering. Her phone is still on 2G, so it stalled halfway. We found a table, struggled with it for 10 minutes, finally got the discount, and ordered.

Sometimes I go alone, same no-network problem, probably the carrier’s fault. I’m too impatient to deal with it, and I don’t want to hold up the line. Five bucks isn’t worth two minutes of a hungry queue’s time. If mobile payment fails, I just use cash. Pull it out, hand it over, get the change, pocket it, done. And I don’t have to stare at a tiny screen, trying to tap even tinier buttons.

Same situation, two completely different reactions. Bamboo wants the discount, even if it takes 10 minutes. I’m starving after walking across Hangzhou; I don’t want to wait a second. Neither has anything to do with mobile payments.

My WeChat history with Bamboo contains nothing significant. We see each other all day; urgent matters are a phone call, non-urgent things can wait till we’re home. Even so, we’re constantly sending each other food delivery coupons. The shifts in our chat history are revealing.

For a while, we’d send each other Ele.me coupons around lunchtime. One day, she started sending Meituan coupons; I kept sending Ele.me. After a few days, I switched to Meituan too. Then, I started sending Ele.me again, and she followed. Recently, we both switched back to Meituan, almost at the same time. What happened?

I randomly remembered this and asked Bamboo why we kept switching. She accused me of copying her; I said she copied me later. We hashed it out and reached the obvious conclusion: Ele.me had an “8 RMB off orders of 15 RMB or more” deal, so we started ordering takeout frequently. The discount dropped to “6 RMB off 15,” and Bamboo discovered Meituan’s “7 RMB off 15.” I was slower, but one day I felt like it and installed Meituan, and rarely opened Ele.me after that. But I didn’t delete it, and when I saw it had a “12 RMB off 20” deal, I started using it again, keeping Meituan too. Obviously, Bamboo noticed as well. It didn’t last long, of course; it was 12 RMB off! We watched it drop to “8 RMB off 15,” then “6 RMB off 10.” And we happily started sending each other Meituan coupons again.

If you’re going to have a price war, nobody cares about usability.

Final Thoughts

Now, turn on your phone, glance at your home screen icons. Think, are they trying their hardest to get your attention? Look here, look here, look here! But I’m a newbie; I just want to check the bus route to the subway. Everyone’s enthusiasm stresses me out. I dive into the maps app, find my route, shut off my phone without looking back, and go on my way.

My two months as a newbie felt schizophrenic. In a good mood, I’d tap anything, download random games and apps, and forget how I found them the next day. In a bad mood, everything was noise. I’d pull down the notification center, it was a nightmare, and I’d silently push it back up, pretending I hadn’t seen anything.

It’s hard to grasp; people are so unstable. Newbies are fickle; they change their minds; opening an app is basically mood-based. During this time, I felt like my thinking was stream-of-consciousness, my actions were “goto” statements, unpredictable.

I thought the newbie state was temporary, but it’s a great feeling, and part of it has permanently influenced me. There’s more to say, but I don’t want to write anymore. While writing this, the designer in me is resurfacing, the newbie feeling is fading, and there are some mindsets and perspectives I don’t want to give up.

The conclusion might be a bit pessimistic, or maybe there’s no constructive conclusion at all. But during this time, I experienced what was real, and maybe this is what tech life should be.

The Eternal Life of Machines

hi@victor42.work (Victor42) — Sun, 23 Aug 2015 22:39:00 +0000

Summer in Hangzhou was brief this year, quickly giving way to cool, rainy weather. As I walked beneath the streetlights, the city’s nightscape was reflected in the puddles. My gaze landed on the plain, beige, checkered folding umbrella in my hand. According to legend, umbrellas were invented during the Spring and Autumn period by Yun, Lu Ban’s wife. Their purpose was simple: to protect from sun and rain, much like the old oil-paper umbrellas. They’ve been around for over 3,000 years, largely unchanged. Why?

Consider the evolution of the Chinese character for “umbrella” (伞). It’s quite telling – it’s looked like this since ancient times. Compare the character to umbrellas, past and present. Have the ribs really changed that much?

Today, we have straight and folding umbrellas. Folding ones even come in three-fold and four-fold versions. There are unconventional designs, like the Senz umbrella. But open them up, strip away the fabric, and they’re fundamentally the same. You probably see my point. It’s not about how umbrellas could be improved, but why they haven’t been replaced.

There are alternatives. Raincoats are a classic, but less convenient, used mostly when we need both hands free. The Air Umbrella uses air jets to create a shield, pushing raindrops away. I haven’t tested it, so I can’t speak to its energy use or noise. But one thing’s certain: any energy-using umbrella will always cost more than a purely mechanical one. This will hold true, no matter how technology advances, until umbrellas disappear entirely.

Other alternatives surround us: cars, buildings, underground walkways. If anything truly obsoletes mechanical umbrellas, it won’t be a new umbrella, but a combination of factors. Perhaps garages will become ubiquitous, cities will develop extensive underground tunnels, or covered walkways will proliferate. Maybe, like Asimov’s Trantor, the entire planet will be domed. I certainly hope not.

But I’m getting sidetracked. Let’s not dwell on how umbrellas might vanish. Instead, why have they persisted in this form for 3,000 years? Is this their optimal form?

I believe so. By “optimal,” I mean the most enduring, lowest-consumption way for umbrellas to coexist with us. There are things we only think about when needed. Otherwise, we don’t care. Umbrellas, air conditioners, streetlights, map apps, spare tires… What do we want from them? Durability and low consumption. If I wear a watch just to tell time, why buy an Apple Watch and charge it daily?

Mechanical umbrellas excel in both. First, low consumption: money, space, time, effort. Folding umbrellas are already optimal: light, compact, and zero-energy, apart from the calories burned opening and closing them. Imagine a scenario where, by some mysterious force, we lost all electricity – no computers, lights, batteries. What would still be valuable? My bicycle. Purely mechanical, human-powered things are inherently zero-consumption.

Then, durability. Many mistakenly believe advanced things are less prone to breaking due to “better quality.” Not true. Adding advanced tech grants powerful functions, but also increases complexity. Complexity shortens lifespan. It’s a law of physics – without external energy, maintaining a stable, ordered state long-term is impossible, regardless of quality. The most enduring way to preserve text and images? Not hard drives. Ancient paper, ink, and bamboo slips can last millennia; electronic media can’t. Even paper and ink decay. Stonehenge comes to mind.

Of course, we don’t need heirloom umbrellas. But I also don’t want an umbrella that demands attention or wastes energy. This is where mechanical devices shine. We’ve seen the smart home appliance craze. Smart chips are crammed into everything, providing computing power, network connectivity, and data transmission.

I once thought appliance control would centralize into a single remote, an app, or voice activation. But that doesn’t hold up. A mechanical light switch can last decades. To add another way to turn on a light, we add a wireless module, constant power, maintain Wi-Fi, incorporate voice recognition, handle the coordination between electronic and mechanical controls, occasionally replace components, and bear the costs… I’d rather just install extra mechanical switches.

A rational look at technology and progress shows that nothing goes to extremes; things settle into their most suitable form. For items with simple functions and structures, mechanical control is their destiny.

This is the eternal life of machines.

The Role of a Designer in a Startup

hi@victor42.work (Victor42) — Sun, 12 Oct 2014 10:19:24 +0000

It’s been over a year since I started a business with my buddies, and it’s been a blast. I want to share this experience and discuss my role as a designer in a startup.

There’s no one-size-fits-all answer; a designer’s role depends on more than just design. If you’re confident and have strong ideas, you might lead product direction. Or, if you’re a stickler for pixel-perfect details, you can find your niche. It’s about proactively finding your place in a changing environment.

Getting Started

Our main product is a parking app, so most of my work revolved around its design. Initially, we were mostly part-time, cobbling together the prototype on weekends. No wireframes, no detailed specs – just core functionality. My job was to quickly create basic UI mockups for discussion and development. We had teammates focused on product positioning, and we were all aligned.

“Quickly” actually took a while; it was my first mobile project. I’d only done web design, with some mobile dabbling on my personal site. The first version had fewer than 10 screens, but it was still daunting. Dealing with Android’s resolutions and the iOS 6 to iOS 7 style shift was a steep learning curve. It was a new world, and I had to shed my web experience and start fresh.

In the early stages, when everything is stretched thin, you naturally do what you do best: focus on visuals and interaction, and help turn the startup idea into a reality.

Integration

Once the core product launched, the main roles were set. We realized the manpower shortage was even bigger than expected. There was a ton of tedious but crucial work: promotional materials, applications for third-party API access, app store listings, etc. And things not directly product-related: company registration, office space, interviews. It was tough to assign these tasks. If I could do it better than others, I did it.

Developers always had more work, and business development was limited by external factors. Designers often have more free time at this stage. I couldn’t just relax. I became a multi-tasker, filling the team’s gaps. The startup was a steel frame; I was the cement.

When developers struggled with UI implementation, I learned their principles, weighed priorities, and made adjustments with them. I made detailed UI annotations. I found a tool to link UI mockups with click-throughs, clarifying the business logic. Marketing was just starting, and having a professional designer boosted results. Banners, Weibo templates, company and product websites – I’d quickly create them. I also thoroughly tested each version, providing detailed bug reports. If I wasn’t the best person to solve a problem, I’d note it for the team to prioritize. Speed was key. Early on, getting these supporting tasks done matters more than perfection.

This stage is messy and fast-paced. The goal is to adapt, integrate, and connect scattered tasks.

Review and Consolidation

As the team progressed, the product entered a stable iteration cycle. Marketing and business development improved. Design workload stabilized, and it was always less than other roles. Everyone’s work was specialized; I couldn’t help much.

I saw this as a chance to pause, review, connect the dots, and think strategically. The significance of the previous stage’s tasks became clear: our design lacked a soul. We lacked standards. The product, materials, and modules were disconnected. The external image was inconsistent, mostly improvised. It looked okay, but it was white noise, not a melody.

So I dove into iOS and Android guidelines, comparing design styles and studying leading products. I felt like starting over, but changes must be gradual.

I created the company’s visual identity (VI) system and applied it to all external materials. I extracted a color scheme and visual style, refined them, and wrote guidelines. App components were unified, with platform-specific differences. Standing on the shoulders of giants is wise.

Everything can have standards: visuals, interaction, animation, sound, data display, units… I wrote them down and kept adding. It’s a long-term project.

Beyond tangible standards, we needed to establish abstract ones. What impression do we want to create? What emotions should we evoke? I’m still pondering this. While this can be established early, it’s unstable. Business and product changes affect it. It takes time, iterations, and refinement for it to emerge. That’s the design’s soul; you can’t force it.

With standards and guidelines, design became simpler, and results improved. Standards drive consistency, and consistency refines standards. This should be done early. When the team grows, its impact is even greater.

Refinement and Exploration

I had more time, perfect for fixing legacy issues! Newbie mistakes and edge cases needed addressing. One basic mistake: tiny click areas in our early Android version, violating the 48dp standard. These problems were in core functions, so fixing them was urgent. I also revamped the product website with new technologies.
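
For readers unfamiliar with the 48dp rule, here is a minimal sketch of how a tap area could be checked. It relies on Android’s definition of 1dp as 1px on a 160dpi screen; the function names and the example sizes are hypothetical, not taken from our project.

```python
BASELINE_DPI = 160        # Android defines 1dp as 1px at 160dpi
MIN_TOUCH_TARGET_DP = 48  # the minimum touch target size mentioned above


def dp_to_px(dp, screen_dpi):
    """Convert density-independent pixels to physical pixels."""
    return dp * screen_dpi / BASELINE_DPI


def px_to_dp(px, screen_dpi):
    """Convert physical pixels back to density-independent pixels."""
    return px * BASELINE_DPI / screen_dpi


def is_tappable(width_px, height_px, screen_dpi):
    """True when both sides of the tap area reach the 48dp minimum."""
    return (
        px_to_dp(width_px, screen_dpi) >= MIN_TOUCH_TARGET_DP
        and px_to_dp(height_px, screen_dpi) >= MIN_TOUCH_TARGET_DP
    )
```

On a 320dpi screen, 48dp works out to 96 physical pixels, so a 64 by 64 pixel button that looks fine in a mockup is only 32dp and falls short.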

By now, the team had good chemistry. My teammates’ abilities drove the product. I couldn’t fall behind. I needed to improve through learning and apply it immediately. I learned more that year than in the previous three combined: mobile development, responsive design, HTML5 animation, After Effects (AE) motion graphics, browser APIs, even drawing. Most importantly, my design skills improved.

Keep exploring, venturing beyond design, injecting fresh ideas. Think of yourself as a one-person Google X – a job to get designers’ hearts racing.

Conclusion

Starting a business is exciting, but tough. If you’re prepared to start or join a startup, you’re not just a designer, but an entrepreneur – a problem-solver. Your responsibilities include anything you’re good at that helps the team. It depends on your expertise, personality, and thinking. You’re part of the team, driving it forward.
