Excel on Victor42

Turning Photoshop into a Machine Gun with Excel

hi@victor42.work (Victor42) — Thu, 13 Jun 2024 14:05:00 +0000

I heard Marketing was tearing their hair out. The boss greenlit the new course cover design, and now they needed to update all 800+ existing covers. It wasn’t a simple find-and-replace; there were tons of small differences. Marketing has only one designer, and they were slammed. Doing it in-house? No way. Outsourcing would cost 20 RMB per image, totaling 16,000 RMB – a budget buster.

Bingo! 16,000 RMB? My ears perked up. I love automation. A data geek who knows Photoshop? This was my moment. People talk about the “value of design.” But what is your value? How do you put a number on it? Saving the company a designer’s monthly salary in half a day? That’s real value. Plus, it’d be great for my year-end review. I jumped on the task.

The Challenge

This is the template the marketing designer created. No use criticizing – the boss wanted this style. Simple. The basic need was also simple: replace three text areas and generate 800+ images.

Most designers would think, “Piece of cake! Define some variables in Photoshop, create an Excel sheet, and batch export.”

If you don’t know how to batch output with Excel and Photoshop, check out this tutorial: https://zhuanlan.zhihu.com/p/33725280

Yeah… that’s the gist. If it were that easy, you could just follow the tutorial, and this article would be done.

But once I had all the templates in hand, I realized it was much trickier. The variations were crazy:

Over a dozen course categories, some with unique backgrounds, others sharing.
The top category wasn’t always text. Two special categories (Taobao and Tmall) used their logo lettering, which could only be done with images.
Course titles: one or two lines. Single-line titles needed vertical centering.
Text color changed with the background – a tinted shade, not pure black.
The bottom description text wasn’t always there. If missing, its decorative box had to go too.
The box’s line color also changed, matching the text but brighter.

Stop and think: could Photoshop variables handle this? Sure, if you made a dozen PSDs, one per category. But I didn’t want to export a dozen times. Could it be done with just one PSD?

Yes, it was possible.

But it needed a designer who was also an Excel expert.

Designing the Excel Data Model

The complexity meant I needed to think about the data model first.

Programmers might laugh. “Data model? For a simple image?”

Don’t @ me! I’m just borrowing the idea. Look, if you only want to get it done, any approach works. But for top efficiency, you need a data-model mindset. What does top efficiency mean here? Operations fills in the least information, and I do the least work per export. This would be an ongoing pipeline, so I needed low marginal costs. The initial setup could be complex; that one-time cost mattered less.

So, what columns did we need?

Course Category
Course Title
Description
Background Image

Obvious ones. Adding the variations, the real list was:

Filename: Controls the output filename, arranged logically.
Category: The dozen-ish categories, shown as text at the top and determining the template’s overall look.
Title Line 1: Titles can be one or two lines, split for manual line breaks.
Title Line 2: Optional; if blank, it’s a single-line title.
Description: The optional keywords, determining if the box below is shown.
Taobao: Yes/no, toggles the Taobao logo, based on Category.
Tmall: Yes/no, toggles the Tmall logo, based on Category.
Single Line: Yes/no, controls the single-line title layer, based on Title Line 2.
Two Lines: Yes/no, controls Title Line 1 and 2 layers, based on Title Line 2.
Has Description: Yes/no, controls the description box, based on Description.
Background Image: Path for the background image.
Foreground Color: Path for a solid-color image used to color the title text.

To explain: there are actually three title layers, one for single-line titles and two (Line 1 and Line 2) for two-line titles.

Giving this to operations would be brutal. Most could be calculated. Operations only needed: Category, Title Line 1, Title Line 2, and Description. I made an online spreadsheet with just those four and sent it out. We had 5-6 people working, each taking categories. They finished fast.

The hard part was mine: calculating the rest, all needed for Photoshop. None could be skipped. Category was key. It determined the logos, background, text color, and filename sorting. So, I made a separate Category table, a dimension table, where each category was like a product. The image content table was the fact table, like an order. Category name was the dimension table’s primary key, a foreign key in the fact table, pulling in category info. One fact table (CSV) and one dimension table – a simple star schema, or maybe “Earth-Moon schema”?

These concepts are from data modeling and databases. Simply, it’s defining attributes on Category. Anything in a category would auto-read the background, color, etc., based on the name. This matched the requirements.
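To make the dimension-table idea concrete, here is a minimal Python sketch of the same lookup. It is an illustration only, not the author’s workbook: the dictionary stands in for the Category table that VLOOKUP reads, and every category name other than Taobao and Tmall, plus all file paths and sort keys, are hypothetical placeholders.

```python
# Hypothetical dimension table: one row per category, keyed by category name
# (the primary key). Paths and sort keys are placeholders, not real assets.
CATEGORY_TABLE = {
    "Taobao": {"Sort Key": "01", "Background Image": "bg/shop.png", "Foreground Color": "color/brown.png"},
    "Tmall": {"Sort Key": "02", "Background Image": "bg/shop.png", "Foreground Color": "color/brown.png"},
    "Design": {"Sort Key": "03", "Background Image": "bg/design.png", "Foreground Color": "color/green.png"},
}


def enrich(fact_row: dict) -> dict:
    """Join one fact row to its category attributes, like a VLOOKUP on Category."""
    # The Category value acts as a foreign key into the dimension table;
    # an unknown category raises KeyError, much as VLOOKUP would return #N/A.
    attributes = CATEGORY_TABLE[fact_row["Category"]]
    return {**fact_row, **attributes}


row = enrich({"Category": "Design", "Title Line 1": "ColorBasics"})
print(row["Background Image"])  # bg/design.png
```

Categories that share a background are simply rows pointing at the same path, so shared backgrounds need no special handling in the fact table.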

All the operations data (4 columns) was now in my Excel file. I referenced it in the fact table, added the calculated columns Photoshop needs, and formed a complete table. For each run, I update the data, save it as CSV, and hand it to Photoshop.

These calculated columns tested my Excel skills:

VLOOKUP was crucial for looking up each category’s attributes in the category table.
Filenames needed text concatenation, so I could combine fields freely and decide the output order.
I used string replacement to remove spaces from titles, so an accidental extra space never throws off the centering.
IF checked for empty values everywhere, preventing a 0 on empty rows.

These are easy for Excel users, so I won’t detail them.
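For readers who want them spelled out anyway, this Python sketch derives the same kinds of columns from the four fields Operations fills in. It assumes, for illustration, that visibility columns hold the strings TRUE/FALSE, that empty optional fields become NULL (as in the Photoshop import step), and a hypothetical filename scheme of sort key, running number and title; the function and parameter names are mine, not the author’s.

```python
NULL = "NULL"  # stand-in for empty cells, which Photoshop's data set import rejects


def flag(condition: bool) -> str:
    """Visibility columns hold the literal text TRUE or FALSE."""
    return "TRUE" if condition else "FALSE"


def calc_columns(index: int, category: str, line1: str, line2: str,
                 description: str, sort_key: str) -> dict:
    """Derive the Photoshop-facing columns for one row of Operations' input."""
    # Remove every space from the titles so a stray keystroke can't break centering.
    line1 = line1.replace(" ", "")
    line2 = line2.replace(" ", "")
    description = description.strip()
    has_line2 = line2 != ""
    return {
        # Hypothetical naming scheme: category sort key, running number, title.
        "Filename": f"{sort_key}-{index:04d}-{line1}{line2}",
        "Category": category,
        "Title Line 1": line1,
        "Title Line 2": line2 if has_line2 else NULL,
        "Description": description if description else NULL,
        "Taobao": flag(category == "Taobao"),
        "Tmall": flag(category == "Tmall"),
        "Single Line": flag(not has_line2),
        "Two Lines": flag(has_line2),
        "Has Description": flag(description != ""),
    }


print(calc_columns(7, "Tmall", "Store Ops ", "", "", "02")["Filename"])  # 02-0007-StoreOps
```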

Merging Tables with Power Query

But, two questions remained:

How did operations’ data get into my Excel?
How do I update it?

First: The online spreadsheet let people work independently and update in real-time. My table was local because I needed Excel’s Power Query for merging, which most online spreadsheets lack.

For each batch, I downloaded the online spreadsheet (Course Cover Content Collection.xlsx) to the same directory as my table (Course Cover Content.xlsx). The data link would stay as long as the location didn’t change.

I used Power Query from the “Data” menu. Think of it as a visual SQL. It reads data from local tables, web pages, databases, and Azure, and cleans, transforms, and aggregates it. I used its local table reading.

The Power Query interface is both familiar and strange to basic Excel users. Familiar: “Tables!” Strange: “What’s all this?”

Understanding Power Query: It does three things:

Specifies the data source.
Sets rules and conditions.
Executes the queries and loads the results into the workbook, one sheet per query.

Step two is the core. The list on the left is a series of queries, executed in order.

Each query needs “Use First Row as Headers” and a filter that removes empty values.

Querying isn’t just filtering and sorting; I also used it to combine tables. Operations’ data was scattered across many sheets, and I wasn’t going to copy-paste them one by one. I queried each sheet, then created an append query that stacks tables with identical columns into one, like SQL’s UNION ALL.

Its merge queries are also useful, giving the equivalent of SQL’s JOIN and LEFT JOIN.
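In plain Python, the three Power Query operations used here look roughly like this. It is a sketch of the semantics, not of Power Query itself: `promote_headers` mimics “Use First Row as Headers” plus the empty-row filter, `append_queries` stacks same-shaped sheets the way an append query (SQL UNION ALL) does, and `left_merge` mirrors a left outer merge. All function names and sample rows are hypothetical.

```python
from itertools import chain


def promote_headers(raw_rows: list[list[str]]) -> list[dict]:
    """Use the first row as headers, then drop rows whose cells are all empty."""
    header, *body = raw_rows
    return [dict(zip(header, row)) for row in body if any(cell.strip() for cell in row)]


def append_queries(*tables: list[dict]) -> list[dict]:
    """Stack tables that share the same columns, keeping every row (UNION ALL)."""
    return list(chain.from_iterable(tables))


def left_merge(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Keep every left row; add the right table's columns where the key matches."""
    lookup = {row[key]: row for row in right}  # assumes the key is unique on the right
    return [{**row, **lookup.get(row[key], {})} for row in left]


sheet_a = promote_headers([["Category", "Title Line 1"], ["Taobao", "Open a Shop"], ["", ""]])
sheet_b = promote_headers([["Category", "Title Line 1"], ["Design", "Color Basics"]])
combined = append_queries(sheet_a, sheet_b)
merged = left_merge(combined, [{"Category": "Taobao", "Background Image": "bg/shop.png"}], "Category")
print(merged)
```

An append never matches rows against each other, which is why it needs identical columns; a merge is the operation to reach for when two tables share a key instead.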

Clicking “Close” (which actually saves and loads) made a bunch of sheets appear. I deleted the unneeded ones and added an index column to the left of the combined table for filename sorting.

All operations data was now in.

Second question: updating?

New batch? Download, overwrite, open the data table, “Data” menu, “Refresh.” Simple.

Why compare it to SQL? Power Query records the queries, not their results. The results it shows are only a preview to help you edit the queries; the queries themselves are stored in the Excel file and rerun on every “Refresh.”

After complex initial setup, the pipeline was set. Use was simple: download, overwrite, refresh, save as CSV – Photoshop’s data file.

Batch Image Generation in Photoshop

Photoshop had five steps:

Organize/rename layers.
Define variables.
Import data.
Batch export PSDs.
Batch convert to JPGs.

1. Organize and Rename Layers

Not hard. Merge the layers that should be merged and reorder as needed. Name the layers after the table headers to make defining variables easier.

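“Filename” is special; it doesn’t appear in the image. The template didn’t have it, so I created it manually. Its style doesn’t matter. Hide it under the background layer or move it off the canvas.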

“Foreground Color” needed special handling. Variables can’t change text color directly, so how do you make it follow the background? Group the text layers to be colored, create a solid color layer, and clip it to the group. This controls the color of all that text at once.

The box’s line color changes too. It’s related to the text color, but not the same. On top of the foreground color, add a Hue/Saturation adjustment layer for the two lines and raise saturation and lightness. Brown becomes orange, dark green becomes grass green… This takes a solid grasp of color theory and Photoshop color adjustment.
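A rough numeric model of that adjustment, in Python with the standard `colorsys` module: keep the hue, raise saturation and lightness. Photoshop’s Hue/Saturation math is not exactly HLS, and the boost amounts and sample brown are assumptions, so treat this as a way to preview the idea rather than a reproduction of the layer.

```python
import colorsys


def vivid_line_color(hex_color: str, saturation_boost: float = 0.35,
                     lightness_boost: float = 0.15) -> str:
    """Approximate a Hue/Saturation layer: same hue, more saturation and lightness."""
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5))
    hue, lightness, saturation = colorsys.rgb_to_hls(r, g, b)
    lightness = min(1.0, lightness + lightness_boost)
    saturation = min(1.0, saturation + saturation_boost)
    r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


print(vivid_line_color("#6b4226"))  # a brown text color turns into an orange line color
```

Because the hue is untouched, every category’s line color stays in the same family as its text color, which is the relationship described above.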

2. Define Variables for Layers

No step-by-step; the linked tutorial covers it. I’ll discuss tricky points.

Common use: “Text Replacement.” Non-text layers become “Pixel Replacement” – image change. Background is replaced this way.

Foreground color works the same way: prepare a solid-color image for each text color, define the clipped color layer from step 1 as a variable, and pick the image by category.

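Visibility variables are often overlooked but very useful. The TRUE/FALSE columns in the table control whether layers are shown. Visibility can also be combined with text or pixel replacement: for the description text, text replacement changes the content while visibility controls whether it’s shown.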

These first two steps, though tedious, are one-time.

3. Import Data Sets

Import the CSV.

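Two errors are common: extra columns or column names that don’t match the variables, and empty cells. Photoshop’s data set import doesn’t accept empty cells, so I filled them with NULL. A visible NULL would ruin the image, so every column that can be empty needs a visibility check to hide it.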
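Both errors can be caught before Photoshop sees the file. This hypothetical pre-flight helper, a sketch using Python’s `csv` module, checks the header against the variable names you defined, skips fully blank lines, and writes NULL into empty cells; the column names in the example are assumptions.

```python
import csv
import io


def prepare_dataset(csv_text: str, expected_columns: list[str]) -> str:
    """Check headers against the variable names, then fill blank cells with NULL."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    extra = [name for name in header if name not in expected_columns]
    missing = [name for name in expected_columns if name not in header]
    if extra or missing:
        raise ValueError(f"extra columns: {extra}; missing columns: {missing}")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in body:
        if not any(cell.strip() for cell in row):
            continue  # skip fully blank lines instead of emitting a row of NULLs
        row = row + [""] * (len(header) - len(row))  # pad short rows
        writer.writerow([cell if cell.strip() else "NULL" for cell in row])
    return out.getvalue()


print(prepare_dataset("Filename,Description\nf1,\nf2,SEO\n", ["Filename", "Description"]))
```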

4. Batch Export PSDs

There’s no trick here; just run the export (File > Export > Data Sets as Files).

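You can define the filename format here. Of all the options, only “Data Set Name” is really useful, because it’s the only one that puts information meaningful to Operations into the filename. It reads the first column of the CSV, which is why “Filename” is the first column: it lets me fully customize the output names.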

5. Batch PSD to JPG Conversion

Data set export produces PSDs, but the deliverables had to be JPGs, so they needed one more conversion.

Record a simple action: open, save as JPG, close. Batch process the PSD folder.

My action set has “Save as JPG”; link at the end.

One More Table

Done? The task is complete, but the job isn’t over. One crucial table is still missing.

These 800+ images, worth 16,000 RMB, are just the first batch. The company keeps launching courses, so more batches will come. Shouldn’t I know how much this saves the company in a year? Even if I don’t care, the boss should know.

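So I needed a statistics table: a credit-claiming table. That name is a bit too blunt, so I went with a pun: in Chinese, “credit-claiming table” sounds like “rock and roll table,” and that’s what I call it.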

I could even make a chart showing the value produced each month and each quarter. Subtracting that value from my monthly salary would show my real cost to the company: hiring me is a steal! The data is there; whether I actually build it is TBD.
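The arithmetic behind such a table is simple: images produced times the 20 RMB outsourcing quote, grouped by month or quarter. A Python sketch, where only the first batch size (800 images, 16,000 RMB) comes from this story; the dates and the second batch are made up for illustration.

```python
from collections import defaultdict
from datetime import date

PRICE_PER_IMAGE = 20  # RMB, the outsourcing quote per image


def savings_by_period(batches: list[tuple[date, int]], period: str = "month") -> dict:
    """Sum the outsourcing cost avoided per month ("2024-06") or quarter ("2024-Q2")."""
    totals = defaultdict(int)
    for day, images in batches:
        if period == "month":
            key = f"{day.year}-{day.month:02d}"
        else:
            key = f"{day.year}-Q{(day.month - 1) // 3 + 1}"
        totals[key] += images * PRICE_PER_IMAGE
    return dict(totals)


batches = [(date(2024, 6, 13), 800), (date(2024, 8, 1), 50)]  # hypothetical log
print(savings_by_period(batches))  # {'2024-06': 16000, '2024-08': 1000}
```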

Epilogue

This was cost-effective. Half a day for initial setup. Negligible time after; I ran it during lunch.

This is my strength. I don’t reinvent wheels, but I assemble them well.

After setup, I met with operations. Marketing explained the four columns. No one found it hard. Operations thought I used AI. For non-tech people, anything amazing is AI. AI is the silver bullet. It’s funny; I’m used to it.

Finally, resources. Try it yourself:

Important Update

I have since developed a more automated programmatic solution for this workflow, which only requires a Python environment.

Details: https://github.com/greenzorro/excel-ps-batch-export

I Turned Photoshop into a Heavy Machine Gun with Excel

hi@victor42.work (Victor42) — Thu, 13 Jun 2024 14:05:00 +0000

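I heard Marketing next door had a headache. The boss had approved the new course cover design, and now every existing cover on the platform had to be updated. That was over 800 images, and not the kind where you just swap the text; there were lots of subtle variations. Marketing has only one designer, who was already busy, so doing it in-house wasn’t realistic. Outsourcing at 20 RMB per image would cost 16,000, and the budget wasn’t there.

Hey! The moment I heard it was worth 16,000, I got fired up. Automation is my specialty; with a data obsession plus Photoshop skills, I was the only one in the department who could pull this off. Don’t people love to talk about the value of design? What is your value? How do you quantify your design output? Spending half a day to save the company a designer’s entire monthly salary: is that value enough? For such perfect year-end review material alone, I grabbed the job without hesitation.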

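The Requirements

This is the template the Marketing designer made. Let’s not nitpick the design; the boss asked for this look specifically, and it’s simple and direct. The concrete requirement is simple too: replace the three pieces of text on the image with real content and output 800+ images.

Seeing the template, many designers would think it’s easy: define variables in Photoshop, build an Excel sheet, batch export, done.

Designers who don’t yet know how to batch export with Excel + Photoshop can read this: https://zhuanlan.zhihu.com/p/33725280

Well… that’s the right idea. If it were really this simple, you could follow the tutorial and this article would end here.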

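But once I had all the templates in hand, I found it was far from simple; the variations were beyond imagination:

The courses fall into a dozen-odd categories; some have their own background image, others share one.
The course category at the top isn’t always text. Two special categories (Taobao and Tmall) use the lettering from their logos, which can only be done with images.
Course titles are one or two lines; single-line titles must be vertically centered.
The text color changes with the background: not pure black, but slightly tinted toward the background color.
The small description text at the bottom isn’t always present; when it’s missing, the decorative box under it must go too.
The decorative box’s line color also changes with the background: the same color family as the text, but much more vivid.

Now stop and think: can Photoshop variables still cope? A dozen categories with a dozen PSDs would of course work. But I didn’t want to export a dozen times. Could it be done with just one PSD?

Yes.

But it takes a designer who is also an Excel expert.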

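Designing the Excel Data Model

Since it’s this complex, let’s think about the data model first.

Programmers will probably laugh out loud: a data model, for a tiny image with this little information?

Go easy on me, I’m only borrowing the idea. That said, if you only want to get the job done, all roads lead to Rome; if you want maximum efficiency, you really do need to plan with a data-model mindset. What counts as maximum efficiency? Operations fills in the least information, and I do the least work for each export. This pipeline will run for a long time, so the marginal cost must stay as low as possible. The first-time setup can be a bit complex; that initial cost matters less.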

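So which columns does our table need?

Course category
Course title
Description
Background image

These are the obvious ones. Adding all the variations from the requirements, the columns actually needed are:

Filename: Controls the final output filename, arranged in a sensible, easy-to-find order.
Category: The dozen-odd categories; shown as text at the top and also determining the template’s overall look.
Title Line 1: Titles are one or two lines; splitting them lets Operations control where the line breaks.
Title Line 2: Optional; if left blank, the title is treated as single-line.
Description: The optional keywords; shown as text, and also deciding whether the decorative box beneath them is shown.
Taobao: A yes/no Boolean; the display switch for the “Taobao” category logo, determined by the Category column.
Tmall: A yes/no Boolean; the display switch for the “Tmall” category logo, determined by the Category column.
Single Line: A yes/no Boolean; controls whether the single-line title layer is shown, determined by whether Title Line 2 is empty.
Two Lines: A yes/no Boolean; controls whether the Line 1 and Line 2 title layers are shown, determined by whether Title Line 2 is empty.
Has Description: A yes/no Boolean; controls whether the description box is shown, determined by whether Description is empty.
Background Image: File path of the template’s background image.
Foreground Color: File path of the foreground color image, a solid-color image used to color the title text.

A note here: the title actually has 3 layers. The single-line title is one layer; Line 1 and Line 2 are two more layers, used for two-line titles.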

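Throw all these fields at Operations and I’d probably get beaten up. Most of them can be calculated; the only ones Operations really needs to fill in are Category, Title Line 1, Title Line 2, and Description. I turned a table with just these 4 columns into an online spreadsheet and handed it out. That’s right, 5 or 6 operations staff worked on this, each taking a few categories, and chugging along like that they finished quickly.

Now the complex part was left to me: how to calculate the remaining columns. Photoshop needs all of them; not one can be missing. A little thought shows that Category is the key. It determines whether the Taobao and Tmall logos show, which background image to use, what color the text is, and how output filenames sort. So Category should get its own table as a dimension table; each category is like a product. The table recording image content is the fact table, like an order record for a product. Category name is the primary key of the category table and appears in the fact table as a foreign key, bringing the category’s information along. One fact table (exported as CSV) and one dimension table: this must be the most bare-bones star schema possible. Maybe call it an “Earth-Moon schema”?

Many of the concepts in the paragraph above come from data modeling and databases. Put simply, many attributes are defined on the category. Anything belonging to a category automatically reads its background image, foreground color, and other attributes based on the category name. That’s exactly what the requirements called for.

Now all the data Operations filled in (4 columns) is in my Excel file. I reference it in the fact table next to it, add the calculated columns Photoshop needs, and form a complete data table. Each time, I update the data, save it as CSV, and hand it to Photoshop.

These calculated columns test your command of Excel formulas:

Since each category’s attributes must be looked up in the category table, using VLOOKUP well is key.
Filenames require text concatenation; combine them however you like to decide the order of the output image files.
In both title columns I used a string-replacement formula to strip spaces, so even if Operations accidentally types an extra space, the title won’t drift; it stays precisely centered.
IF is used everywhere to check for empty values, preventing 0 from appearing on empty rows.

None of this is hard for people familiar with Excel, so I won’t go into detail.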

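Merging Tables with Power Query

But two questions seem to have been missed:

How does the data Operations filled in get into my Excel?
When Operations sends a new batch of data, how do I update it?

Let’s solve the first. The sheet handed out to Operations was an online spreadsheet: each person took a few sheets, worked without affecting the others, and updates appeared in real time. My data table, though, is a local Excel file, because I need Excel’s powerful Power Query to merge tables, which most online spreadsheet products can’t do.

Whenever a batch of images is to be output, I first download the online spreadsheet as an Excel file (封面图内容收集.xlsx, “Cover Content Collection”) and put it in the same directory as my data table (封面图内容.xlsx, “Cover Content”). As long as the files don’t move, this data link stays intact.

In the data table, use Power Query from the “Data” menu. You can think of it as SQL with a graphical interface: it reads data from local tables, web pages, local databases, and the Azure cloud, then cleans, transforms, and aggregates it. Here I use its ability to read from local tables.

To people who only use Excel’s basic features, the Power Query interface should feel both familiar and strange. Familiar: “Wow, tables here too!” Strange: “What is all this other stuff?”

How should you understand Power Query? It does 3 things:

First, it specifies the data source: where the external data comes from.
Then it lets me set the rules and conditions of the queries.
Finally, it executes these queries and loads the data I want into the current Excel file, one sheet per query.

The core is step 2. The list on the left of the interface holds the queries, which run in order.

In each query, apply “Use First Row as Headers” and filter out empty values.

Querying is more than basic filtering and sorting; here I use its ability to combine tables. The data collected from Operations is spread over many sheets, and I’m not going to copy and paste them one by one, am I? I query each Operations sheet, then create one final append query that stacks tables with exactly the same format into one, the equivalent of SQL’s UNION ALL.

By the way, its merge queries are also very useful, achieving the effect of SQL’s JOIN and LEFT JOIN.

After I click the Close button (which actually saves), a bunch of new sheets appear in my Excel file. Delete the unneeded ones and keep the useful ones. I added an index column to the left of the combined table to sort the filenames.

And with that, all the data Operations filled in is in my Excel file.

Now for the second question: how do I update the data?

When Operations submits a new batch of cover content, I just download it and overwrite the old file in the same place. Then I open the data table, go to the “Data” menu, and click Refresh. That’s all; the data is updated.

Why compare Power Query to SQL? Because it records and reuses my query conditions, not the results. The results are displayed, but only as a preview so I can easily modify the conditions. In fact, it stores my queries in the Excel file and reruns them every time I click Refresh.

Now, after a series of seemingly complex first-time setup steps, the data-processing pipeline is in place. Later use is remarkably simple: download, overwrite, refresh, save as CSV, and you have the data file Photoshop needs.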

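Batch Image Generation in Photoshop

In Photoshop, there are 5 things to do:

Organize and rename layers
Define variables for layers
Import data sets
Batch export PSDs
Batch convert PSDs to JPGs

1. Organize and Rename Layers

Step 1 involves nothing technical: merge the layers that should be merged and adjust the order as needed. To make defining variables easier later, I recommend naming the layers after the data table’s headers.

The Filename layer is special; it doesn’t appear in the image. The PSD template doesn’t have it, so create it manually. Any style will do, since it’s never shown. Hide it under the background layer, or move it off the canvas.

The Foreground Color layer also needs special handling. Variables can’t change text color directly, so how do you make it follow the background? Group the text that needs coloring, create a solid color layer, and apply it to the group as a clipping mask. This gives unified control over the text color.

Remember that the decorative box’s line color also changes? It’s related to the text color, but different. What’s the relationship? On top of the foreground color, add a dedicated Hue/Saturation adjustment layer for the two lines and raise its saturation and lightness. For example, brown made richer and brighter becomes orange; dark green becomes grass green… This technique requires a fairly deep understanding of color theory and Photoshop color adjustment.

2. Define Variables for Layers

I won’t walk through the basics of defining variables in step 2; the tutorial cited at the beginning covers them. Here I’ll mainly cover how to handle a few tricky points.

The most common use of variables is replacing text, via the Text Replacement option. When the variable layer isn’t text, it becomes Pixel Replacement, meaning swapping the image. The background image is replaced this way.

Foreground color works the same way: prepare a solid-color image for each text color, define the clipping mask layer created in the previous step as a variable, and pick the matching color based on the course category.

Visibility variables are often overlooked but very useful. The TRUE and FALSE columns in the table use visibility to show or hide layers. Visibility variables can also be used together with text or pixel replacement. For example, for the small description text at the bottom, text replacement changes its content and visibility controls whether it’s shown.

Although the first two steps are tedious, they’re one-time work and don’t need repeating later.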

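3. Import Data Sets

Switch to the Import panel and import the prepared CSV file.

There are two common import errors. One is extra columns or columns whose names don’t match; check for these.

The other is empty cells in some rows. Yes, Photoshop’s data set import doesn’t support empty cells, so when building the data table I changed all empty cells to NULL. Shown on the image, these NULLs would ruin it, so every column that can be empty needs a visibility check to hide it when appropriate.

4. Batch Export PSDs

There’s no trick to exporting; just run it.

In this step you can define your own format for exported filenames. Of all the options, only Data Set Name is really useful; it’s the only one that leaves information helpful to Operations in the filename. Data Set Name reads the content of the CSV’s first column, which is why I put the Filename column first: it enables full customization of the output filenames.

5. Batch Convert PSDs to JPGs

Photoshop data set export produces PSD files, but the deliverables must be JPGs, so there’s one more conversion.

Record a simple Photoshop action: open the PSD, save as JPG, close. Just these 3 steps; then run the whole PSD folder through batch processing.

My homemade action set includes a ready-made save-as-JPG action, downloadable from the link at the end of the article.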

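One More Table

So, is the job done? The task is complete, but the matter isn’t over. The most important table is still missing.

These 800+ covers worth 16,000 are just the first batch. The company will keep launching new courses, so there will be a second batch, a third… Shouldn’t I find out how much my little effort saves the company over a year? Even if I don’t care to know, the boss definitely should.

So I also need a statistics table: a credit-claiming table. Of course, don’t give it such a blatant name. In Chinese, “credit-claiming table” (邀功表) sounds like “rock and roll table” (摇滚表), so let’s call it that.

If I wanted to push even harder, I could generate a bar chart from this table showing the value produced each month and each quarter. Subtracting that output from my monthly salary would show my actual staffing cost each month: hire me and you come out ahead. Whether to push that hard I’ll decide later; either way, the data is accumulating.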

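Epilogue

Grabbing this job was extremely cost-effective. It took me half a day in total for the initial workflow setup; the time spent afterwards is negligible, just let it run during lunch.

This is my skill: I don’t like reinventing wheels, but I’m extremely good at assembling them.

Once the workflow ran end to end, we held a meeting with Operations. Marketing explained the content requirements to them, just the 4-column online spreadsheet, and nobody found it hard. The Operations folks all assumed AI had produced the images. For non-technical people, anything that seems impossible must be AI’s work; AI is the silver bullet. Funny; I’m used to this bias by now.

Finally, here are the related files and resources; feel free to download them and try it yourself: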

Important Update

I later developed a more automated programmatic solution for this workflow; it only needs a Python runtime environment.

Details: https://github.com/greenzorro/excel-ps-batch-export

Creating Custom Child Growth Charts in Excel

hi@victor42.work (Victor42) — Thu, 03 Aug 2023 14:30:00 +0000

This is about how I used Excel, data visualization, AI, statistics, and formulas to create a custom growth chart. I’ll explain everything clearly, even the basics.

Many parents use apps to track their baby’s height and weight. I only used that one feature. Installing a large app just for that felt wasteful. It was a perfect chance to use my Excel skills. It’s just data analysis, right? Excel can handle it!

System Planning

First, I needed a plan. Let’s see how growth curves work in parenting apps.

This is a growth curve from Baobaoshu (BabyTree). The 50% line is the median. If my baby’s height (or weight) is on this line, about half of babies are taller (or heavier) and half are shorter (or lighter). The 75% and 97% lines mean the height (or weight) exceeds 75% and 97% of babies of the same age. The 25% and 3% lines work similarly. This shows my baby’s growth compared to others.

I wanted a similar tool to:

Record my baby’s height and weight.
Query the normal height and weight range for each month.
Show how much my baby’s measurements deviate from the norm.

Whether it was a chart or had curves didn’t matter. The key was the third point: calculating the deviation and displaying it intuitively. A diverging bar chart seemed suitable.

This chart compares two data sets in the same dimension.

With a single data set, it shows each value’s direction and distance from a baseline, most often as positive versus negative.

This was perfect. I’d use the median as the baseline, showing whether my daughter’s height (or weight) was above or below it. A bar chart shows the size of the deviation with bar length, but length differences aren’t always obvious, so I simplified further to symbols: a minus for below, a plus for above, with more symbols meaning greater deviation. Seeing three or four pluses (+++) or minuses (---) would signal that her growth trend needed attention.

Preparing the Data

With a goal set, I started working, beginning with the first two capabilities:

Record height and weight.
Query normal ranges.

Baby’s Growth Data

My baby’s data was in the Baobaoshu app (dates are omitted to protect my daughter’s birthday):

Baobaoshu doesn’t export data. Manual entry was an option, but there had to be a better way.

I took screenshots of the records and used an Android app, Screen Master, to stitch them into one long image.

Then, I used Baimiao OCR (https://web.baimiaoapp.com/) to extract the text:

The format was messy, but AI can handle that.

Done! I just copied it to Excel. Age in days, months, and years is calculated automatically by subtracting my daughter’s birthday from the recording date.
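The post doesn’t say exactly how the ages are computed. One common convention is to count completed months, as the spreadsheet function DATEDIF(start, end, "M") does; the Python sketch below follows that convention as an assumption, with a made-up birthday in the example.

```python
from datetime import date


def age_parts(birth: date, measured: date) -> tuple[int, int, int]:
    """Return (days, completed months, completed years) between two dates."""
    days = (measured - birth).days
    months = (measured.year - birth.year) * 12 + measured.month - birth.month
    if measured.day < birth.day:
        months -= 1  # the current month isn't complete yet
    return days, months, months // 12


print(age_parts(date(2022, 3, 15), date(2024, 4, 15)))  # (762, 25, 2)
```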

Normal Range Standards

Reference values are on the National Health Commission’s website. The 2022 standard, WS/T 423—2022, is the same source as Baobaoshu: http://www.nhc.gov.cn/fzs/s7848/202211/8b94606198e8457dafb3f8355135f1a3/files/e38068f0a62d4a1eb1bd451414444ec1.pdf

The data was in this format:

I’ll explain this table. We’ve covered the median. The key is “SD,” or Standard Deviation. It’s a basic statistical term. First, we need to understand normal distribution. The Health Commission’s statistics use a large sample size, measuring many children. Height and weight are random and, with a large enough sample, normally distributed around the average (or median, which is very close). A normal distribution looks like this:

The horizontal axis is height (or weight), and the vertical axis is the number of children. The center dashed line is the median. Most children are near the median. Fewer children are at the extremes.

Standard deviation is the distance between the dashed lines, which are equally spaced. It’s like a ruler for the normal distribution. It tells us the proportion of data within a range. For example, 68% of children are within one standard deviation above and below the median; 95% are within two.

Standard deviation is what makes the normal distribution so handy: whatever the data, the proportions within 1, 2, and 3 standard deviations are always about 68%, 95%, and 99.7%. So knowing the average (or median) and standard deviation tells us where any data point falls.
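Those percentages can be checked with Python’s `statistics.NormalDist`. The `percentile` helper is a hypothetical addition that assumes a symmetric normal curve centered on the median, which is a simplification; the example values are made up.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def share_within(k: float) -> float:
    """Fraction of a normal population within ±k standard deviations of the center."""
    return STD_NORMAL.cdf(k) - STD_NORMAL.cdf(-k)


def percentile(value: float, center: float, sd: float) -> float:
    """Share of children below `value`, assuming a symmetric normal curve."""
    return NormalDist(center, sd).cdf(value)


for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.1%}")  # 68.3%, then 95.4%, then 99.7%
```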

I copied the table to Excel and converted all ages to months:

The table shows the median and values at 1, 2, and 3 standard deviations above and below it. This helps me see where my daughter’s measurements fall and how much they deviate.

Drawing the Curve

Now, the hard part: showing how much my baby’s measurements deviate from the normal range. This requires real Excel skills.

I had two tables: my baby’s data and the reference ranges. I needed to add deviation columns, query the reference table, calculate the deviation, and show it with plus and minus signs. Minuses would be right-aligned in the left column, and pluses left-aligned in the right, creating a simplified diverging bar chart.

Matching the Reference Month

In theory, this is simple: use VLOOKUP to match the month, then nested IFs to compare and output symbols.

But the National Health Commission table has gaps:

From age 2, data is provided only every 3 months. This is reasonable, as growth slows. But it breaks querying: for a record at 25 months, an exact-match VLOOKUP finds nothing.

One workaround is to manually complete the reference table, adding missing months and using values from younger months (e.g., using 24-month values for 25 and 26 months).

But I wanted intelligent matching!

So, I added a hidden column to find the corresponding reference month for each row.

The formula for this column is:

=IF(ISBLANK(A2),"",INDEX('生长对照表'!A$3:A$46,COUNTIFS('生长对照表'!A$3:A$46,"<="&C2),0))

In plain English: if the date is blank, the cell stays empty. Otherwise, COUNTIFS counts the reference months less than or equal to the baby’s age in months, and INDEX uses that count as a row number to return the largest reference month not exceeding the age, effectively “matching down.”

Before age two, the reference month always equals the baby’s age in months. I tested it: a record at 25 months matches the 24-month reference.
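The same “match down” rule in Python: `bisect_right` counts how many sorted reference months are less than or equal to the age, which is what COUNTIFS does in the hidden column, and that count then indexes the month list like INDEX. The month list is an assumption modeled on the table’s layout (monthly to 24 months, then every 3 months), not copied from the standard, and it must be sorted.

```python
from bisect import bisect_right

# Assumed reference months: 0..24 monthly, then 27, 30, ... every 3 months.
REFERENCE_MONTHS = list(range(0, 25)) + list(range(27, 82, 3))


def reference_month(age_months):
    """Largest reference month <= age, like INDEX(months, COUNTIFS(months, "<="&age))."""
    if age_months is None:
        return None  # blank row stays blank, like the ISBLANK guard
    count = bisect_right(REFERENCE_MONTHS, age_months)  # months <= age
    if count == 0:
        raise ValueError("age is below the first reference month")
    return REFERENCE_MONTHS[count - 1]  # the count used as a 1-based row index


print(reference_month(25))  # 24
```

For what it’s worth, VLOOKUP with its last argument omitted or TRUE performs the same largest-value-not-exceeding match on an ascending first column.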

Calculating Deviation

The reference month column handles mismatches, so we can calculate deviations.

The “height below average” column formula serves as an example:

=IF(ISBLANK(F2),"",IF(F2>VLOOKUP(E2,'生长对照表'!A$3:O$46,12),"",IF(F2=VLOOKUP(E2,'生长对照表'!A$3:O$46,12),"=",REPT("-",5-RANK(F2,{F2,VLOOKUP(E2,'生长对照表'!A$3:O$46,11),VLOOKUP(E2,'生长对照表'!A$3:O$46,10),VLOOKUP(E2,'生长对照表'!A$3:O$46,9)},1)))))

Okay, this formula looks insane. Let’s break it down, layer by layer, starting from the outside:

Layer 1

=IF(ISBLANK(F2),"",IF(F2>VLOOKUP(E2,'生长对照表'!A$3:O$46,12),"",IF(F2=VLOOKUP(E2,'生长对照表'!A$3:O$46,12),"=",Layer 2)))

This part first checks if the height column (F2) is empty. If so, this cell is also empty. Otherwise, it compares F2 with the corresponding median height from the reference table. If F2 is greater than the median, the cell remains blank (as this column only shows negative deviations). If F2 equals the median, it displays “=”. If F2 is less than the median, the second layer calculates the number of “-” signs to output.

Layer 2

REPT("-",Layer 3)

I initially planned to use nested IF statements to determine the number of minus signs, but that seemed a bit silly. Here’s a simpler approach: The REPT function can repeat a string a specified number of times. Now, the problem is passed to the third layer: calculating the number of minus signs to output.

Layer 3

5-RANK(F2,{F2,VLOOKUP(E2,'生长对照表'!A$3:O$46,11),VLOOKUP(E2,'生长对照表'!A$3:O$46,10),VLOOKUP(E2,'生长对照表'!A$3:O$46,9)},1)

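Here’s a hidden gem: array literals. We often use ranges in formulas, which are implicitly arrays. But you can also build arrays by hand, as in programming, using curly braces {}. For instance, {1,2,3,4} in a formula works like a row of four cells holding 1, 2, 3, and 4. (Google Sheets, which I used, accepts cell references inside the braces; as the update at the end explains, Excel does not.)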

Array constants are far more flexible. You can combine seemingly unrelated data. Just look at what’s inside the {}:

{F2,VLOOKUP(E2,'生长对照表'!A$3:O$46,11),VLOOKUP(E2,'生长对照表'!A$3:O$46,10),VLOOKUP(E2,'生长对照表'!A$3:O$46,9)}

This array combines my baby’s height (F2) with the reference heights at -1, -2, and -3 standard deviations from the median.

RANK(F2,Array,1)

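Next, RANK finds the position of my baby’s height among those four values in ascending order, and subtracting that rank from 5 gives the number of minus signs. Why 5? In ascending order, a value’s rank is 1 plus the number of values smaller than it. A height between the median and -1 SD is larger than all three SD values, so it ranks 4th and gets one minus; a height below -3 SD ranks 1st and gets four.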
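Here is the same logic as a Python function, handy for checking edge cases. It relies on the fact that an ascending rank equals 1 plus the count of strictly smaller values, so 5 minus the rank reduces to 4 minus the number of SD lines below the height. The function name is hypothetical, and the reference numbers in the example are made up, not taken from the standard.

```python
def below_median_symbols(height, median, minus_1sd, minus_2sd, minus_3sd):
    """Mirror the 'height below average' column: "", "=", or one to four minus signs."""
    if height is None:
        return ""   # ISBLANK(F2)
    if height > median:
        return ""   # above the median: the plus column handles it
    if height == median:
        return "="
    # 5 - RANK(...) == 4 - (number of SD lines strictly below the height)
    lines_below = sum(1 for line in (minus_1sd, minus_2sd, minus_3sd) if line < height)
    return "-" * (4 - lines_below)


print(below_median_symbols(71, 75, 72.5, 70, 67.5))  # --
```

A height exactly on an SD line ties with it, so it lands in the more extreme band, just as it does with RANK.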

I used the same approach for the other three deviation columns, and it works perfectly. The number of symbols tells you which standard-deviation band a value falls in. Since 95% of children fall within two standard deviations, one or two symbols are nothing to worry about. All good so far!

Data Visualization

Data visualization should make the data worth noticing stand out at a glance. Plain plus and minus signs are a bit crude for that.

I don’t need fancy graphics. To flag outliers, I just replaced the pluses and minuses with distinct symbols and added simple conditional formatting for background colors. That’s enough for me.

Three symbols mean the measurements are outside the 95% range – I use yellow. Four symbols mean outside 99.7% – I use red. I manually adjusted a few extreme values for demonstration:
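The coloring rule boils down to a lookup on symbol count. A minimal Python sketch, assuming each deviation cell holds only a repeated symbol, “=”, or nothing; the color names are placeholders for whatever fills you pick.

```python
# Fill by number of symbols: 3 means beyond 2 SD, 4 means beyond 3 SD.
FILL_BY_COUNT = {3: "yellow", 4: "red"}


def fill_color(cell: str):
    """Return the conditional-format fill for a deviation cell, or None for no fill."""
    if cell in ("", "="):
        return None
    return FILL_BY_COUNT.get(len(cell))


print(fill_color("---"))  # yellow
```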

Wrap-up

Finished! Time to uninstall that parenting app. I happily clicked the “x”.

There are many growth trackers, but building my own is uniquely satisfying. I learned about arrays, REPT, and RANK on the fly – a great experience. The initial planning was the most interesting. Once started, it took just an hour.

It shows the power of combining knowledge, tools, and techniques. Improvise, adapt, overcome.

I should mention I prefer Google Sheets. Replicating this in Excel might require tweaks, but the formulas are similar.

[2024.1.18 Update] I’ve received requests for the spreadsheet. Converting to Excel had issues: Excel doesn’t support array constants as a RANK range, and you can’t reference other cells within them. Doing this in Excel is harder, likely needing many nested IFs. I recommend Feishu sheets or Google Sheets.

I’ve made boy/girl versions available.

Boy version: https://my.feishu.cn/wiki/JlMKw1NiBis8yok62BJcbCZ3n2d?from=from_copylink

Girl version: https://my.feishu.cn/wiki/RKHuwkXafiS987kLxPIc8jkxnAc?from=from_copylink

Homemade Child Growth Charts in Excel

hi@victor42.work (Victor42) — Thu, 03 Aug 2023 14:30:00 +0000

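A tinkering log of pushing Excel to its limits, covering data visualization, AI tools, statistics, and Excel formulas. Don’t worry, I’ll write from a data beginner’s perspective and explain even the most basic concepts.

Many parents use parenting apps to record their baby’s growth and track height and weight. Of all those features, this is the only one I actually use, and installing an app of several hundred MB on my phone just for it made me think about uninstalling. It’s not that I’m short of a few hundred MB; I just realized it was a good chance to practice. It’s only a data analysis tool; could my almighty Excel not handle it?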

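System Planning

Before starting, think through how this should be done. First, let’s look at how the growth curves in parenting apps work.

This is Baobaoshu’s child growth curve. The 50% line in the middle is the median: if my baby’s height (weight) falls exactly on it, there are about as many babies of this age taller (heavier) than her as shorter (lighter). Above it, the 75% and 97% lines mean that a height (weight) at that position exceeds 75% and 97% of babies of the same age; the 25% and 3% lines below work the same way. Where the baby’s data point lands tells you roughly how she’s growing relative to the whole population.

What I want is a similar analysis tool with these capabilities:

Record each height and weight measurement of my baby
Look up the normal height and weight range for each month of age
Clearly express how far my baby’s height and weight deviate from the normal range at each month of age

Whether the thing is a chart, or has curves, doesn’t matter. What matters is point 3: its ability to calculate, measure the degree of deviation, and present it intuitively. After careful thought, I decided the most suitable form was a bar chart with bars going in two directions.

This kind of chart is called a diverging bar chart; I’m not sure what it’s called in Chinese. It compares two data sets pairwise along the same dimension.

When used for a single data set, it shows each value’s direction and distance from a baseline, most commonly positive versus negative.

This suits my baby’s growth data well: take the middle of the reference values as the baseline and show whether my daughter’s height (weight) is low or high. As for how far from the baseline, the chart uses bar length, but differences in bar length aren’t always obvious, so I thought I should simplify further and use only symbols: minus signs for below the baseline, plus signs for above, and more symbols for a greater deviation. That way, when I see three or four pluses (+++) or minuses (---), I know her growth trend deserves attention.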

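Preparing the Data

With a concrete goal, it’s time to get to work, starting with the first 2 capabilities:

Record each height and weight measurement of my baby
Look up the normal height and weight range for each month of age

Baby’s Growth Data

My baby’s height and weight data lived in the Baobaoshu app (there’s also a date row to the left of the month age, which I left out of the screenshot so as not to reveal my daughter’s birthday):

Baobaoshu has no data export. I could type the records into Excel one by one, but shouldn’t there be a smarter way?

I first took screenshots of the Baobaoshu records screen by screen and stitched them into one long image with an Android app called Screen Master.

Then I used the Baimiao OCR tool (https://web.baimiaoapp.com/) to extract the text from the long image:

With the formatting scrambled and mixed together like this, it looks unusable at first glance. But in the AI era, that’s no problem.

Done! Just copy it into Excel. One more note: the age in days, months, and years in the table is calculated by subtracting my daughter’s birthday from the recording date, so nothing needs entering by hand.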

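Normal Range Standards

Height and weight reference values for each month of age can be found on the National Health Commission website. The standard was released in 2022, so it’s fairly recent; its number is WS/T 423—2022, and it’s the same data source as Baobaoshu: http://www.nhc.gov.cn/fzs/s7848/202211/8b94606198e8457dafb3f8355135f1a3/files/e38068f0a62d4a1eb1bd451414444ec1.pdf

In it I found data in the following format, exactly what I needed:

A quick explanation of this table. We covered the median earlier; the key here is understanding “SD,” Standard Deviation. It’s a very basic statistical term, and before explaining it we need to understand the normal distribution. The Health Commission’s statistics on children’s height and weight must come from a very large sample, meaning very many children were measured. For randomly arising data like height and weight, as long as the sample is big enough, children’s values are normally distributed around the average (the table uses the median, which should be very close to the average). This is what a normal distribution looks like:

The horizontal axis is the height (weight) value, from small to large; the vertical axis is the number of children with that height (weight). The vertical dashed line in the center is the median. The vast majority of children fall near the median, so middle-of-the-range children are the most common. Further out on either side there are fewer children: very low or very high heights (weights) are rare, and the more extreme the value, the rarer.

Back to standard deviation. No formulas, no calculation, no need to care where it comes from; what we care about is how standard deviation relates to the normal distribution.

On the normal distribution chart, the standard deviation is the distance between adjacent vertical dashed lines, which are evenly spaced. How to think about it? It’s a ruler for the normal distribution: with it, we can tell exactly what fraction of the data lies in a given range. For example, 68% of children have a height (weight) within one standard deviation above or below the median, and 95% are within two.

Note the word “standard” in its name; it’s there for a reason. This is a distinctive property of the normal distribution: different data sets may have different standard deviation values, but the proportions are the same. For any normal distribution, the shares within 1, 2, and 3 standard deviations are always about 68%, 95%, and 99.7%, and that’s the magic. Many kinds of random data in everyday life are roughly normally distributed. So as long as we know the average (or median) and standard deviation, we can tell where any data point stands in the whole.

Now back to the data: copy the Health Commission table into Excel and convert all ages to months:

The table lists, for each month of age, the median height (weight) and the values at 1, 2, and 3 standard deviations below and above the median. That’s the yardstick I need to know where my daughter’s height (weight) stands among her peers and how far she deviates from the middle.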

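Drawing the Curve

Next comes the tough part: the third capability, “express how far my baby’s height and weight deviate from the normal range at the corresponding month of age.” This takes genuine Excel technique.

Now my Excel file has two tables: a data table recording my baby’s measurements by month, and a reference table listing the normal range for each month. I need to add a few deviation columns to the baby’s table that query the reference table, work out the degree of deviation, and show it as plus and minus signs. Minuses go in the left column, right-aligned; pluses in the right column, left-aligned. That gives a simplified diverging bar chart.

Matching the Reference Month

It doesn’t seem hard when you think about it: just use VLOOKUP to match the month, then a pile of nested IFs to compare values and output symbols. Surely that works.

Once I started, it wasn’t that simple, because the Health Commission table has gaps in its months:

From age 2, it gives a row only every 3 months. This is perfectly reasonable: after age 2, babies don’t grow as fast as in infancy, so there’s no need to track them so often. But it affects my lookup: if I record height and weight at 25 months, an exact-match VLOOKUP finds nothing, and the rest of the calculation can’t proceed.

A crude fix is to tidy the data and complete the reference table by hand: add the missing months and fill them with the values of a younger month. For example, fill both 25 and 26 months with the 24-month standard.

But this is a practice project; no crude fixes. I want smart matching in the baby’s data table!

So I added another hidden column that calculates which reference-table month each row’s age corresponds to.

The formula for this column is: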

=IF(ISBLANK(A2),"",INDEX('生长对照表'!A$3:A$46,COUNTIFS('生长对照表'!A$3:A$46,"<="&C2),0))

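In plain words: first check whether the date column is empty; if it is, this cell stays empty too. If not, count how many rows in the reference table have a month less than or equal to the baby’s age in months, which achieves “matching down.”

Before age 2, the reference month always equals the age in months. I tested it by hand: a record at 25 months matches the 24-month reference.

Calculating Deviation

With the reference month column, there’s no risk of failing to match the reference table, so the deviation columns can be calculated safely.

Take the formula of the “height below average” column as an example: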

=IF(ISBLANK(F2),"",IF(F2>VLOOKUP(E2,'生长对照表'!A$3:O$46,12),"",IF(F2=VLOOKUP(E2,'生长对照表'!A$3:O$46,12),"=",REPT("-",5-RANK(F2,{F2,VLOOKUP(E2,'生长对照表'!A$3:O$46,11),VLOOKUP(E2,'生长对照表'!A$3:O$46,10),VLOOKUP(E2,'生长对照表'!A$3:O$46,9)},1)))))

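Ah… this formula is a bit insane. Let me break it down before translating it. From the outside in, there are 3 layers:

Layer 1

=IF(ISBLANK(F2),"",IF(F2>VLOOKUP(E2,'生长对照表'!A$3:O$46,12),"",IF(F2=VLOOKUP(E2,'生长对照表'!A$3:O$46,12),"=",Layer 2)))

This part first checks whether the height column is empty; if it is, this cell stays empty too. If not, it looks up the corresponding median height in the reference table and compares. If the height is above the median, this column stays empty (it only holds minus signs); if it equals the median, it writes an equals sign “=”; if it’s below, we go to layer 2 and output some number of minus signs.

Layer 2

REPT("-",Layer 3)

I originally planned to use layer upon layer of IF statements to decide how many minus signs to output, but then that seemed a bit silly. Here’s the simple way: the REPT function repeats a string a given number of times. Now the problem passes to layer 3: calculating how many minus signs to output.

Layer 3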

5-RANK(F2,{F2,VLOOKUP(E2,'生长对照表'!A$3:O$46,11),VLOOKUP(E2,'生长对照表'!A$3:O$46,10),VLOOKUP(E2,'生长对照表'!A$3:O$46,9)},1)

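This uses a lesser-known formula trick: arrays. Referencing a range in a formula forms an array, which is how we use arrays most of the time. But did you know you can create arrays by hand, as in programming? The key is the curly braces {}. For example, {1,2,3,4} in a formula is equivalent to a row of four cells holding 1, 2, 3, and 4.

But arrays are more flexible than that: you can put together data that otherwise has nothing to do with each other. Look at just the contents of the {}: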

{F2,VLOOKUP(E2,'生长对照表'!A$3:O$46,11),VLOOKUP(E2,'生长对照表'!A$3:O$46,10),VLOOKUP(E2,'生长对照表'!A$3:O$46,9)}

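My array puts the baby’s height (F2) together with the heights at -1, -2, and -3 standard deviations.

RANK(F2,Array,1)

Then RANK gives the position of the baby’s height among these 4 values, from smallest to largest. Finally, subtracting this number from 5 gives the number of minus signs. Why 5 is a math question I won’t expand on, but it’s easy to see if you think through the cases.

Adapting the same principle gives the formulas for the other 3 deviation columns, with immediate results. The number of symbols tells which standard-deviation band the baby’s value falls in. By the properties of the normal distribution, 95% of children’s growth data lies within 2 standard deviations, so when I see 2 symbols there’s nothing to worry about. So far, everything is normal with her.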

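Data Visualization

Since this is data visualization, the data worth noting should stand out at a glance. Plus and minus signs are a bit crude for that.

You don’t actually need elaborate graphic design or fancy gradients. To highlight outliers, just replace the plus and minus signs with sufficiently distinct symbols and write a simple conditional format that sets background colors. That’s enough for my own use.

3 symbols mean the baby’s value lies outside the range covering the middle 95% of same-age children and deserves attention: yellow. 4 symbols mean outside the middle 99.7%: red. I manually changed a few values to extremes; here’s the actual effect: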

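Epilogue

Done, wrap it up! Now I can uninstall the parenting app and happily press the X button.

I’m sure plenty of ready-made growth trackers exist, but nothing replaces the joy of making your own. Arrays, the REPT function, and the RANK function were all learned on the spot, and I gained a lot. The most interesting part was actually the early planning; once I started building, the whole thing took just 1 hour.

It proves the power of combining different knowledge, tools, and techniques. Take each move as it comes, and problems can always be solved.

One last note: we really aren’t using the same Excel; I prefer Google Sheets. If you want to repeat my experiment in Excel, it may not work as-is. A few details may need adapting, but the formulas and usage of the two are highly consistent.

[2024.1.18 Update] Some friends wanted the spreadsheet file. I tried it myself, and after converting to Excel some formulas stopped working, because Excel doesn’t support an array constant as RANK’s reference range, and array constants can’t reference other cells. So doing this in Office’s Excel is more troublesome and would probably need a pile of nested IFs. I still recommend Feishu Sheets or Google Sheets if you can use them.

I made two versions of this spreadsheet available (boy/girl):

Boy version: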
https://my.feishu.cn/wiki/JlMKw1NiBis8yok62BJcbCZ3n2d?from=from_copylink

Girl version:
https://my.feishu.cn/wiki/RKHuwkXafiS987kLxPIc8jkxnAc?from=from_copylink