Author: Su Zihua
At the company's recent product launch, Li Zhifei, founder and CEO of the listed company Chumen Wenwen (Mobvoi), did not present the products himself. Instead, he shared a piece of personal "performance art": an experiment in running a "one-person company".
He set himself a seemingly unrealistic goal: using AI tools to build, within a few days, a "Feishu" designed specifically for AI organizations.
A veteran of the previous AI wave, he has always been at the forefront. In 2012, he left his job as a scientist at Google, returned to China and founded Chumen Wenwen, aiming to "redefine human-machine interaction with AI + voice", moving from voice assistants to smart hardware to AIGC. When the current AGI wave arrived, he was initially excited and threw himself into it, but he soon concluded that it was a game among giants in which small and medium-sized companies could hardly create significant value. That left him feeling lost and even discouraged.
However, after using AI programming tools to turn himself into a "one-person company" as an experiment, he ran into many practical problems, and those details and experiences helped him rediscover his faith in AGI.
He suddenly realized that much of the old world's "friction", the obstacles to building complex things, seemed to have disappeared.
His on-site speech conveyed the freedom and excitement of racing forward with AI and seeing hope again.
Although there are some minor audio-sync issues, the entire video was produced 100% by AI. I only need to issue commands; it runs on its own and finally presents the finished video to me.
This gave me a sense of accomplishment: I had built this thing in just a few days.
Then I wanted to see how others would view it, so I uploaded the code to GitHub for my colleagues to download. But remember, we are two different people, and GitHub does not record how I communicated with the AI to get all this done.
So my colleagues saw only the code, which they ran locally.
They were shocked by its complexity and by how quickly it had been built. They estimated it would take dozens of people several months, and when I told them that one engineer had done it in two days with AI assistance, their reaction was: "This is absolutely insane."
They were also surprised by the more than 40,000 lines of code, far beyond the 300 lines of algorithm code per day I used to write at Google.
At Google, writing 300 lines of (non-trivial) algorithm code in a day was considered highly productive. Recently, I built a general-purpose Agent that wrote 3,000 lines of Python for me in 3 hours, a single evening. The code quality was definitely better than what I would have written, and it was pure backend logic with no UI.
In other words, what it coded in 3 hours equals 10 of my former working days (3,000 lines versus 300 lines per day). That's the ratio.
So I was thinking: one person could build Google Translate. Google Translate was originally written by 20 of the world's top PhDs working over a long period. Now I alone can do the work of those 20 people, and Google Translate was a very impressive and complex system at the time. From this perspective, everything is very different from before.
I believe that ultimately, the key to AI is building a self-evolving AI system.
Li Zhifei's practical insights | Image source: Chumen Wenwen
To make it easier to test this AI organization's App, I had the code written automatically: the website code on the left and a test framework on the right. Then it lifted itself up, like stepping on its left foot with its right foot to climb into the air. You might think this is a perpetual motion machine, and indeed that is possible. Of course, sometimes the right foot kicks the left foot and it falls down; in other words, it can fall into vicious cycles as well as virtuous ones.
To achieve this, I created various Agents so that not only engineers but also non-engineers can modify my code directly.
Of course, many of these are just Prompts, and I only verified feasibility; they are not truly deployable or productized.
But I believe this proves the idea, or shows the team exactly what I want, which previously might have taken a long time to get across. Now you can simply build a Demo and show them. So I think that even as a CEO, if you have this capability, your output is truly amplified 100 times.
For an Agent to achieve true intelligence, it should also have a recursive architecture. For example, an Agent given a grand task like "earn 5 million" would break it down step by step into specific subtasks: analyzing business opportunities, building a website, creating videos, integrating payments, promoting on social media, and so on. Each subtask ultimately bottoms out in executable "atomic Agents".
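The recursive breakdown described above can be illustrated with a minimal sketch. This is not Li Zhifei's system: the hard-coded `PLANS` table is a hypothetical stand-in for an LLM-based planner, and `atomic_agent` and `run_agent` are illustrative names. A task with a known plan is split into subtasks and each is handled recursively; a task with no further plan is treated as atomic and executed directly.

```python
# Hypothetical decomposition table standing in for an LLM-based planner.
PLANS: dict[str, list[str]] = {
    "earn 5 million": [
        "analyze business opportunities",
        "launch product",
        "promote on social media",
    ],
    "launch product": ["build website", "create videos", "integrate payments"],
}


def atomic_agent(task: str) -> str:
    """A leaf ("atomic") agent; here it only reports what it would execute."""
    return f"executed: {task}"


def run_agent(task: str, depth: int = 0, max_depth: int = 5) -> list[str]:
    """Recursively decompose `task` until only atomic tasks remain."""
    if depth > max_depth:
        raise RecursionError(f"decomposition too deep at {task!r}")
    subtasks = PLANS.get(task)
    if not subtasks:  # no further plan: treat it as an atomic task
        return [atomic_agent(task)]
    log: list[str] = []
    for sub in subtasks:  # depth-first, in plan order
        log.extend(run_agent(sub, depth + 1, max_depth))
    return log


for line in run_agent("earn 5 million"):
    print(line)
```

The `max_depth` guard is a design choice for the sketch: a real planner that keeps splitting tasks without ever reaching executable steps would otherwise recurse forever.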
The key to this recursive architecture is self-replication. Just as human civilization is passed on through the exploration and accumulated knowledge of successive generations, so it should be with Agents. More importantly, Agents must be able to modify their own source code.
This differs from today's Agents, which merely adjust their plans; it means an Agent can fundamentally change its own operating logic, much like editing its own genes.
I believe that if an Agent can:
Continuously execute and optimize its plan.
Autonomously modify its core source code when encountering unsolvable problems.
Ultimately build up a knowledge base through this mechanism, and even feed back into and modify the large model itself.
Then this will be a crucial step towards Artificial General Intelligence (AGI).
This is not science fiction. I used to strongly dislike discussing superintelligence, but after exploring it in depth with large models, I suddenly felt it is entirely achievable.
Moreover, the real source code of AI might be extremely concise: the core might be no more than a hundred lines, yet contain multiple layers of recursion that let it explore, learn, take in feedback, and self-iterate across different environments.
I have experienced a collapse of faith. In 2023 I developed a faith in AI, but after working at it for a while, mainly for lack of funding, I felt I couldn't keep going and gave up. Last year, when people talked to me about AI, I didn't even want to listen.
But recently, I rediscovered my faith in AI, and even believed in AGI and superintelligence. This is an unimaginable transformation. I hope I can sustain this faith a bit longer this time.
The Importance of Personalized Environment and Context
So, besides large models, what matters most? Having a personalized environment and Context.
Take my own entrepreneurial experience as an example. I once made smart hardware, and Xiaomi brought the price down to one-tenth of ours. I worked on large models, and then all the major companies piled in. Each time you get feedback like this, it makes you want to give up or keep adjusting your Plan.
If I were in the United States, I might have been acquired by Google for our large model and made a lot of money; or, for the hardware, acquired by Apple and made a lot of money. Such feedback inevitably makes an entrepreneur behave completely differently. The same entrepreneur, with the same IQ, receiving different feedback in the different entrepreneurial environments of China and the United States, will end up with completely different behavior and ways of thinking. That is what I mean by personalized environment and context.
Context is more of a historical record.
So, going back to what I said earlier: in the era of large models, I was among the first to stand up and say I wanted to build large models, but I may also have been the first to realize it wasn't for me. After that, I basically didn't go all in, because I didn't know how to participate.
In the first half of this year, I felt that apart from three or four global giants, no company was qualified to talk about models. They shouldn't chase the trend or waste their lives on it, and still less waste emotion on it, because they simply had no chance; it was just burning money. Frankly, I felt large models themselves had become super uninteresting. I couldn't find a way in and couldn't see the value of most AI companies.
But this time, through practice and re-examination, I feel that even for something as grand as AGI, I seem to be able to participate again.
So this is the iterative cycle between an Agent's Planner and Executor. If you think it through clearly enough, you can make intelligence produce intelligence, and I believe you can participate in the entire AGI process.
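The Planner/Executor cycle can be sketched as a simple feedback loop. This is a minimal illustration under stated assumptions, not the speaker's implementation: `make_executor`, `simple_planner`, and `plan_execute_loop` are hypothetical names, the Executor succeeds only for steps in a fixed toolset, and the Planner's only form of "learning" is swapping out steps that failed in earlier rounds.

```python
from typing import Callable, Optional

Plan = list[str]


def make_executor(toolset: set[str]) -> Callable[[Plan], Optional[str]]:
    """Hypothetical Executor: it can only perform steps found in `toolset`."""

    def execute(plan: Plan) -> Optional[str]:
        # Return the first step it cannot perform, or None if every step succeeds.
        for step in plan:
            if step not in toolset:
                return step
        return None

    return execute


def simple_planner(failed_steps: set[str]) -> Plan:
    """Hypothetical Planner: replaces any step that failed in an earlier round."""
    base = ["write backend", "hire designer", "deploy"]
    alternatives = {"hire designer": "generate design with AI"}
    return [alternatives.get(s, s) if s in failed_steps else s for s in base]


def plan_execute_loop(
    planner: Callable[[set[str]], Plan],
    executor: Callable[[Plan], Optional[str]],
    max_rounds: int = 5,
) -> tuple[Plan, list[Plan]]:
    """Alternate planning and execution, feeding each failure back to the Planner."""
    failed: set[str] = set()
    history: list[Plan] = []
    for _ in range(max_rounds):
        plan = planner(failed)
        history.append(plan)
        failure = executor(plan)
        if failure is None:
            return plan, history
        failed.add(failure)  # Executor feedback that shapes the next plan
    raise RuntimeError(f"no working plan after {max_rounds} rounds")


executor = make_executor({"write backend", "generate design with AI", "deploy"})
plan, history = plan_execute_loop(simple_planner, executor)
print(plan, len(history))
```

In this toy run, the first plan fails at "hire designer", that failure is fed back, and the second plan succeeds. The `max_rounds` cap reflects the "falls down" case: if the Planner cannot find a workable alternative, the loop stops instead of cycling forever.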
And the large model itself is just a chip to you. Think of Qualcomm's chips, Apple's phones, and TikTok running on top of them: these are completely different things, and in the end, the company that made TikTok captured the most value.
I discovered that even ambitious AGI goals are not out of reach. Building the recursive Agent system I envision might not require massive funding; it depends more on ingenuity. I believe that as long as you have sufficiently deep thinking and technical capability, you can participate in the AGI process even if you're not an industry giant.
Mobvoi's journey also bears out these ideas. We became one of China's first AI companies in 2012, starting with voice assistants and then exploring smart hardware (such as TicWatch and TicMirror). Although we faced fierce market competition and immature technology, we always stayed at the forefront.
After 2019, we shifted to software and became one of the first AIGC software companies in China and globally. For example, Magic Voice Workshop has supplied a great deal of voiceover content to platforms like TikTok, and we also developed Qimiaoyuan (digital-human video generation) and other products.
In a competitive environment like China, a tech company is like an Agent that continuously iterates and self-corrects.
Mobvoi's "source code" today is vastly different from when we started in 2012, a testament to our continuous evolution.