Models, Intelligence, and the Path to Chaos

Many years ago, I had the habit of writing summaries periodically, but it’s been quite a while since I last did so.

Things have developed both quickly and not so quickly. A few years have passed, all sorts of new techniques keep appearing, and many classmates of mine have become rising stars in academia worldwide.

0: Language Models That Think

In the past few months, various reasoning models have sprouted up like mushrooms after rain, and I’ve participated in one of these research efforts.

Reasoning, or inference-time scaling, adds extra “thinking tokens” to work around a limitation of forward, left-to-right models like GPT: they cannot directly generate text whose earlier parts depend on content that only appears later (in large-number arithmetic, for example, the leading digits of the answer depend on carries that propagate up from the trailing digits). This unlocks a great deal of latent capability and lets reinforcement learning generalize to the everyday tasks humans actually encounter.
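
As a toy sketch of the carry problem (my own illustration, not an example from any particular model): if a model must emit the digits of a sum strictly from left to right, the very first digit can depend on a carry that originates at the far right, so it must either guess or first work the sum out in scratch space, which is roughly the role the extra thinking tokens play.

```python
# Toy illustration (my own, not from the original post): why left-to-right
# digit emission needs scratch work. The leading digit of a sum can depend
# on a carry that propagates all the way from the rightmost digits.

def add_left_to_right_greedy(a: str, b: str) -> str:
    """Emit sum digits left to right with no lookahead, ignoring future carries."""
    return "".join(str((int(x) + int(y)) % 10) for x, y in zip(a, b))

def add_with_scratch(a: str, b: str) -> str:
    """Do the real computation first (the 'thinking' step), then emit the answer."""
    return str(int(a) + int(b))

a, b = "19999999", "10000001"
print(add_left_to_right_greedy(a, b))  # 29999990 -> wrong: the carry chain is missed
print(add_with_scratch(a, b))          # 30000000 -> correct
```

Real thinking models obviously do not call Python, but the scratch-then-emit structure is the same idea: compute in an order that respects the dependencies, then serialize the answer.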

AI has even turned in respectable scores on IQ tests, above the human average.

This naturally leads us to imagine that in a few years, superintelligent artificial intelligence might dominate all fields.

Both OpenAI and Anthropic have made such predictions, using them to advocate for intensified restrictions on China’s computational capacity development to ensure AGI remains under American control.

Interestingly, these companies employ large numbers of Chinese engineers, with some even responsible for developing core modules.

In fact, it could be argued that China’s education and selection mechanisms have significantly supported these companies’ research and development speed. Yet they use these very technologies to restrict China—a real-life retelling of the fable “The Farmer and the Snake.”

But this is just a tangent.

Inference-time scaling is a problem I had wanted to solve ever since 2021, but the crude, hand-designed inference-time scaling methods I built back then only used some small LLMs to produce a few strong poetry-writing systems.

I never took it further, and it never reached the level of solving math problems. Still, in blind human evaluations by members of the poetry society, that system held first place for several consecutive years, and even a non-finetuned small model from 2021, paired with inference scaling, could generate results clearly better than GPT-4.

Now, with reasoning models available, and experiments proving they actually follow scaling laws, we can expect continuous development.

Toward the end of last year, Professor Zhiyuan Liu fitted a curve suggesting that the performance efficiency of large models doubles every three months: a model with inference cost n at month t+3 reaches the level that a model with inference cost 2n had at month t.
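
Taken at face value (the extrapolation below is mine, not part of the original fit), the rule says the inference cost C(t) needed to reach a fixed capability level halves every three months, which compounds quickly:

$$
C(t+\Delta t) \approx C(t)\cdot 2^{-\Delta t/(3\ \text{months})},
\qquad
C(t+12\ \text{months}) \approx \frac{C(t)}{2^{4}} = \frac{C(t)}{16}.
$$

In other words, a year of this trend would cut the cost of a fixed level of capability by roughly a factor of sixteen, assuming the trend simply continues.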

Current developments indeed match this pattern, even slightly faster.

Given this, will intelligence develop rapidly, will AGI solve everything, and might it even destroy humanity through superior intelligence?

My current view is: yes and no, or perhaps yes or no.

1: IQ Tests and Intelligence

First, we must understand what intelligence is. What constitutes IQ? The best understanding comes from Mensa’s IQ test.

Mensa has a publicly available set of IQ test questions, mainly image-based puzzles that test reasoning ability: 35 questions in 25 minutes, from which an IQ score is derived.

Curious enthusiasts converted these questions into text form for testing, finding that current top models like o3 and Gemini 2.5 Pro achieved IQ scores approaching 130.

What do these questions look like? They’re actually publicly available online—you can search “Mensa Norway” and try them yourself. From my experience, the first 28 questions are relatively simple, solvable within about ten minutes.

Questions 29 onward increase in difficulty, requiring progressively more thinking time. By questions 32 and 33, each question takes me over ten minutes. Question 34 requires careful study for about an hour to solve.

Question 35 I could not solve even after an hour; I only learned the answer from the published solution. This level of performance corresponds to an IQ of roughly 135-140.

Getting all answers correct yields an estimated IQ of 145. This score represents someone who can easily solve all these problems within the 25-minute timeframe on first encounter.

Of course, IQs above 145 may well be rare among humans. The definition puts 145 at +3 sigma in the population, so in theory about one person in a thousand should score above it. In practice, though, when Mensa came to Tsinghua for a talk, I took first place in the on-site IQ test and got a souvenir; later, when the quantitative firm Jiukun came by with a pile of IQ puzzles, I again took first and got the prize (quant firms hand out iPads directly).

This suggests even at Tsinghua, individuals with IQs exceeding the 135 threshold might be scarce. Overall, the actual number of people with such high IQs likely doesn’t match normal distribution expectations.

Moreover, humanity currently lacks effective measurement tools for IQs above 145.

If AI IQ keeps rising rapidly, what lies at the end of that rise? At the end, the models get every one of these IQ test questions right, and then that is all, much like the Go AIs of years ago.

For these highly intelligent AIs to exert significant influence in the real world, they must address real-world problems.

2: Chaos in the Real World

Compared to IQ tests, real-world problems are entirely different. They lack standard answers or fixed patterns, often relying on experimentation.

Yet experimental results include theoretical components. Theory can advance experiments, while experimental data can promote theoretical research.

Education familiarizes people with real-world problems and approaches. Currently, I believe the most crucial element of education isn’t any specific knowledge, but rather cognitive awareness of chaos.

What is chaos? There’s no precise mathematical definition. Lorenz provided a physics definition:

Chaos: When the present determines the future, but the approximate present does not approximately determine the future.

A system’s future state is determined by its present state, but an approximation of the present does not approximately determine the future. Microscopic errors amplify over time until, at some point, they reach macroscopic scale. Put differently, predictions of the future eventually stop working, because our measurements of the present are imprecise and our models are oversimplified.

A broader definition: unpredictable future changes within systems governed by deterministic rules.

These “deterministic rules” don’t necessarily mean future states are fully determined by current conditions—they may incorporate randomness, but with fixed probabilistic rules. Yet future states remain difficult to predict in such systems.
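
A minimal numerical sketch of the first, fully deterministic kind of chaos (a standard textbook example, not something from the original text): the logistic map follows one fixed rule, yet two starting points that differ by one part in a billion land on visibly different trajectories within a few dozen steps.

```python
# Logistic map x_{n+1} = r * x_n * (1 - x_n) with r = 4, a chaotic regime.
# The rule is completely deterministic, but a 1e-9 difference in the initial
# condition keeps growing until the two trajectories no longer resemble each other.

def logistic_trajectory(x0: float, r: float = 4.0, steps: int = 50) -> list[float]:
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.200000000)
b = logistic_trajectory(0.200000001)   # an "approximately identical" present state

for n in (0, 10, 20, 30, 40, 50):
    print(f"step {n:2d}:  {a[n]:.6f}  vs  {b[n]:.6f}   |diff| = {abs(a[n] - b[n]):.2e}")
```

Same deterministic rule, two nearly identical presents, two unrelated futures.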

Chaos exists everywhere in the world. I believe the most critical educational element is cultivating awareness of this chaos.

In various board games, slight scenario differences sometimes result in vastly different optimal moves.

In online games, a single point of health determines life or death and match outcomes.

Among all the contending heroes of the Three Kingdoms, it was Sima Yan who finally unified the realm.

“For want of a nail, the shoe was lost; for want of a shoe, the horse was lost; for want of a horse, the rider was lost; for want of a rider, the battle was lost; for want of a battle, the kingdom was lost.”

The extremely obscure lineages GW.5.1.1 and FL.13.4.1 ended up as the longest-lived branches of the sprawling XBB family.

These are a few samples of chaos. Examined closely, every step has a logic to it, yet the way things ultimately evolved reflects the extreme chaos and unpredictability of complex systems.

Is chaos inherently unpredictable? No. Chaos can be predicted. One might even say intelligence resembles “the ability to predict chaos.”

More precise measurements, refined modeling, and effective architectures enable longer, more detailed chaos prediction.

Many predictions about the real world, especially the ones that spread the widest, rest on simple models extrapolated linearly: whether “the East rises while the West declines” or “America will always be great,” whether “rapid growth will continue” or “herd immunity will halt the virus,” or “stock XX is bound to rise/fall” and “housing prices will always go up/down,” these are all linear predictions from simple models.

Similarly, “AI is a scam causing only losses” and “AGI will arrive soon/destroy humanity” also belong to such simplistic linear thinking.

These simple linear predictions may work for a short while, they spread extremely well, and they may even have a fair chance of coming true, but they are not truths.

Unfortunately, most people believe these predictions.

Seeing through these simple linear predictions, their effectiveness over short stretches and their failure over longer horizons, means understanding that part of the future is predictable, but that it can never be predicted completely.

It means recognizing the importance of intelligence while also recognizing its limits.

If an education instills this kind of worldview, it can be called a successful one.

What level of intelligence does predicting chaos require? An infinite amount. With sufficiently complex and precise modeling and measurement, we can predict a system's state out to some point in time; beyond that point chaos takes over again, demanding still more complex and precise modeling and measurement, and the demand never ends.
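
A standard way to make this endless demand quantitative (a textbook estimate, not a claim from the original text) uses the largest Lyapunov exponent λ of the system: an initial measurement error δ₀ grows roughly exponentially, so the usable prediction horizon grows only with the logarithm of the measurement precision.

$$
\delta(t) \approx \delta_0\, e^{\lambda t}
\quad\Longrightarrow\quad
T_{\text{pred}} \approx \frac{1}{\lambda}\,\ln\frac{\Delta}{\delta_0},
$$

where Δ is the largest error still acceptable. Measuring the present ten times more precisely buys only an extra ln(10)/λ of horizon; pushing the horizon out without bound would require driving δ₀ toward zero, which is one way to read the claim that the required intelligence is unbounded.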

To explore chaos, we require infinite intelligence.

(Conclusion of Part One)
