Where AI Is Headed: 6 Key Trends That Will Define the Next 5 Years

1. Generation

Generative capabilities are already reaching maturity:

  • Text: report generation, code generation (e.g., Claude Code)
  • Voice: speech synthesis
  • Video: video generation (e.g., Seedance 2.0)

There are two directions I’m particularly optimistic about: AI coding (though Chinese models are still lagging behind), and AI vision (ByteDance is currently doing an impressive job in visual recognition and video generation).

There are rumors that DeepSeek is moving toward multimodality. Hopefully it won't chase image and video generation the way Nano Banana and Seedance have, but will instead focus on more practical use cases such as screenshot-based understanding, for example:

  • generating front-end code from website screenshots
  • analyzing stock K-line charts from screenshots

2. Reasoning

Since October 2024, when OpenAI introduced deep reasoning based on Chain-of-Thought (CoT), reasoning capabilities have developed rapidly.

In February 2025, the Chinese lab DeepSeek open-sourced DeepSeek-R1, bringing reasoning capabilities to a much wider audience.

However, overall, reasoning has been the slowest area of progress.

Over the past three years, reasoning hasn’t seen major breakthroughs. Instead, progress has mainly come from:

  • maturing theoretical methods (CoT)
  • launching commercial products (like OpenAI o1)
  • improving inference speed (through memory and GPU optimization, as done by DeepSeek)

It may feel like reasoning is improving, but in reality, much of the progress comes from external tools (e.g., theorem provers like Lean), rather than true reasoning ability.

That said, the AI industry has already invested massive time and capital. People urgently need results—otherwise, it will be difficult to justify continued investment.

This is why everyone is now focusing on Agents. Essentially, Agents combine existing AI capabilities with external tools to quickly produce usable results and convert them into commercial value.

In my view, although fundamental breakthroughs in reasoning may be limited, we can still improve effective reasoning performance through:

  • specialized coding models
  • Agent-based enhancement of base models
  • improvements in intent understanding, planning, and multi-agent collaboration
  • memory modules and skill modules within Agents

3. Agents

Agents originated from the idea of deep research workflows:

  1. Generate a research plan through deep reasoning
  2. Connect to external systems and collect data
  3. Generate code to process the data
  4. Produce outputs such as reports (Word/PDF/PPT), Excel data analysis, and visualizations
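
The four-step workflow above can be sketched as a minimal pipeline. Everything here is a hypothetical stub (the function names and data stand in for model calls and tool integrations), not any real agent framework's API:

```python
# Minimal sketch of the deep-research workflow above.
# All functions are hypothetical stubs standing in for model/tool calls.

def generate_plan(question: str) -> list[str]:
    # Step 1: deep reasoning would produce a research plan; stubbed here.
    return ["collect data", "process data", "write report"]

def collect_data(step: str) -> list[dict]:
    # Step 2: would hit web search, APIs, or MCP tools; stubbed here.
    return [{"source": "example.com", "value": 21},
            {"source": "some-api", "value": 21}]

def process_data(records: list[dict]) -> dict:
    # Step 3: an agent would generate and execute analysis code;
    # here we just aggregate the collected values.
    return {"total": sum(r["value"] for r in records)}

def write_report(question: str, summary: dict) -> str:
    # Step 4: would render Word/PDF/PPT or charts; plain text here.
    return f"Report on {question!r}: total={summary['total']}"

def run_deep_research(question: str) -> str:
    plan = generate_plan(question)           # step 1
    records = collect_data(plan[0])          # step 2
    summary = process_data(records)          # step 3
    return write_report(question, summary)   # step 4

print(run_deep_research("EV market share"))
```

The point of the sketch is the shape, not the stubs: each stage's output feeds the next, which is why a planning error in step 1 propagates all the way to the final report.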

In February 2025, the closed-source commercial Agent Manus sparked widespread discussion. In February 2026, the open-source free Agent OpenClaw did the same.

At their core, Agents rely on three key capabilities:

  1. Task decomposition, orchestration, assignment, and integration (multi-agent collaboration)
  2. Personalized memory
  3. Skill modules and prompt engineering, often crafted by experts, to guide models toward desired outputs

Agents also extend many external capabilities:

  • Access to external data: web search, APIs, MCP calls, local file systems
  • Control of browsers: scraping, simulated clicks, login/data extraction, form filling
  • Control of local software: directly invoking software functions (CLI) via prompts, without writing code

However, I personally feel that there isn’t a strong demand for deep research in China.

Instead, since base models now support long context windows (up to 1M tokens), the key challenge is: how to use multi-turn conversations to continuously refine user intent, clarify goals, and adjust plans.

Right now, many Agents tend to go off track because they fail to update their understanding of the user's intent, and their plans, as new context arrives during the conversation.
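
One way to address this is to treat intent as mutable state that every turn can revise, re-deriving the plan whenever the goal or constraints change. A toy sketch, with trivial prefix parsing standing in for a model call (all names and message formats are hypothetical):

```python
# Sketch of keeping an agent's goal and plan in sync with new turns.
# The "goal:"/"format:" prefix parsing is a stub for a model extracting
# revised intent from free-form messages; all names are hypothetical.

class Session:
    def __init__(self, goal: str):
        self.goal = goal
        self.constraints: dict[str, str] = {}
        self.plan = self._replan()

    def _replan(self) -> list[str]:
        # A real agent would re-derive the plan with the model whenever
        # the goal or constraints change; here the plan is templated.
        steps = [f"research: {self.goal}"]
        for key, value in self.constraints.items():
            steps.append(f"apply constraint {key}={value}")
        steps.append("draft output")
        return steps

    def on_user_turn(self, message: str) -> None:
        # Update intent from the new turn instead of ignoring it --
        # the failure mode described above.
        if message.startswith("goal:"):
            self.goal = message[len("goal:"):].strip()
        elif message.startswith("format:"):
            self.constraints["format"] = message[len("format:"):].strip()
        self.plan = self._replan()

s = Session("summarize Q3 sales")
s.on_user_turn("goal: compare Q3 and Q4 sales")  # revised goal
s.on_user_turn("format: PDF")                    # new constraint
print(s.plan)
```

Calling `_replan()` on every turn is the core of the fix: the plan is never allowed to drift from the latest stated intent.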


4. Digital Humans

With advances in different technologies, digital humans are becoming increasingly realistic:

  • Visual generation gives them a visual identity (e.g., a lifelike likeness of a public figure such as Luo Yonghao)
  • Voice technology allows them to speak and understand speech (even using a specific person’s voice)
  • Multi-turn dialogue enables continuous conversations
  • Deep reasoning allows them to answer more complex questions
  • Agent technology enables personalized memory, style, and data

For example, a digital human could communicate using Luo Yonghao’s knowledge, expressions, and thinking style.

Currently, digital humans are mainly used in AI-generated film and video, but in the future, they may play a bigger role in AI gaming.


5. Embodied Robots

Embodied robots essentially have two “brains”:

  • A large brain for reasoning, planning, and decision-making
  • A small brain for controlling physical movements

At present, companies like Unitree Robotics have made significant progress in motion control compared to last year.

There are also rumors that Unitree is using DeepSeek-Omni for higher-level reasoning.

However, these technologies may scale faster when applied to systems like drones or robotic dogs.


6. Brain-Computer Interfaces (BCI)

Brain-computer interfaces connect two worlds:

  • On one side: human bioelectric signals and sensory systems
  • On the other: machine movement, vision systems, and speech processing

The challenge is to integrate these systems seamlessly, especially to assist:

  • people with physical disabilities
  • deaf and nonverbal people

BCI focuses on building a direct connection between the human brain and machines.

For example:

  • Deaf or mute individuals could leverage AI’s vision and speech capabilities
  • People with physical disabilities could control robotic limbs via embodied systems

This is a genuine hard requirement, arguably even more urgent than elderly-care robotics.

However, integrating the human brain and body with machine systems in a coordinated way remains extremely difficult.
