Source report:

“DeepSeek Debates: Chinese Leadership On Cost, True Training Cost, Closed Model Margin Impacts” by SemiAnalysis

DeepSeek took the world by storm. For the last week, DeepSeek has been the only topic that anyone in the world wants to talk about. As it currently stands, DeepSeek daily traffic is now much higher than Claude, Perplexity, and even Gemini.

But to close watchers of the space, this is not exactly “new” news. We have been talking about DeepSeek for months (each link is an example). The company is not new, but the obsessive hype is. SemiAnalysis has long maintained that DeepSeek is extremely talented and the broader public in the United States has not cared. When the world finally paid attention, it did so in an obsessive hype that doesn’t reflect reality.

We want to highlight that the narrative has flipped from last month: then the claim was that scaling laws were broken (a myth we dispelled); now the claim is that algorithmic improvement is too fast, and that this too is somehow bad for Nvidia and GPUs.

  • Pennomi@lemmy.world · 1 month ago

    I may be in the minority, but training efficiency doesn’t matter nearly as much as runtime efficiency. And neither matters as much as whether the weights are open or not.

    The disruption isn’t in cost, it’s in being hackable with comparable quality.

    • Alphane Moon@lemmy.worldOPM · 1 month ago

      While the full report requires a subscription, they do have a section titled “DeepSeek subsidized inference margins”.

      This is from the intro to that section:

      MLA is a key innovation responsible for a significant reduction in the inference price for DeepSeek. The reason is that MLA reduces the amount of KV Cache required per query by about 93.3% versus standard attention. KV Cache is a memory mechanism in transformer models that stores data representing the context of the conversation, reducing unnecessary computation.

      As discussed in our scaling laws article, KV Cache grows as the context of a conversation grows, and creates considerable memory constraints. Drastically decreasing the amount of KV Cache required per query decreases the amount of hardware needed per query, which decreases the cost. However, we think DeepSeek is providing inference at cost to gain market share, and not actually making any money. Google Gemini Flash 2 Thinking remains cheaper, and Google is unlikely to be offering that at cost. MLA specifically caught the eyes of many leading US labs; it was introduced in DeepSeek V2, released in May 2024. DeepSeek has also enjoyed more efficiencies for inference workloads with the H20, due to higher memory bandwidth and capacity compared to the H100. They have also announced partnerships with Huawei, but very little has been done with Ascend compute so far.

      It seems that at least some LLMs from Google offer lower inference cost (while likely not being subsidized). A rough KV-cache sizing sketch follows below.
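
      To make the quoted KV Cache point concrete, here is a minimal back-of-the-envelope sketch (not from the report). All model dimensions used here (n_layers, n_heads, head_dim, latent_dim, rope_dim) are illustrative assumptions rather than DeepSeek’s published configuration, so the printed reduction will not match the quoted 93.3% figure, which depends on the baseline being compared against. The point is only that cache size scales with what is stored per token per layer, multiplied by context length.

      ```python
      # Back-of-the-envelope KV-cache sizing: standard multi-head attention (MHA)
      # vs. an MLA-style compressed latent cache. All dimensions below are
      # illustrative assumptions, not DeepSeek's actual configuration.

      def mha_kv_bytes_per_token(n_layers, n_heads, head_dim, bytes_per_elem=2):
          """Standard attention caches a full key and value vector per head, per layer."""
          return 2 * n_layers * n_heads * head_dim * bytes_per_elem

      def mla_kv_bytes_per_token(n_layers, latent_dim, rope_dim, bytes_per_elem=2):
          """An MLA-style cache stores one compressed latent (plus a small decoupled
          RoPE key) per layer instead of full per-head keys and values."""
          return n_layers * (latent_dim + rope_dim) * bytes_per_elem

      if __name__ == "__main__":
          # Assumed dimensions, for illustration only.
          n_layers, n_heads, head_dim = 60, 128, 128
          latent_dim, rope_dim = 512, 64

          mha = mha_kv_bytes_per_token(n_layers, n_heads, head_dim)
          mla = mla_kv_bytes_per_token(n_layers, latent_dim, rope_dim)

          context_len = 32_000  # tokens of conversation context
          print(f"MHA cache: {mha * context_len / 1e9:.1f} GB per sequence")
          print(f"MLA cache: {mla * context_len / 1e9:.1f} GB per sequence")
          print(f"Reduction: {100 * (1 - mla / mha):.1f}% per token")
      ```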

  • Alphane Moon@lemmy.worldOPM · 1 month ago

    While it does seem that DeepSeek has made some important achievements in compute efficiency (I still remain sceptical), it’s funny how so many people (including some in the media) uncritically believed their initial claims.

    It’s a similar situation to all the astroturfing around users switching to [competing Chinese social network] with respect to the possible US TikTok ban. I don’t deny that this may have happened on some level, but it’s clear that the internet is an easy avenue for botting and creating fake sentiment.