After raising $415 million in its second funding round in a year, which lifted its valuation to approximately $2 billion, Paris startup Mistral AI has released its latest large language model, whose architecture appears similar to GPT-4's, in an entirely inconspicuous way.
The company simply posted a torrent magnet link on X:
magnet:?xt=urn:btih:5546272da9065eddeb6fcd7ffddeef5b75be79a7&dn=mixtral-8x7b-32kseqlen&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce&tr=http%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce
RELEASE a6bbd9affe0c2725c1b7410d66833e24
— Mistral AI (@MistralAI) December 8, 2023
Mistral AI can hardly be blamed for the quiet launch: Google's Gemini had just captured nearly all of the tech world's attention.
One X user noted that companies usually publish a blog post on their website first and then announce the product on social media. Mistral AI did the opposite; its blog post came three days later.
An open-source alternative to GPT?
A step up from its previous model, Mistral-7B-v0.1, Mixtral-8x7B aims to improve how machines comprehend and generate text. To conceptualize it, picture a collaborative team of specialists, each with expertise in a distinct area, the same mixture-of-experts design GPT-4 is rumored to use.
The latest release is a high-quality sparse mixture-of-experts model (SMoE) licensed under Apache 2.0. Outperforming Llama 2-70B on most benchmarks with 6x faster inference, Mixtral stands out as the strongest open-weight model available.
Notably, it also surpasses GPT-3.5 on standard benchmarks. It exhibits capabilities such as handling a context of 32k tokens, supporting multiple languages, excelling in code generation, and achieving a score of 8.3 on MT-Bench as an instruction-following model.
"On MT-Bench, it reaches a score of 8.30, making it the best open-source model, with a performance comparable to GPT3.5," said the company blog.
It can handle specialized tasks
Despite having 46.7B total parameters, Mixtral uses only 12.9B of them for any given token: the router activates just two of the eight experts per token, and only the feed-forward blocks are replicated per expert, while the rest of the network is shared. As a result, it processes input and generates output at roughly the speed and cost of a 12.9B-parameter model, as the sketch below illustrates.
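To make the idea concrete, here is a minimal sketch of top-2 expert routing in PyTorch. It is a toy layer, not Mistral's actual implementation: the dimensions, module names, and layout are assumptions chosen only to mirror Mixtral's eight-experts, two-active design.

```python
# A toy sparse mixture-of-experts layer with top-2 routing, in PyTorch.
# Illustrative sketch, NOT Mistral's implementation: d_model, d_ff, and
# the module layout are assumptions mirroring Mixtral's 8-experts /
# 2-active design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        # Each expert is an ordinary feed-forward block; in a Mixtral-style
        # design these are the only parts replicated per expert.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_ff),
                nn.GELU(),
                nn.Linear(d_ff, d_model),
            )
            for _ in range(n_experts)
        )
        # The router scores every token against every expert.
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                        # x: (n_tokens, d_model)
        scores = self.router(x)                  # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the 2 winners
        out = torch.zeros_like(x)
        # Only the top-k experts run for each token; the other experts'
        # parameters stay idle on this forward pass.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e            # tokens whose k-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out
```

Of the eight expert blocks, only two ever execute for a given token, which is how a model whose parameters total 46.7B can run at roughly the cost of a 12.9B one.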
Excited to release @MistralAI La Plateforme!
Three chat endpoints with competitive pricing:
Mistral-tiny: Mistral 7B Instruct v0.2, upgraded base model with higher context length 8K -> 32K and better finetuning, 6.84 -> 7.61 on MT Bench.
Mistral-small: Mistral 8x7B Instruct… pic.twitter.com/hTUhTB5VdZ
— Devendra Chaplot @ #NeurIPS2023 (@dchaplot) December 11, 2023
Mixtral undergoes pre-training on data extracted from the open web, with simultaneous training of both experts and routers.
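Because the router's soft weights multiply directly into each layer's output, a single training loss sends gradients to the experts and to the router alike, which is what makes joint training possible. Below is a minimal, purely illustrative training step under the same assumptions as the earlier sketch, reusing the toy SparseMoELayer with random stand-in data:

```python
# Minimal joint-training sketch: one loss, one optimizer, and gradients
# reach both the experts and the router. Purely illustrative; the data
# is random and SparseMoELayer is the toy layer defined above.
import torch
import torch.nn.functional as F

layer = SparseMoELayer()                              # toy layer from the earlier sketch
opt = torch.optim.AdamW(layer.parameters(), lr=1e-4)  # covers router AND expert weights

x = torch.randn(64, 512)       # stand-in batch of 64 token embeddings
target = torch.randn(64, 512)  # stand-in regression target

loss = F.mse_loss(layer(x), target)
loss.backward()                # router gets gradients via the softmax mixture weights
opt.step()
```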
Earlier in the year, the company raised $113 million in its first funding round, which took its valuation to $260 million.
Mistral AI is also committed to open-sourcing its technology, making its code and model weights freely available for copying, modification, and reuse. This approach contrasts with rivals such as OpenAI and Google, which have argued that open-source releases may pose risks, potentially enabling the spread of disinformation and other harmful content.
Originally published on Interesting Engineering.