Mistral AI, a company known for its innovative AI models, has recently released two significant models: Mistral 7B and Mixtral 8x7B. Mistral 7B is the company's first dense model, with 7.3 billion parameters, while Mixtral 8x7B is a high-quality sparse mixture-of-experts (SMoE) model with 46.7 billion total parameters, of which only 12.9 billion are used per token during inference. This makes Mixtral 8x7B highly efficient: it processes input and generates output at the same speed and cost as a 12.9B model. It is pre-trained on data extracted from the open web, supports multiple languages, including French, German, Spanish, Italian, and English, and has been shown to match or outperform models such as Llama 2 and GPT-3.5 on various benchmarks. Mistral AI has made both models available for download under an open license, so users can test and refine their own applications, and this focus on open, accessible AI positions the company as a significant player in the rapidly evolving AI landscape.
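To make the total-versus-active parameter distinction concrete, here is a minimal sketch of a sparse mixture-of-experts layer with top-2 routing over eight experts, written in PyTorch. This is not Mistral's actual implementation; the dimensions, module names, and routing details are illustrative assumptions, chosen only to show why a single token exercises just a fraction of the layer's total parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Toy sparse mixture-of-experts layer with top-k routing (illustrative only).

    Each expert is a small feed-forward block; a learned router selects the
    top-k experts per token, so only a fraction of the layer's parameters
    participates in any single forward pass.
    """

    def __init__(self, d_model: int = 64, d_ff: int = 256,
                 n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> flatten tokens for per-token routing
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.router(tokens)                       # (n_tokens, n_experts)
        weights, chosen = logits.topk(self.top_k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros_like(tokens)
        for expert_id, expert in enumerate(self.experts):
            for slot in range(self.top_k):
                mask = chosen[:, slot] == expert_id        # tokens routed to this expert
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(tokens[mask])
        return out.reshape_as(x)

layer = SparseMoELayer()
y = layer(torch.randn(2, 10, 64))  # only 2 of the 8 experts run for each token
print(y.shape)                     # torch.Size([2, 10, 64])
```

Although all eight experts' weights must be held in memory, each token is processed by only two of them, which is the sense in which a 46.7B-parameter SMoE can run at the speed and cost of a much smaller dense model.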
Beyond raw benchmark performance, Mixtral 8x7B has also been optimized for instruction following: the instruction-tuned variant, Mixtral 8x7B Instruct, reaches a score of 8.30 on MT-Bench, making it a top-performing open-source model on that benchmark.
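Since the weights are available for download, one way to try the instruction-tuned model locally is through the Hugging Face transformers library, as in the hedged sketch below. The repository name and generation settings are assumptions, and running the full model requires substantial GPU memory.

```python
# Minimal sketch of loading and querying the instruction-tuned model with
# Hugging Face transformers; the model identifier below is an assumed Hub
# repository name, and generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user",
             "content": "Summarize what a sparse mixture of experts is."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```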