top of page

A New Generation of
uguese Large Language Models


March 13, 2024

Almost a year ago, we launched Sabiá-65B, the first LLM specialized in Portuguese, which achieved performance close to GPT-3.5-turbo in Portuguese tasks.

Today we present Sabiá-2, our newest family of language models trained on Portuguese text, especially in the Brazilian domain.

We compared Sabiás with commercial and open-source LLMs in several Brazilian exams:

  • Admission to universities (USP, UNICAMP and Enem)

  • Higher education (Enade 2022 and 2023),

  • Law (OAB),

  • Accounting (CFCES)

  • Medicine (Revalida and residency exams at USP and UNICAMP)

  • Postgraduate degree in computer engineering (Poscomp)

Due to specialization in Portuguese,  we offer Sabiá-2 at a much more affordable cost. See the comparison chart below:

newplot (8).png

Our best model until now, the Sabiá-2 Medium, matches or exceeds the performance of GPT-4 in 23 of 64 exams and surpasses GPT-3.5 in 58 of 64 exams. See the results for the Enade 2022 and 2023 exams below:

newplot (7).png

Note that none of the models were specifically trained to perform these exams. Furthermore, the tests were released at the end of 2023 (with the exception of Enade 2022), after the training date for the Sabiá models. Therefore, it is unlikely that the models were exposed to these questions during training.

For more details, check out our technical report:

How to access the models

The Sabiá-2 models can be accessed in three ways:

In addition, we also have example codes demonstrating how to make requests to our API in Python or JavaScript (link).


We recently integrated our models with LangChain, allowing us to use all of the library's resources, powered by a language model specialized in Brazilian Portuguese and Brazilian culture. This page presents the documentation of our chat and presents an example code of how to create a RAG system in a few steps.

bottom of page