Contributing to AI research

Advancing knowledge with Brazilian intelligence

At Maritaca, research is an essential part of our DNA. We continuously invest in developing new technologies, focusing on natural language processing (NLP), Portuguese language models and responsible AI.

Our team actively contributes to the scientific community with papers, studies and collaborations that push the boundaries of what artificial intelligence can do — always with attention to local context and the cultural and linguistic specifics of Brazil.

Scientific publications

Sabiá-4: Technical Report

2026

This technical report introduces Sabiá-4 and Sabiazinho-4, a new generation of language models focused on Brazilian Portuguese. The models were developed in four stages: continued pre-training on Portuguese and Brazilian legal corpora, long-context extension to 128K tokens, supervised fine-tuning, and preference alignment.

MARCA: MAritaca Research Checklist evAluation

2026

Benchmark for evaluating models' ability to find information on the web via breadth-first search, with questions paired with checklists to assess completeness and correctness.

Capitu: Evaluating LLMs on Brazilian Literature Comprehension

2026

Benchmark for evaluating how deeply language models comprehend canonical works of Brazilian literature in Portuguese.

Prosa: Evaluating Brazilian Portuguese Text Generation

2026

Evaluation set for Brazilian Portuguese text-generation quality, covering creative, technical and journalistic writing.

LLM Bias Bench: Measuring Opinion Bias and Sycophancy in LLMs

2026

Benchmark for measuring ideological/opinion bias and sycophancy (excessive agreement) in language models, with comparative analysis of commercial and open models.

BRoverbs — A benchmark for measuring how well LLMs understand Brazilian proverbs

2025

Dataset evaluating LLM comprehension of Brazilian proverbs, addressing gaps in existing benchmarks for linguistic and cultural nuances of Portuguese.

ClassiCC-PT: Building Industrial-Grade Corpora from Common Crawl

2025

Scalable methods for building high-quality Portuguese corpora from Common Crawl. The resulting corpus contains 120 billion tokens.

OAB-Bench: Automatic Evaluation of LLM Legal Writing

2025

Automatic evaluation of legal writing using questions from the 2nd phase of Brazil's Bar exam.

TiEBe: Tracking LLM Memory of Global Events Over Time

2025

Benchmark with over 17,000 questions about global events for evaluating models' temporal knowledge.

Relations between domain specialization and model size

2025

Investigation of pre-training laws for models of varying sizes on general vs. specialized data.

Sabiá-3: Technical Report

2024

Technical report for Sabiá-3, the previous generation of the Sabiá family.

Sabiá-2: A New Generation of Portuguese Large Language Models

2024

Sabiá-2 models with advances in fluency and benchmark performance on Brazilian tasks.

Sabiá: Portuguese Large Language Models

2023

First technical report of the Sabiá family, showing significant gains in PT-BR over generalist models.

Juru: A Brazilian Legal LLM Trained on Reputable Sources

2024

First LLM trained on Brazilian legal data from reputable sources.

GPT-3.5 and GPT-4 evaluated on the ENEM

2023

Study showing Chain-of-Thought gains on Brazilian college-entrance exams.

BLUEX: A multimodal benchmark based on USP and UNICAMP exams

2023

Multimodal benchmark based on USP and UNICAMP entrance exams.

For research and education

Supporting academic and scientific projects with Brazilian technology

Maritaca believes that advances in artificial intelligence must go hand in hand with education and open research. That's why we offer free API credits to students, professors and researchers who want to use the Sabiá family of LLMs, which are specialized in Portuguese, in teaching or scientific projects.

This initiative seeks to democratize access to our technology, contributing to the training of new professionals, the development of relevant solutions and the production of knowledge in the Portuguese-language AI ecosystem.

Submit your application