Paulo Cavalin

Title

Research Scientist

Bio

Paulo Cavalin is a Research Scientist in the Conversational Intelligence Group at IBM Research - Brazil, conducting both theoretical and applied research in Machine Learning, with a particular focus on Natural Language Processing problems such as text classification and machine translation for conversational systems. He is currently working with Foundation Models, focusing on understanding their applicability to endangered, very-low-resource languages such as Brazilian Indigenous Languages.

He holds a Ph.D. degree in Automated Production Engineering/Computer Science from École de Technologie Supérieure (ETS) - Université du Québec, Montreal (QC) - Canada, obtained in 2011, and has 15+ years of research experience in AI-related areas such as Machine Learning, Pattern Recognition, Computer Vision, and Social Data Analytics.

He is also an author of dozens of peer-reviewed scientific papers and an inventor of several patents. A detailed list can be found on his Google Scholar profile.

Publications

Sentence-level Aggregation of Lexical Metrics Correlate Stronger with Human Judgements than Corpus-level Aggregation
- - Paulo Rodrigo Cavalin
  - Pedro Henrique Leite Da Silva Pires Domingues
  - et al.
- 2025
- AAAI 2025
Fixing Rogue Memorization in Many-to-One Multilingual Translators of Extremely-Low-Resource Languages by Rephrasing Training Samples
- - Paulo Rodrigo Cavalin
  - Pedro Domingues
  - et al.
- 2024
- NAACL 2024
Theoretical and Empirical Advantages of Dense-Vector to One-Hot Encoding of Intent Classes in Open-World Scenarios
- - Paulo Rodrigo Cavalin
  - Claudio Santos Pinhanez
- 2024
- LREC-COLING 2024
Quantifying the Ethical Dilemma of Using Culturally Toxic Training Data in AI Tools for Indigenous Languages
- - Pedro Domingues
  - Claudio Santos Pinhanez
  - et al.
- 2024
- LREC-COLING 2024
Human Evaluation of the Usefulness of Fine-Tuned English Translators for the Guarani Mbya and Nheengatu Indigenous Languages
- - Claudio Santos Pinhanez
  - Paulo Rodrigo Cavalin
  - et al.
- 2024
- PROPOR 2024
Training Large Language Encoders with the Curated Carolina Corpus
- - Guilherme Lamartine Mello
  - Paulo Rodrigo Cavalin
  - et al.
- 2024
- PROPOR 2024
Balancing Social Impact, Opportunities, and Ethical Constraints of Using AI in the Documentation and Vitalization of Indigenous Languages
- - Claudio S. Pinhanez
  - Paulo Cavalin
  - et al.
- 2023
- IJCAI 2023
Understanding Native Language Identification for Brazilian Indigenous Languages
- - Paulo Rodrigo Cavalin
  - Pedro Henrique Leite Da Silva Pires Domingues
  - et al.
- 2023
- ACL 2023
Using meta-knowledge mined from identifiers to improve intent recognition in conversational systems
- - Claudio Pinhanez
  - Paulo Cavalin
  - et al.
- 2021
- ACL-IJCNLP 2021
Towards a Method to Classify Language Style for Enhancing Conversational Systems
- - Paulo Cavalin
  - Victor Ribeiro
  - et al.
- 2021
- IJCNN 2021

Projects

Conversational Intelligence
Exploring AI-based conversational systems in a human-centered approach

Patents

Domain Adaptation-based Disguising Of Prompts For Data Privacy In Foundation Models

Estimate Ore Content Based On Spatial Geological Data Through 3D Convolutional Neural Networks

Automated Machine Learning Model Selection

Conversational Systems Content Related To External Events

Conversational Systems Content Related To External Events

Concept Prediction To Create New Intents And Assign Examples Automatically In Dialog Systems

Stratigraphic Layer Identification From Seismic And Well Data With Stratigraphic Knowledge Base

Adapting Conversational Agent Communications To Different Stylistic Models

Product Quality Analysis And Control

Stratigraphic Layer Identification From Seismic And Well Data With Stratigraphic Knowledge Base

Top collaborators

Claudio Pinhanez

Julio Nogima

Heloisa Candello