Leshem Choshen

Publications

Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation
- - Yotam Perlitz
  - Ariel Gera
  - et al.
- 2025
- NeurIPS 2025
DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation
- - Eliya Habba
  - Ofir Arviv
  - et al.
- 2025
- ACL 2025
The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community
- - Shachar Don-Yehiya
  - Leshem Choshen
  - et al.
- 2025
- ACL 2025
A Hitchhiker's Guide to Scaling Law Estimation
- - Leshem Choshen
  - Yang Zhang
  - et al.
- 2025
- ICML 2025
Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead
- - Rickard Gabrielsson
  - Jiacheng Zhu
  - et al.
- 2025
- ICML 2025
A Lossless Compression for AI Models
- - Moshik Lanir Hershcovitch
  - Andrew Wood
  - et al.
- 2025
- CLOUD 2025
LiveXiv - A Multi-Modal Live Benchmark Based on Arxiv Papers Content
- - Nimrod Shabtay
  - Felipe Maia Polo
  - et al.
- 2025
- ICLR 2025
Efficient multi-prompt evaluation of LLMs
- - Felipe Maia Polo
  - Ronald Xu
  - et al.
- 2024
- NeurIPS 2024
NumeroLogic: Number Encoding for Enhanced LLMs' Numerical Reasoning
- - Eliyahu Schwartz
  - Leshem Choshen
  - et al.
- 2024
- EMNLP 2024
Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion
- - Kerem Zaman
  - Leshem Choshen
  - et al.
- 2024
- EMNLP 2024

Top collaborators

Michal Shmueli-Scheuer

Distinguished Engineer, AI Benchmarking and Evaluation

Mírian Silva

AI Engineer

Danny Harnik

STSM, Cloud Storage

Eyal Shnarch

Manager, Retrieval and Generation