About me

Founding member (#15) at xAI, working on reasoning RL, midtraining, pretraining, and scaling. I'm excited about pushing the frontier of capable AI systems and recursive self-improvement.

xAI

Grok 4.20 — Lead of the team responsible for the reasoning RL training algorithm and scaling.

Grok 4 Fast — Lead of Grok 4 Fast and the Reasoning Efficiency team. Delivered models 10x more efficient than Grok 4 at matched capability. Responsible for midtraining, RL scaling, post-training, long-context serving, and release coordination — resulting in the most used xAI API model by far.

Grok 4 — Foundational contributor to Grok 3 Reasoning and Grok 4. Contributed RL training stability improvements that helped enable 100x reasoning RL scaling.

Grok 3 Mini Reasoning — Sole contributor to the Grok 3 mini-high reasoning model, which reached performance competitive with o3-mini-high.

Grok 3 Pretraining — Foundational contributor to Grok 3 pretraining, the most capable base model at the end of 2024.

Grok 1.5 and 2 — Led midtraining, long context, and post-training capability research. Personally responsible for all deliverables in the Grok 1.5 release.

Before xAI

PhD student at the Institute of Mathematics, Polish Academy of Sciences, advised by Piotr Miłoś, focusing on LLM efficiency, long context, math reasoning, and code generation.

Previously a Student Researcher at Google, mentored by Yuhuai Wu and Christian Szegedy. Grateful to Henryk Michalewski and Łukasz Kaiser for supervising my bachelor's thesis.

In 2023, I authored Focused Transformer (LongLLaMA), one of the earliest algorithmic works to provide a solution for context extension and extrapolation (up to 256k tokens) in large language models.

My early work Hierarchical Transformers Are More Efficient Language Models pioneered the direction of tokenizer-free language models and hierarchical transformers. I also published papers on using language models for automated theorem proving and was part of the N2Formal team at Google.

Publications

Focused Transformer: Contrastive Training for Context Scaling
Szymon Tworkowski, Konrad Staniszewski, Mikołaj Pacek, Yuhuai Wu, Henryk Michalewski, Piotr Miłoś

NeurIPS 2023

Hierarchical Transformers Are More Efficient Language Models
Piotr Nawrot*, Szymon Tworkowski*, Michał Tyrolski, Łukasz Kaiser, Yuhuai Wu, Christian Szegedy, Henryk Michalewski

NAACL 2022, Findings

Magnushammer: A Transformer-based Approach to Premise Selection
Maciej Mikuła*, Szymon Tworkowski*, Szymon Antoniak*, Albert Qiaochu Jiang, Jin Peng Zhou, Christian Szegedy, Łukasz Kuciński, Piotr Miłoś, Yuhuai Wu

ICLR 2024 (Top 5%)

Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers
Albert Q. Jiang, Wenda Li, Szymon Tworkowski, Konrad Czechowski, Tomasz Odrzygóźdź, Piotr Miłoś, Yuhuai Wu, Mateja Jamnik

NeurIPS 2022

Formal Premise Selection With Language Models
Szymon Tworkowski*, Maciej Mikuła*, Tomasz Odrzygóźdź*, Konrad Czechowski*, Szymon Antoniak*, Albert Q. Jiang, Christian Szegedy, Łukasz Kuciński, Piotr Miłoś, Yuhuai Wu

AITP 2022

Analysing the Impact of Sequence Composition on Language Model Pre-Training
Yu Zhao, Yuanbin Qu, Konrad Staniszewski, Szymon Tworkowski, Wei Liu, Piotr Miłoś, Yuxiang Wu, Pasquale Minervini

ACL 2024

Structured Packing in LLM Training Improves Long Context Utilization
Konrad Staniszewski*, Szymon Tworkowski*, Sebastian Jaszczur, Yu Zhao, Henryk Michalewski, Łukasz Kuciński, Piotr Miłoś

AAAI 2025

Explaining Competitive-Level Programming Solutions using LLMs
Jierui Li, Szymon Tworkowski, Yingying Wu, Raymond Mooney

ACL 2023, NLRSE Workshop

Education

PhD, Institute of Mathematics, Polish Academy of Sciences, 2023 — dropped out
MS, Machine Learning, University of Warsaw, 2021 — 2023
BSc, Computer Science, University of Warsaw, 2018 — 2021

Invited talks