About me
Founding member (#15) at xAI, working on reasoning RL, midtraining, pretraining, and scaling. I'm excited about pushing the frontier of capable AI systems and recursive self-improvement.
xAI
Grok 4.20 — Led the team responsible for reasoning RL training algorithms and scaling.
Grok 4 Fast — Led Grok 4 Fast and the Reasoning Efficiency team. Delivered models 10x more efficient than Grok 4 while matching its capability. Responsible for midtraining, RL scaling, post-training, long context serving, and release coordination — resulting in the most used xAI API model by far.
Grok 4 — Foundational contributor to Grok 3 Reasoning and Grok 4. Contributed RL training stability improvements that helped enable 100x reasoning RL scaling.
Grok 3 Mini Reasoning — Sole contributor to the Grok 3 mini-high reasoning model, which reached performance competitive with o3-mini-high.
Grok 3 Pretraining — Foundational contributor to Grok 3 pretraining, the most capable base model at the end of 2024.
Grok 1.5 and 2 — Led midtraining, long context, and post-training capability research. Personally responsible for all deliverables in the Grok 1.5 release.
Before xAI
PhD student at the Institute of Mathematics, Polish Academy of Sciences, advised by Piotr Miłoś, focusing on LLM efficiency, long context, math reasoning, and code generation.
Previously Student Researcher at Google, mentored by Yuhuai Wu and Christian Szegedy. Grateful to Henryk Michalewski and Łukasz Kaiser for supervising my bachelor thesis.
I authored Focused Transformer (LongLLaMA), one of the earliest algorithmic works on context extension and extrapolation (up to 256k tokens) in large language models, back in 2023.
My early work, Hierarchical Transformers Are More Efficient Language Models, pioneered the direction of tokenizer-free language models and hierarchical transformers. I also published papers on using language models for automated theorem proving and was part of the N2Formal team at Google.
Publications
Focused Transformer: Contrastive Training for Context Scaling. Szymon Tworkowski, Konrad Staniszewski, Mikołaj Pacek, Yuhuai Wu, Henryk Michalewski, Piotr Miłoś
Hierarchical Transformers Are More Efficient Language Models. Piotr Nawrot*, Szymon Tworkowski*, Michał Tyrolski, Łukasz Kaiser, Yuhuai Wu, Christian Szegedy, Henryk Michalewski
Magnushammer: A Transformer-based Approach to Premise Selection. Maciej Mikuła*, Szymon Tworkowski*, Szymon Antoniak*, Albert Qiaochu Jiang, Jin Peng Zhou, Christian Szegedy, Łukasz Kuciński, Piotr Miłoś, Yuhuai Wu
Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers. Albert Q. Jiang, Wenda Li, Szymon Tworkowski, Konrad Czechowski, Tomasz Odrzygóźdź, Piotr Miłoś, Yuhuai Wu, Mateja Jamnik
Formal Premise Selection With Language Models. Szymon Tworkowski*, Maciej Mikuła*, Tomasz Odrzygóźdź*, Konrad Czechowski*, Szymon Antoniak*, Albert Q. Jiang, Christian Szegedy, Łukasz Kuciński, Piotr Miłoś, Yuhuai Wu
Analysing The Impact of Sequence Composition on Language Model Pre-Training. Yu Zhao, Yuanbin Qu, Konrad Staniszewski, Szymon Tworkowski, Wei Liu, Piotr Miłoś, Yuxiang Wu, Pasquale Minervini
Structured Packing in LLM Training Improves Long Context Utilization. Konrad Staniszewski*, Szymon Tworkowski*, Sebastian Jaszczur, Yu Zhao, Henryk Michalewski, Łukasz Kuciński, Piotr Miłoś
Explaining Competitive-Level Programming Solutions using LLMs. Jierui Li, Szymon Tworkowski, Yingying Wu, Raymond Mooney
Education
| Degree | Period |
| --- | --- |
| PhD, Institute of Mathematics, Polish Academy of Sciences | 2023 — dropped out |
| MS, Machine Learning, University of Warsaw | 2021 — 2023 |
| BSc, Computer Science, University of Warsaw | 2018 — 2021 |
Invited talks
Google DeepMind, Zürich — How to Make LLMs Utilize Long Context Efficiently? Oct 2023
University of Edinburgh, Edinburgh NLP Meeting — How to Make LLMs Utilize Long Context Efficiently? Oct 2023
ACL 2023 — Explaining Competitive-Level Programming Solutions using LLMs (AI Coffee Break interview)
AITP 2022 — Formal Premise Selection With Language Models (recording)
