Johannes Roth

I'm a computer scientist and ML engineer with a background in business information systems, applied data science, and computational neuroscience.

Right now, I'm finishing a PhD in Computer Science / Machine Learning at the Max Planck Institute in Leipzig, where I build large-scale data and ML infrastructure for NeuroAI and visual neuroscience. Before that, I worked on production-facing ML services, recommender systems, image-processing pipelines, and product data workflows.

Hello from the other side! If you're here, it means you want to learn more about me personally - so I'll keep things more casual!

Here you'll find some info about me and my hobbies, my taste in music, movies and books, and some notes on things I'm thinking about.

Black-and-white portrait of Johannes Roth in a suit

What I can do

Software engineering

Build reliable Python-backed software

Web backends, APIs, services, Linux deployments, automation, and maintainable applications that people can actually use.

Python · Django · Flask · Nginx · Gunicorn · Linux · CI/CD

ML & data systems

Turn data into working models

Computer vision, image processing, recommender logic, model evaluation, data collection, and deployment-oriented inference workflows.

PyTorch · TensorFlow · scikit-learn · CLIP · Redis · BigQuery

Business technology

Understand the product side

I studied business information systems and have worked on product matching, travel search, quality scoring, analytics, monitoring, and customer-facing data problems.

PostgreSQL · AWS · Grafana · data pipelines · analytics

Research depth

Handle ambiguity and scale

I can translate ambiguous domain questions into concrete datasets, experiments, codebases, and infrastructure with clear evaluation loops.

open source · HPC · datasets · reproducible workflows

Experience

Education
2022 - Present Doctorate

PhD in Computer Science / Machine Learning

NeuroAI, visual neuroscience, and large-scale data infrastructure

Hebart Lab · Max Planck Institute & University of Gießen

Researching NeuroAI approaches for efficient experimental design in visual neuroscience: choosing better stimuli from large naturalistic image spaces, comparing model and brain representations, and improving fMRI dataset coverage. Built the ML/data infrastructure behind these workflows, including public datasets and re:vision challenge infrastructure.

Python · PyTorch · CLIP · scikit-learn · SLURM · Docker

2018 - 2021 Education

M.Sc. Computer Science

Machine learning, data analysis, and medical image processing

Leipzig University

Studied applied machine learning and data analysis, graduating with grade 1.2. Thesis used GANs to synthesize stimuli that maximally activate targeted brain regions. Project work ranged from clothing-color recognition with segmentation models to EEG eigenspectra for states of consciousness and cloud-formation classification from satellite imagery.

Machine learning · GANs · Segmentation · EEG · Satellite imagery

2014 - 2018 Education

B.Sc. Business Information Systems

Computer science foundation with business-facing analytics and systems thinking

Leipzig University

Built a foundation across computer science, data systems, software engineering, and business-facing analytics, graduating with grade 1.5. Thesis extended a software-visualization framework to ABAP. Other projects included ABAP and full-stack work on an in-house metering app at GISA and a full e-commerce build spanning store implementation, SEO, catalogue work, design, and marketing.

Software engineering · ABAP · Full-stack development · Databases · E-commerce

Professional Experience
Jun 2021 - May 2022 Research

Research Assistant - ML in Medicine

Medical imaging models, segmentation, and uncertainty estimation

ScaDS.AI Dresden/Leipzig

Implemented and adapted recent deep-learning papers for medical imaging, especially brain-tumor segmentation and survival prediction. Built preprocessing, training, and evaluation workflows, ran experiments, and analyzed uncertainty in model predictions.

PyTorch · FT-Transformer · SAINT · UNet++

Oct 2019 - May 2021 Industry

Data Scientist / ML Engineer (Working student)

Production ML service for hotel images and travel product data

CHECK24 (Travel Vertical)

Built applied ML and backend features for the travel product. Designed and deployed a Flask + Redis service for fast image inference across millions of hotel images, using the outputs for deduplication, retrieval, classification, quality scoring, and recommendation tuning. Also worked on PHP backend features for the booking engine and outlier-detection workflows for hotel prices.

Python · PyTorch · Flask · Redis · PHP · BigQuery · Grafana

Other experience (freelance and earlier ML roles)
Oct 2020 - May 2021 Industry

Full-stack Developer (Freelance)

Django sites, Linux hosting, and deployment automation

Kimetric UG

Delivered full-stack web projects for academic clients, from backend implementation to deployment. Built Django-based websites, configured Linux servers with Nginx and Gunicorn, and wrote deployment scripts so the sites could be maintained reliably after handoff.

Django · Nginx · Gunicorn · Linux · CI/CD

Oct 2018 - Oct 2019 Industry

Data Scientist (Working student)

Product matching, scraping, and model evaluation workflows

Webdata Solutions GmbH (now Vistex)

Worked on product matching for e-commerce data, where the hard part was turning noisy scraped web data into usable training and evaluation datasets. Rebuilt parts of the matching pipeline with a neural-network approach, improving matching accuracy from below 50% to 92%.

Python · TensorFlow · PostgreSQL · AWS

Datasets & Tools

Dataset

ReLAION-2B Natural

Naturalness scores for 2.1B images, identifying ~500M photographic images for vision research

167 GB · CLIP ViT-H/14

Library

thingsvision

Unified feature extraction API for 100+ vision models

Core contributor · Python · 460k+ downloads

Dataset

LAION-fMRI

Deeply sampled 7T fMRI responses to 25k natural images for NeuroAI and visual neuroscience

5 subjects · 165 sessions · GLMsingle betas

Publications

  • 2026 · Characterizing Universal Object Representations Across Vision Models · Mahner*, Roth* et al. / arXiv preprint · Shared first author

    We analyzed the object similarity structure of 162 diverse vision models to ask what, if anything, converges across architectures, objectives, and datasets. The paper separates universal from model-specific dimensions and links the more universal structure to interpretability, semantic image properties, macaque IT activity, and human similarity judgments.

  • 2025 · How to sample the world for understanding the visual system · Roth & Hebart / CCN · Oral

    Vision neuroscience runs on large fMRI datasets, but nobody had checked whether the stimulus images in these datasets actually cover what humans see in the real world. We built LAION-natural, a reference distribution of ~120M naturalistic photographs filtered from 2 billion LAION images using a CLIP-based classifier trained on 25k actively sampled labels. Then we measured coverage: ~50% of the visual-semantic space is missing from the two most widely used datasets (NSD and THINGS).

    The good news: you don't need millions of images to fix this. In both simulations and real fMRI data, out-of-distribution generalization saturates at 5-10k samples - as long as you draw them from a diverse enough pool. We compared seven sampling strategies (including random, stratified, k-Means, Core-Set, effective dimensionality optimization, and active learning) and found that pool diversity matters far more than which algorithm you use to sample from it.

    The pipeline processes billions of images using CLIP embeddings, Annoy indices for nearest-neighbor search, mini-batch k-Means clustering, and Ridge regression encoding models - all at a scale that runs on a university HPC cluster, not a cloud budget.

  • 2025 · Ten principles for reliable, efficient, and adaptable coding · Roth et al. / Communications Psychology

    Most scientists learn to code informally - picking things up as they go, optimizing for "does it run?" over "will anyone else understand this?" This paper introduces a structured framework for writing better research code, built around the idea that researchers naturally switch between quick prototyping and careful development - and that being deliberate about which mode you're in makes all the difference.

    The ten principles span three tiers: organizing code (standardized project structures, version control, automation), writing reusable code (testing, documentation, clean interfaces), and collaborating (code review systems, shared knowledge bases, lab-wide standards). Already at 22k+ accesses, it clearly hit a nerve - these are problems every computational lab deals with but rarely talks about explicitly.

  • 2025 · Fine-grained image and category information in ventral visual pathway · Badwal, Bergmann, Roth et al. / J Neuroscience

Older publications cover fMRI methods, GAN-based neuroscience, and brain tumor segmentation.

Recognition / service

2025 re:vision initiative LAION-fMRI consortium replication challenge · infrastructure for image-related neuroscience
2025 CMBB Replication Award for reliable coding practices in neuroscience
MPI CBS PhD student representative graduate program, MPI CBS & University of Gießen
2024/25 CNNvis interactive CNN visualization used for public ML outreach at Gießen Science Days

About Me

Johannes bouldering on a rock face in a mountain valley

Movement

My love for moving my body started when I joined a breakdance crew at the age of 17. Since then I've dabbled in all kinds of movement disciplines. Today, I really enjoy bouldering, but I also do resistance training, yoga and, more recently, Qi Gong (a good way to wake up the body in the morning!). I also love hiking in nature (and combining it with climbing is the best way to spend a day, see above).

Johannes holding an electric guitar

Music

A couple of years ago, I asked a bunch of friends, "Hey, want to start a band?" To my surprise, everyone said yes! We had to drag our lead singer into it, but now he regularly screams his heart out. We mostly play pop-punk covers for friends' birthday parties, but the dream of writing our own songs is still alive and kicking.

A colorful plate of food on a wooden table

Food

As a kid, one of my teachers would say "Johannes lives off plain air", because I refused to eat food from the school kitchen. Either I was very picky, or the food was just plain bad. Today I eat almost anything - but I love throwing whole-food ingredients together into something delicious. Lately, I've enjoyed hosting friends for dinner nights, which have been a blast (but surprisingly stressful!).

A quiet forest path

Meditation

I've been meditating for over a decade now - although it's an on-off relationship. I even attended a 10-day silent meditation retreat in 2024, which was pretty life-changing for me! This is an image from the forest near the meditation hall - I walked that path probably 100 times and discovered something new every time.

Shelves

Currently reading

Art & Fear David Bayles & Ted Orland
The Mind Illuminated Culadasa

Music

I listen to lots of stuff - but Spotify knows me best, so here is my weekly discovery playlist for your perusal.

Say Hi

Happy to hear from friends, collaborators, future coworkers, or people who want to talk about brains, tools, climbing, books, or dinner. Email is best.

Get in Touch

Happy to chat about research, potential collaborations, or opportunities. Email is best. Also on LinkedIn, GitHub, Hugging Face, and Google Scholar.