Why Deutsche Bahn feels unreliable and how probabilities could fix it

Deutsche Bahn is basically Germany's national sport: everyone complains about it, and everyone keeps talking about it. It's the universal small-talk topic and public enemy #1 for punctuality-obsessed Germans. And apparently, there are reasons for that.


During my first semester in Northern Germany, I got my first taste of train travel. Back then, my semester ticket only worked for local trains in the federal state where my university was located. But honestly? I was absolutely thrilled. Just the possibility of hopping on a train every hour (sometimes even every half-hour!) to cruise up to the Danish border or gaze at the North Sea felt like magic.


And then came the game-changer: the Deutschland-Ticket (Germany-Ticket) upgrade arrived for my semester ticket. The semester fee was slightly higher, but the ticket was now valid across all of Germany on local transport. In October 2023, during the semester break, I decided to put it to the test. I planned a route using only local trains (where the Germany-Ticket works). I started in Kiel, wound through the western part (Hamburg, Bremen, Münster, Düsseldorf, Frankfurt, Stuttgart) down to Friedrichshafen in the south, then looped back through central Germany (Regensburg, Leipzig, Magdeburg, Lüneburg). You can check my route here.


In total, it took 31 train changes and 2,600 km to complete the loop. I spent one day in Frankfurt, a few days in Friedrichshafen, and a few days in Regensburg. In other cities, I mostly spent 1-2 hours wandering around the city centers. I didn't have any major problems, neither significant delays (more than 15 minutes) nor cancellations.

Local trains achieve punctuality rates over 90% almost nationwide. Sure, things go wrong sometimes, but significantly less often than in long-distance traffic. According to DB's report for the first half of 2025, the Germany-Ticket has about 13.5 million monthly users and shows high customer acceptance.


But I should probably admit that my standards might be a bit warped. When I was a kid, there was exactly one night train from my grandparents' village to the city where my parents studied. Miss it? Well, goodnight, see you tomorrow. So yeah, compared to that, hourly regional trains felt like first-class luxury.


My first truly "German DB experience" happened on a local train to Hamburg, of all places. We were literally approaching the destination station when something went wrong with the locomotive of another train ahead of us on the tracks. We sat there for about 1.5 hours, watching the clock tick. Honestly, even that wasn't so bad. The tragic part was that I missed my Flixbus connection to Prague by a heartbreaking 10 minutes. I grabbed the next bus (30-40 minutes later) and thought that at least I could claim compensation for my precious 25-euro replacement ticket.


Fast forward several months and DB politely told me "nein." Compensation declined. But hey, at least they responded! Even after this incident, I remained mostly positive. The trains still ran on a regular schedule, they were comfortable, and most delays were manageable. Plus, that semester ticket kept working its magic.


Then came my experience with long-distance ICE trains. Different story entirely. I took the ICE from Berlin to Hamburg a few times: delays every time, and twice I ended up spending my entire 2-hour journey standing in the vestibule like a potted plant. The train was that overcrowded because the previous train had been cancelled. Considering the price (even with significant discounts, it still hurts), staying positive after those trips required Olympic-level mental gymnastics.


According to DB's financial report for the first half of 2025, long-distance trains were only 63.4% on time (where "on time" means a delay of less than 6 minutes). Regional trains were 90.6% on time.


So what's wrong with long-distance trains? Several reasons:

  • Outdated, overloaded infrastructure (aging like fine wine, but not in a good way)
  • Construction activity everywhere (gotta fix that infrastructure!)
  • Traffic growth concentrated in already overloaded hubs
  • Staff shortages (who knew trains don't drive themselves?)
  • Individual incidents: bomb defusing, train-truck collisions, knife attacks at stations, fires... you know, the usual Tuesday

DB acknowledges that it needs better passenger information, especially during disruptions. It is working on more precise arrival/departure forecasts, standardized disruption reports, and flexible alternative searches. The DB Navigator now even considers the Germany-Ticket when searching for connections!

At some point, I realized that most of my frustration wasn't actually caused by the delays themselves, but by uncertainty. A 10-minute delay is annoying. A 10-minute delay that might turn into 30 minutes, without any warning, is stressful. And that's where Deutsche Bahn has a real opportunity: not necessarily to eliminate delays overnight, but to make them predictable.

The idea: real-time delay probabilities for specific trains. Imagine opening the DB app and seeing: "Your train to Berlin has an 80% chance of being less than 5 minutes late, a 15% chance of being 5-20 minutes late, and a 5% chance of being more than 20 minutes late." This information would be calculated from your departure station's historical performance, the specific route, previous journeys on that line today, weather conditions, and current incidents or construction.


Right now, there's a massive information asymmetry. DB sits on mountains of delay statistics for every route, station, and train type. Passengers? We just stare at the platform display, hoping for the best and preparing for the worst.


Sure, this won't fix the actual problems causing delays. But it would help passengers make informed decisions. Instead of the stressful uncertainty of "will I make my meeting?" you'd have a risk management situation: "Okay, 40% chance of significant delay, I'll message my colleagues now."


This could even benefit DB. If passengers see high delay risks, they might choose alternatives, reducing pressure on vulnerable routes and stations during critical times. Less overcrowding, fewer angry passengers, win-win?


DB already has the data, and it could use something like a Bayesian network to move from static historical averages ("this train is late 20% of the time") to dynamic, context-aware predictions: "Given the current weather, time of day, and a signal failure reported 10 minutes ago, your specific train now has a 75% probability of being at least 15 minutes late."

So how would we build this? Think of a Bayesian network as a giant web of cause and effect, where DB's historical data meets real-time information to produce actual predictions. The goal: a system that gives customers a personalized, real-time probability that their specific train will be delayed. Simple as that. User story: "As a customer waiting for ICE 721 from Essen Hbf to Munich Hbf, I want to see the probability of my train being delayed by less than 5, 5-15, or more than 15 minutes, so I can make informed decisions."
First, we need to identify everything that influences train delays. These become the nodes in our network:
  • Target (what we're predicting) is `TrainDelay` with categories like 'On Time (5 min or less)', 'Moderate Delay (5-15 min)', 'Significant Delay (more than 15 min)'
  • Infrastructure and Operations variables: `TrackConstruction` (is there scheduled construction on the route today? Boolean), `SignalFailure` (any reported failures on this line? Real-time feed), `TrainMalfunction` (issues with this specific train unit? Boolean), `StationCongestion` (Low/Medium/High based on time and known bottlenecks)
  • External factors: `WeatherSeverity` (Clear/Rain/Snow/Storm along the route, because German weather hates German trains), `TimeOfDay` (Peak vs. Off-peak, more trains = more chances for things to go wrong), `DayOfWeek` (Weekday/Weekend/Holiday, different traffic patterns)
  • Network effects (they work like dominoes): `PreviousDelay` (is the train arriving from its previous journey already delayed? This is huge, delays propagate through the network like gossip through a small town), `ConnectingPassengerVolume` (High/Medium/Low at major hubs, more people boarding = longer dwell times)
DAG for train delays
Now we connect the dots. Arrows point from cause to effect:
  • WeatherSeverity → TrainDelay. Snow and storms mean slower speeds, period.
  • SignalFailure → TrainDelay. Red signal? No movement. Simple rules.
  • TrackConstruction → TrainDelay. Construction zones = speed restrictions = delays.
  • PreviousDelay → TrainDelay. Late inbound almost always means late outbound. The strongest predictor.
  • TimeOfDay → StationCongestion → TrainDelay. Rush hour at Düsseldorf Hbf is chaos. More people, more time, more delay.
  • DayOfWeek → TrainDelay. Weekends have different passenger patterns, and freight traffic loves the night shift.
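
The arrows above can be written down as a plain edge list. The following is a minimal sketch, not DB's actual model: it only encodes the edges listed in the text, and the helper names `parents_of` and `topological_order` are made up for illustration. It derives each node's parents (which determine the shape of its CPT) and checks that the graph really is acyclic.

```python
# Edges exactly as listed in the text: (cause, effect).
EDGES = [
    ("WeatherSeverity", "TrainDelay"),
    ("SignalFailure", "TrainDelay"),
    ("TrackConstruction", "TrainDelay"),
    ("PreviousDelay", "TrainDelay"),
    ("TimeOfDay", "StationCongestion"),
    ("StationCongestion", "TrainDelay"),
    ("DayOfWeek", "TrainDelay"),
]


def parents_of(edges):
    """Map every node to the sorted list of its direct causes."""
    parents = {}
    for cause, effect in edges:
        parents.setdefault(cause, [])
        parents.setdefault(effect, []).append(cause)
    return {node: sorted(ps) for node, ps in parents.items()}


def topological_order(edges):
    """Kahn's algorithm; raises ValueError if the graph has a cycle."""
    remaining = {node: set(ps) for node, ps in parents_of(edges).items()}
    order = []
    while remaining:
        # Nodes whose parents have all been placed already.
        ready = sorted(n for n, ps in remaining.items() if not ps)
        if not ready:
            raise ValueError("graph contains a cycle, so it is not a DAG")
        for node in ready:
            order.append(node)
            del remaining[node]
        for ps in remaining.values():
            ps.difference_update(ready)
    return order


PARENTS = parents_of(EDGES)
ORDER = topological_order(EDGES)
```

`PARENTS["TrainDelay"]` lists six direct causes, which is why its CPT grows so quickly: every extra parent multiplies the number of rows.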

Each node needs a Conditional Probability Table (CPT) that defines probabilities based on its parents (the factors influencing it).

For example, WeatherSeverity has no parents, so its table is just a prior estimated from climate data for a chosen time frame: probability of clear weather = 0.60, probability of rain = 0.30, probability of snow = 0.07, probability of storm = 0.03.
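
A parentless node like this is simply a dictionary of probabilities. As a small sketch (the function name `is_valid_distribution` is hypothetical, and the numbers are the illustrative ones from the text, not real climate statistics), it is worth checking that such a table is a proper distribution before using it:

```python
import math

# Prior for the root node WeatherSeverity (illustrative numbers from the text).
WEATHER_PRIOR = {"Clear": 0.60, "Rain": 0.30, "Snow": 0.07, "Storm": 0.03}


def is_valid_distribution(dist):
    """True if all values lie in [0, 1] and together sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in dist.values())
    return in_range and math.isclose(sum(dist.values()), 1.0)


WEATHER_OK = is_valid_distribution(WEATHER_PRIOR)
```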

Here's how different factors combine in a CPT (all numbers are theoretical):
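| PreviousDelay | Weather | SignalFailure | P(On Time) | P(Moderate) | P(Significant) |
|---|---|---|---|---|---|
| False | Clear | False | 0.85 | 0.12 | 0.03 |
| False | Clear | True | 0.05 | 0.25 | 0.70 |
| False | Snow | False | 0.40 | 0.40 | 0.20 |
| True | Rain | False | 0.15 | 0.35 | 0.50 |
| True | Snow | True | 0.01 | 0.09 | 0.90 |
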
See the pattern? A train already late, in snow, with signal problems? That's your 90% probability of significant delay. Good luck, traveler.
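
In code, a CPT like this can be stored as a lookup table keyed by the parent values. This is only a sketch under assumptions: it covers just the five rows shown (a full table would need a row for every parent combination), and the names `TRAIN_DELAY_CPT` and `lookup` are invented for illustration.

```python
import math

STATES = ("On Time", "Moderate", "Significant")

# Key: (PreviousDelay, WeatherSeverity, SignalFailure).
# Values follow the order of STATES; numbers are the theoretical ones above.
TRAIN_DELAY_CPT = {
    (False, "Clear", False): (0.85, 0.12, 0.03),
    (False, "Clear", True): (0.05, 0.25, 0.70),
    (False, "Snow", False): (0.40, 0.40, 0.20),
    (True, "Rain", False): (0.15, 0.35, 0.50),
    (True, "Snow", True): (0.01, 0.09, 0.90),
}


def lookup(previous_delay, weather, signal_failure):
    """Return {state: probability} for one parent combination, or None
    if that combination is not among the rows shown in the table."""
    row = TRAIN_DELAY_CPT.get((previous_delay, weather, signal_failure))
    return None if row is None else dict(zip(STATES, row))


# Every row of a CPT must itself be a probability distribution.
ROWS_SUM_TO_ONE = all(
    math.isclose(sum(row), 1.0) for row in TRAIN_DELAY_CPT.values()
)
WORST_CASE = lookup(True, "Snow", True)
```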

Where do these numbers come from? DB has years of historical train movements, weather records, and disruption logs. Machine learning algorithms can extract these probabilities automatically.

Let's break down Row 1 as an example (PreviousDelay=False, Weather=Clear, SignalFailure=False):

Step 1 - filter historical records where: PreviousDelay = False, WeatherSeverity = Clear, SignalFailure = False. Let's say this gives us 10,000 journeys.

Step 2 - among these 10,000 journeys, count delay categories: On Time: 8,500 journeys, Moderate: 1,200 journeys, Significant: 300 journeys

Step 3 - divide: P(On Time) = 8,500/10,000 = 0.85, P(Moderate) = 1,200/10,000 = 0.12, P(Significant) = 300/10,000 = 0.03

The other rows follow the exact same pattern, just with different filters.
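
The three steps translate almost directly into code. The sketch below uses synthetic records built to match the counts above (a stand-in for DB's real history, which is not available here); `estimate_cpt_row` and `make_records` are hypothetical helper names.

```python
from collections import Counter


def estimate_cpt_row(records, conditions):
    """Estimate P(TrainDelay | conditions) by filtering and counting."""
    # Step 1: keep only journeys matching every condition.
    matching = [
        r for r in records
        if all(r[key] == value for key, value in conditions.items())
    ]
    if not matching:
        raise ValueError("no historical journeys match these conditions")
    # Step 2: count delay categories among the matching journeys.
    counts = Counter(r["TrainDelay"] for r in matching)
    # Step 3: divide by the number of matching journeys.
    return {state: n / len(matching) for state, n in counts.items()}


def make_records(n, delay, previous_delay, weather, signal_failure):
    """Build n identical synthetic journeys (stand-in for real history)."""
    return [
        {"PreviousDelay": previous_delay, "WeatherSeverity": weather,
         "SignalFailure": signal_failure, "TrainDelay": delay}
        for _ in range(n)
    ]


# Synthetic history reproducing the counts from the three steps.
HISTORY = (
    make_records(8500, "On Time", False, "Clear", False)
    + make_records(1200, "Moderate", False, "Clear", False)
    + make_records(300, "Significant", False, "Clear", False)
    + make_records(500, "Significant", True, "Snow", True)  # filtered out
)

ROW_1 = estimate_cpt_row(
    HISTORY,
    {"PreviousDelay": False, "WeatherSeverity": "Clear", "SignalFailure": False},
)
```

Plain counting breaks down for rare combinations with few or no matching journeys, so a real system would probably need some form of smoothing; that is outside this sketch.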

Now let's walk through a 'real' example with ICE 721 (Essen Hbf - Munich Hbf).

On zugfinder.net we can find some insights about its 'punctuality': an average delay of 17 minutes (last 30 days) and 29% punctuality (up to 5 minutes of delay). But is that really meaningful if we want to take this train right now? Should I already be worried? Let's add some information (fictional in our case) that DB most likely already collects.

First, let's lay out all the numbers we'll need in one place:

Prior probabilities (base rates from historical data)

| Variable | Probability | Meaning |
|---|---|---|
| P(Delay = Significant) | 0.10 | 10% of all trains are significantly late (more than 15 min) |
| P(Delay = Moderate) | 0.35 | 35% of trains have moderate delays (5–15 min) |
| P(Delay = On Time) | 0.55 | 55% of trains are on time (less than 5 min delay) |

Evidence observed for ICE 721 (this has to be real-time data)

| Variable | Observed Value | Explanation |
|---|---|---|
| PreviousDelay | True | The inbound train arrived 12 minutes late |
| Weather | Rain | It's raining along the entire route |
| SignalFailure | False | No signal issues reported (so far) |
| Construction | True | Construction work near Düsseldorf |
| PeakHour | True | Current time: 5:00 PM |

Likelihoods (how evidence depends on delay)

Instead of directly using P(Delay | Evidence), we model how likely the evidence is given each delay category.

| Evidence | On Time | Moderate | Significant |
|---|---|---|---|
| P(PreviousDelay = True \| Delay) | 0.05 | 0.20 | 0.60 |
| P(Weather = Rain \| Delay) | 0.20 | 0.35 | 0.50 |
| P(Construction = True \| Delay) | 0.10 | 0.25 | 0.50 |
| P(PeakHour = True \| Delay) | 0.30 | 0.40 | 0.55 |

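We combine everything with Bayes' rule, assuming the pieces of evidence are conditionally independent given the delay category (the naive Bayes simplification):

P(D | E) ∝ P(D) × P(PD|D) × P(W|D) × P(C|D) × P(T|D)

Here D is the delay category, E is the full set of observed evidence, and PD, W, C and T stand for PreviousDelay, Weather, Construction and PeakHour. In other words, we evaluate how likely the observed evidence is under each delay scenario and weight it by the prior probability of that scenario. SignalFailure = False does not appear in the product because the likelihood table has no row for it.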

Unnormalized probabilities:

For Significant Delay: 0.10 × 0.60 × 0.50 × 0.50 × 0.55 = 0.00825

For Moderate Delay: 0.35 × 0.20 × 0.35 × 0.25 × 0.40 = 0.00245

For On Time: 0.55 × 0.05 × 0.20 × 0.10 × 0.30 = 0.000165

Normalize so that all probabilities sum to 1:

Sum = 0.00825 + 0.00245 + 0.000165 = 0.010865

P(Significant | E): 0.00825 / 0.010865 ≈ 76%
P(Moderate | E): 0.00245 / 0.010865 ≈ 23%
P(On Time | E): 0.000165 / 0.010865 ≈ 1.5% (rounded down to 1% in the display so that the three values add up to 100%)
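
The whole calculation fits in a few lines. This is a minimal sketch of the naive Bayes style update described above, using the fictional priors and likelihoods from the tables; the function name `posterior` and the dictionary layout are illustrative choices, not an existing DB interface.

```python
import math

# Base rates from the prior table.
PRIOR = {"On Time": 0.55, "Moderate": 0.35, "Significant": 0.10}

# P(evidence as observed | delay category), from the likelihood table.
LIKELIHOOD = {
    "PreviousDelay=True": {"On Time": 0.05, "Moderate": 0.20, "Significant": 0.60},
    "Weather=Rain": {"On Time": 0.20, "Moderate": 0.35, "Significant": 0.50},
    "Construction=True": {"On Time": 0.10, "Moderate": 0.25, "Significant": 0.50},
    "PeakHour=True": {"On Time": 0.30, "Moderate": 0.40, "Significant": 0.55},
}


def posterior(prior, likelihood):
    """Multiply the prior by each likelihood, then normalize to sum to 1.

    Assumes the evidence variables are conditionally independent given
    the delay category.
    """
    unnormalized = {
        state: p * math.prod(table[state] for table in likelihood.values())
        for state, p in prior.items()
    }
    total = sum(unnormalized.values())
    return {state: value / total for state, value in unnormalized.items()}


POSTERIOR = posterior(PRIOR, LIKELIHOOD)
for state, p in POSTERIOR.items():
    print(f"{state}: {p:.1%}")
```

Adding another piece of evidence is just one more entry in `LIKELIHOOD`, which is what makes this form convenient for real-time updates.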

This is what you will see on your phone display:

ICE 721 • Essen Hbf - Munich Hbf

Delay Probability

On Time: 1% | Moderate: 23% | Significant: 76%

Contributing Factors:

  • Previous journey: Delayed (+12 min)
  • Weather: Rain along route
  • Construction near Düsseldorf
  • Peak hour (5:00 PM)
  • No signal failures reported

Consider informing your contact in Munich - significant delay very likely.

The math is really just counting combined with probability theory. DB already has all (or almost all) of the data: every delayed train, every weather report, every construction site and every peak hour. The only missing piece is connecting the dots and putting this information in passengers' hands.

Seeing "76% chance of significant delay" might not make the delay hurt less. But at least you'd know whether to grab that coffee or start running for the S-Bahn alternative.

We're not asking DB to fix every delay overnight. Infrastructure takes time, money, and political will. But information is cheap, and DB already has the data. The only thing missing is turning that data into something useful for passengers.