My thesis is on automated testing strategies for mobile applications that generate sensor data and communicate with the cloud. I'm at METU, halfway through my MSc, and I have rewritten the core project three times.
Each time I thought I had understood the problem. Each time I was wrong about what the problem actually was.
First version: testing the UI
The natural place to start with mobile app testing is the UI layer. Buttons do what they say, screens render correctly, the app doesn't crash when you rotate the phone. I knew how to do this, the tooling existed, and I made decent progress quickly.
The problem is that UI correctness is not what makes these apps interesting or difficult. The interesting behavior is in the data pipeline: how sensor readings get captured, buffered when there's no network, and eventually synced to the cloud.
A UI test can pass perfectly while the app is silently dropping half its sensor data. That's not a testing strategy. That's expensive false confidence.
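To make that failure mode concrete, here is a minimal sketch of the kind of pipeline I mean. All the names are invented for illustration; the point is that a bounded offline buffer can silently drop readings in a way no UI test would ever surface.

```python
from collections import deque

class SensorPipeline:
    """Illustrative pipeline: capture readings, buffer while offline,
    sync to the cloud when a connection returns."""

    def __init__(self, buffer_size=100):
        # Bounded buffer: when full, the oldest readings fall off silently.
        self.buffer = deque(maxlen=buffer_size)
        self.synced = []
        self.online = False

    def capture(self, reading):
        self.buffer.append(reading)

    def sync(self):
        if self.online:
            self.synced.extend(self.buffer)
            self.buffer.clear()

pipeline = SensorPipeline(buffer_size=100)
for i in range(250):           # a long offline stretch
    pipeline.capture(i)
pipeline.online = True
pipeline.sync()
print(len(pipeline.synced))    # 100 -- the other 150 readings are simply gone
```

The UI would look perfectly healthy the whole time: every screen renders, nothing crashes, and 60% of the data never reaches the cloud.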
Rewrite one.
Second version: mocking everything
Testing the data pipeline meant dealing with real hardware sensors, real network conditions, and real cloud responses. To avoid that complexity, I mocked all of it. Fake sensor, fake network layer, fake cloud.
Tests became fast and deterministic. They also stopped being useful. The failure modes I actually cared about came from the interaction between real components: sensors producing irregular output, connections dropping mid-sync, responses arriving out of order. None of that happens with a well-behaved mock.
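A toy example of the problem, with names invented for illustration: a sync client that quietly assumes acknowledgements arrive in the order batches were sent. Against a well-behaved mock that acks in order, the test passes. Feed it the kind of ack stream a real network produces and the bug appears immediately.

```python
class SyncClient:
    """Illustrative client that tracks batches awaiting acknowledgement."""

    def __init__(self):
        self.pending = []          # batch ids awaiting an ack

    def send(self, batch_id):
        self.pending.append(batch_id)

    def on_ack(self, batch_id):
        # BUG: assumes acks arrive in send order and pops blindly.
        self.pending.pop(0)

# With a well-behaved mock cloud that acks in perfect order, all is green:
client = SyncClient()
for b in (1, 2, 3):
    client.send(b)
for ack in (1, 2, 3):
    client.on_ack(ack)
print(client.pending)          # [] -- test passes

# With a realistic ack stream (ack 1 lost, the rest reordered):
client = SyncClient()
for b in (1, 2, 3):
    client.send(b)
for ack in (2, 3):
    client.on_ack(ack)
print(client.pending)          # [3] -- batch 1 was wrongly marked delivered
```

The mock never exercises the assumption, so the assumption never gets tested.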
I had clean green tests and no confidence in the system.
Rewrite two.
Third version: testing the hard parts
The version I'm working on now doesn't try to eliminate the messiness. It tests how the system handles it. Injecting malformed sensor readings. Simulating dropped connections at specific points in the sync cycle. Verifying recovery behavior.
These tests are slower and harder to write. They also find real bugs. That seems like the right tradeoff.
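A minimal sketch of a test in this style, with all names invented for illustration: a cloud stub that drops the connection partway through a sync, and an assertion that unsent readings stay buffered rather than being lost.

```python
class FlakyUploader:
    """Cloud stub that drops the connection after n successful uploads."""

    def __init__(self, fail_after):
        self.fail_after = fail_after
        self.received = []

    def upload(self, reading):
        if len(self.received) >= self.fail_after:
            raise ConnectionError("connection dropped mid-sync")
        self.received.append(reading)

def sync(buffer, uploader):
    # Remove a reading from the buffer only after its upload is confirmed,
    # so a mid-sync failure leaves the unsent tail intact.
    while buffer:
        uploader.upload(buffer[0])
        buffer.pop(0)

buffer = [10, 20, 30, 40]
uploader = FlakyUploader(fail_after=2)
try:
    sync(buffer, uploader)
except ConnectionError:
    pass                       # the app would retry on reconnect

print(uploader.received)       # [10, 20]
print(buffer)                  # [30, 40] -- unsent readings survive the fault
```

The useful property is the invariant: at any point, received plus buffered equals everything captured. Tests that inject faults at specific points in the cycle are checking exactly that invariant.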
What I keep coming back to
Before writing a test, ask what it would miss. What conditions could cause the system to fail that this test wouldn't catch?
If you can answer that quickly and the answer doesn't bother you, the test is probably fine. If the list of things it misses is long, the test is measuring the wrong thing.
I still have a few months left on the thesis. I'm reasonably confident I won't rewrite it a fourth time. Reasonably.
With gusto, Fatih.