This architecture decision comes up frequently in investor conversations, so I want to write it down clearly once.
We chose a YOLO-based detection architecture for Countercheck. Here's the reasoning, what we evaluated, and what we rejected.
The constraint that shaped everything
Logistics environments have throughput requirements. A conveyor belt doesn't pause for your model to run a two-stage detection pipeline. The authentication has to happen at line speed, which means the inference time is bounded by the physical process, not by what's comfortable for the model.
The specific constraint varies by deployment: parcel sorting operations run at different speeds than warehouse goods-in processes. But the directional requirement is consistent. If the latency adds a bottleneck, the product fails regardless of accuracy.
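To make the constraint concrete, here's the back-of-envelope arithmetic we do for each deployment. The numbers below are illustrative, not actual deployment figures:

```python
# Illustrative latency-budget arithmetic. Belt speed and item spacing
# are hypothetical example values, not real deployment specs.

def per_item_budget_ms(belt_speed_m_s: float, item_spacing_m: float) -> float:
    """Time between consecutive items passing the camera, in milliseconds."""
    return item_spacing_m / belt_speed_m_s * 1000.0

# A belt moving at 2 m/s with items 0.5 m apart leaves 250 ms per item.
# Capture, inference, and the authentication decision must all fit
# inside that window, every time.
budget = per_item_budget_ms(belt_speed_m_s=2.0, item_spacing_m=0.5)
print(f"{budget:.0f} ms per item")  # → 250 ms per item
```

The budget is set by the belt, not by us; the only lever we control is how much of it the model consumes.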
This constraint ruled out architectures that are accurate but slow. Faster R-CNN is more accurate than YOLO on standard benchmarks. It's also slower. For a lab environment where you control the image capture and can take as long as you need, Faster R-CNN is often the right choice. For our deployment context, it isn't.
What YOLO does differently
Most object detection architectures use a two-stage approach: first propose regions that might contain objects, then classify those regions. YOLO does it in one pass. You run the image through the network once and get bounding boxes and class predictions simultaneously.
The single-pass architecture is faster because it eliminates the region proposal step. The tradeoff is lower accuracy on small objects and on scenes where multiple objects are densely packed. For product authentication in logistics, neither weakness matters much: products come through individually or in groups where each item is distinct, and nothing is small in the detection sense.
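The "one pass" claim is easiest to see in the shape of the output. A minimal sketch of decoding a YOLO-style output grid, simplified to one box per cell and a single scale (the real YOLOv4 head uses multiple anchors and scales, so treat the tensor layout here as pedagogical, not the actual format):

```python
import numpy as np

# Each grid cell carries box coordinates, an objectness score, and class
# scores in the same tensor -- one forward pass yields localization and
# classification together, with no separate region-proposal stage.

def decode_grid(output: np.ndarray, conf_threshold: float = 0.5):
    """output: array of shape (S, S, 5 + C) holding
    (x, y, w, h, objectness, class scores...) for every cell."""
    S = output.shape[0]
    detections = []
    for row in range(S):
        for col in range(S):
            cell = output[row, col]
            objectness = cell[4]
            if objectness < conf_threshold:
                continue
            class_id = int(np.argmax(cell[5:]))
            score = objectness * cell[5 + class_id]
            # x, y are offsets within the cell; normalise to image coords.
            x = (col + cell[0]) / S
            y = (row + cell[1]) / S
            detections.append((class_id, score, (x, y, cell[2], cell[3])))
    return detections

# One sweep over the tensor produces boxes and classes simultaneously.
grid = np.zeros((7, 7, 5 + 3))                          # 7x7 grid, 3 classes
grid[3, 4] = [0.5, 0.5, 0.2, 0.3, 0.9, 0.1, 0.8, 0.1]   # one confident cell
print(decode_grid(grid))
```

A two-stage detector would instead emit candidate regions first and classify them in a second network pass; the elimination of that second pass is where the latency win comes from.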
What we evaluated
Faster R-CNN: more accurate on benchmarks, unacceptable inference latency for our use case.
SSD (Single Shot Detector): similar speed to YOLO, lower accuracy on the detection tasks we need for product localization.
EfficientDet: promising on the accuracy-efficiency tradeoff, less mature tooling and community support at the time of evaluation. We might revisit this.
YOLO versions: we evaluated YOLOv3 and YOLOv4 (released April 2020). YOLOv4 improved accuracy significantly over v3 with modest latency increase. We're on YOLOv4 currently with YOLOv5 in evaluation.
The deployment architecture matters more than the model choice
The model selection gets attention because it's legible to investors and technical reviewers. The deployment decisions are harder to explain but equally important.
Running inference at line speed in a logistics environment requires hardware that's in the facility, not in a cloud data center. Network latency to a cloud inference endpoint is incompatible with conveyor belt speeds. We're running on edge hardware. The model has to fit within the compute budget of that hardware.
This constrains the model size more than the architecture choice does. A large YOLO model runs slower than a small YOLO model. Tuning the size-accuracy-speed tradeoff for each deployment environment is ongoing work.
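That tuning is empirical: we time candidate model sizes on the target device and compare against the per-item budget. A sketch of the harness, with a stand-in function where the real model's forward pass would go (`fake_small_model` is hypothetical; in practice the callable wraps inference on real captured frames):

```python
import time
import statistics

def measure_latency_ms(run_inference, warmup: int = 5, runs: int = 50):
    """Median and approximate p95 wall-clock latency in milliseconds.
    Warmup runs let caches and lazy initialisation settle first."""
    for _ in range(warmup):
        run_inference()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return statistics.median(samples), samples[int(0.95 * len(samples)) - 1]

# Stand-in for a candidate model's forward pass (e.g. a reduced-width
# YOLO variant); sleeps ~2 ms instead of running a network.
def fake_small_model():
    time.sleep(0.002)

median_ms, p95_ms = measure_latency_ms(fake_small_model)
print(f"median {median_ms:.1f} ms, p95 {p95_ms:.1f} ms")
```

We care about the tail, not the mean: a p95 that blows the budget causes missed items even when the average looks comfortable.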
What we'd change
Nothing fundamental about the architecture decision. YOLO for real-time detection in logistics environments is still the right call.
The evaluation process was faster than ideal because of timing pressure from the first pilot. We made a defensible decision under that pressure, and we've since accumulated enough production experience to validate it. With more time, we would have run more systematic benchmarks on our specific data distribution before committing.
That's the standard tradeoff of early-stage work: you make the best decision available with incomplete information, then validate in production.
With gusto, Fatih.