Section 03

Hardware & Experiments

The test rig

The pipeline was deployed on a real Husqvarna Automower in an outdoor test environment. Onboard compute is intentionally modest; the edge node handles the computationally heavy VLM inference over a local network link.

Robot

Husqvarna Automower 450X Nera

Production residential mower used as the autonomy platform.

Compute

Raspberry Pi 4

Onboard host running segmentation and orchestration.

Camera

OBSBOT Meet 2

Forward-facing USB camera mounted on the mower deck.

Edge node

MacBook Pro M2 Pro

Local edge workstation hosting the candidate VLMs.

Photo: onboard compute (Raspberry Pi 4) and OBSBOT camera.

Scenarios

What the VLM handled in cases where segmentation alone could not.

01 · Transient obstacle

Running animal

"A dog is running across the mowing path from the left." · obstacle: transient · action: STOP

Dual-frame inference correctly treats motion as transient rather than as a fixed obstacle to navigate around.
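Each scenario reports the VLM's reply as a free-text description followed by an obstacle type and an action. A minimal sketch of how the onboard host might validate such a reply before acting on it follows. The `·`-separated layout is taken from the quoted outputs; the class names, `parse_verdict`, and the closed label sets (only the values that appear in these scenarios) are assumptions, not the project's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class Obstacle(Enum):
    # Only the obstacle types that appear in these scenarios; the real set may be larger.
    TRANSIENT = "transient"
    ACTIVITY = "activity"
    STATIC = "static"
    NONE = "none"


class Action(Enum):
    # Only the actions that appear in these scenarios; the real set may be larger.
    STOP = "STOP"
    REROUTE = "REROUTE"
    SLOW = "SLOW"


@dataclass(frozen=True)
class SceneVerdict:
    description: str
    obstacle: Obstacle
    action: Action


def parse_verdict(reply: str) -> SceneVerdict:
    """Parse '"description" · obstacle: X · action: Y' into a SceneVerdict.

    Raises ValueError on a wrong field count, a missing key, or an unknown
    label, so a malformed reply is rejected rather than acted on.
    """
    parts = [p.strip() for p in reply.split("·")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 '·'-separated fields, got {len(parts)}")
    fields = {}
    for part in parts[1:]:
        key, sep, value = part.partition(":")
        if not sep:
            raise ValueError(f"field without ':' in {part!r}")
        fields[key.strip()] = value.strip()
    missing = {"obstacle", "action"} - set(fields)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return SceneVerdict(
        description=parts[0].strip('"'),
        obstacle=Obstacle(fields["obstacle"]),  # unknown label -> ValueError
        action=Action(fields["action"]),
    )
```

Rejecting unknown labels, rather than mapping them to a default, keeps a malformed reply from being read as permission to keep mowing.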

02 · Group activity

Picnic on the lawn

"People are sitting on a blanket with food and drinks." · obstacle: activity · action: REROUTE

The VLM recognises the configuration as a group activity, not a list of individual objects, and chooses a respectful detour.

03 · Open-vocabulary

Vegetation reasoning

"Ornamental flowers, not grass to be mowed." · obstacle: static · action: REROUTE

Distinguishes ornamental plantings from wild vegetation through visual reasoning, without a predefined class list.

04 · Spatial nuance

Person in context

"A person is standing in the context zone, not the path." · obstacle: none · action: SLOW

Position relative to the mower path is reasoned about, not just detected.
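Detection alone yields a bounding box; deciding whether that box lies in the mowing path is a separate spatial judgement. The VLM makes that judgement from the image itself. The sketch below is only a geometric stand-in that makes the path vs. context-zone distinction concrete; the corridor bounds, `PATH_CORRIDOR`, and `zone_of` are hypothetical and not part of the described pipeline.

```python
from typing import Literal, Tuple

Zone = Literal["path", "context"]

# Hypothetical corridor: the middle 40% of the image width, in normalized
# coordinates. Not a calibrated value for the Automower's camera.
PATH_CORRIDOR: Tuple[float, float] = (0.3, 0.7)


def zone_of(x_min: float, x_max: float,
            corridor: Tuple[float, float] = PATH_CORRIDOR) -> Zone:
    """Label a detection 'path' if its horizontal extent overlaps the corridor."""
    if not 0.0 <= x_min <= x_max <= 1.0:
        raise ValueError("expected 0 <= x_min <= x_max <= 1")
    left, right = corridor
    # Two intervals overlap when each starts before the other ends.
    overlaps = x_min < right and x_max > left
    return "path" if overlaps else "context"
```

A fixed rule like this ignores depth and the mower's heading; it is shown only to illustrate the distinction the quoted reply draws.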

05 · Method ablation

Dual-frame motion

The dual-frame variant outperforms the single-frame variant on moving-subject scenarios.

Showing the VLM two consecutive frames gives it enough of a cue to distinguish movement from a static configuration.
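The dual-frame variant pairs each frame with the one before it. A minimal sketch of that pairing on the onboard side, assuming frames arrive one at a time from the camera; `FramePairBuffer`, `motion_score`, and the flat-pixel-list frame type are hypothetical. The VLM is presumably given the two frames themselves; `motion_score` is included only to show that a pair carries a motion signal that a single frame does not.

```python
from collections import deque
from typing import Deque, Optional, Sequence, Tuple

# Stand-in for a real image type: a flattened list of grayscale pixel values.
Frame = Sequence[int]


class FramePairBuffer:
    """Hold the two most recent frames and return them as (previous, current)."""

    def __init__(self) -> None:
        # maxlen=2 makes the deque drop the oldest frame automatically.
        self._frames: Deque[Frame] = deque(maxlen=2)

    def push(self, frame: Frame) -> Optional[Tuple[Frame, Frame]]:
        self._frames.append(frame)
        if len(self._frames) < 2:
            return None  # no previous frame yet
        return self._frames[0], self._frames[1]


def motion_score(prev: Frame, curr: Frame) -> float:
    """Mean absolute per-pixel difference; 0.0 for an unchanged scene."""
    if len(prev) != len(curr) or not curr:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)
```

Until a second frame arrives, `push` returns `None`, which is the point where a system would have to fall back to single-frame input.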

06 · Prompt design

Prompt v1 / v2 / v3

Three prompt variants benchmarked on an 18-image curated test set.

Errors became directly attributable to prompt design choices, a diagnostic capability unavailable in segmentation-only systems.
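Because all three prompt variants run over the same 18 images, per-prompt error tallies can be compared directly. A minimal sketch of that bookkeeping follows, assuming each image is labeled with one expected action. `Case`, `score_prompts`, and the `"MISSING"` placeholder for an absent prediction are hypothetical, and the project's actual scoring criteria are not stated in this section.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Case:
    image_id: str
    expected_action: str  # e.g. "STOP", "REROUTE", "SLOW"


def score_prompts(
    cases: List[Case],
    predictions: Dict[str, Dict[str, str]],
) -> Dict[str, Tuple[float, Counter]]:
    """Return, per prompt, its accuracy and a tally of (expected, predicted) errors.

    `predictions` maps prompt name -> image_id -> predicted action.
    """
    if not cases:
        raise ValueError("need at least one test case")
    results: Dict[str, Tuple[float, Counter]] = {}
    for prompt, preds in predictions.items():
        errors: Counter = Counter()
        correct = 0
        for case in cases:
            got = preds.get(case.image_id, "MISSING")  # absent reply counts as an error
            if got == case.expected_action:
                correct += 1
            else:
                errors[(case.expected_action, got)] += 1
        results[prompt] = (correct / len(cases), errors)
    return results
```

Since the images are held fixed, an error pair that shows up under one prompt and vanishes under another points to the prompt rather than the scene.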

Headline findings

Matched or beat

the segmentation-only baseline across all tested conditions.

Unlocked

open-vocabulary reasoning that is architecturally out of reach for fixed-vocabulary models.

Improved

diagnosability: failures are now attributable to model behaviour or prompt design.