Section 03

Hardware & Experiments

The test rig

The pipeline was deployed on a production Husqvarna Automower in an outdoor test environment. Onboard compute is intentionally modest: the Raspberry Pi runs segmentation and orchestration, while the edge node handles the heavy VLM reasoning over a local link.

Robot

Husqvarna Automower 450X Nera

Production residential mower used as the autonomy platform.

Compute

Raspberry Pi 4

Onboard host running segmentation and orchestration.

Camera

OBSBOT Meet 2

Forward-facing USB camera mounted on the mower deck.

Edge node

MacBook Pro M2 Pro

Local edge workstation hosting the candidate VLMs.

Husqvarna Automower 450X Nera
Raspberry Pi 4 and OBSBOT camera
Onboard compute (Raspberry Pi 4)

Scenarios

What the VLM did in cases that segmentation alone could not handle.

01 · Transient obstacle

Running animal

"A dog is running across the mowing path from the left." · obstacle: transient · action: STOP

Dual-frame inference correctly treats motion as transient rather than as a fixed obstacle to navigate around.

02 · Group activity

Picnic on the lawn

"People are sitting on a blanket with food and drinks." · obstacle: activity · action: REROUTE

The VLM recognises the configuration as a group activity, not a list of individual objects, and chooses a respectful detour.

03 · Open-vocabulary

Vegetation reasoning

"Ornamental flowers, not grass to be mowed." · obstacle: static · action: REROUTE

Differentiates ornamental from wild vegetation by visual reasoning, without a predefined class list.

04 · Spatial nuance

Person in context

"A person is standing in the context zone, not the path." · obstacle: none · action: SLOW

Position relative to the mower path is reasoned about, not just detected.
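The four scenario outputs above share one shape: a quoted description, an obstacle type, and an action. As a minimal sketch, assuming the VLM reply arrives as a single line in exactly that shape, it could be parsed into a typed record as follows. The `Decision` class, the `parse_decision` function, and the two label sets are hypothetical names introduced here; the label sets contain only the values that appear in the examples above and may not be the full vocabulary used in the project.

```python
from dataclasses import dataclass

# Assumption: label sets are limited to the values shown in the scenarios above.
OBSTACLE_TYPES = {"transient", "activity", "static", "none"}
ACTIONS = {"STOP", "REROUTE", "SLOW"}


@dataclass(frozen=True)
class Decision:
    description: str
    obstacle: str
    action: str


def parse_decision(line: str) -> Decision:
    """Parse '"<description>" · obstacle: <type> · action: <ACTION>'."""
    # Split from the right so a stray separator in the description is kept.
    parts = [p.strip() for p in line.rsplit("·", 2)]
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {line!r}")

    description = parts[0].strip('"')
    fields = {}
    for part in parts[1:]:
        key, sep, value = part.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in field {part!r}")
        fields[key.strip()] = value.strip()

    obstacle = fields.get("obstacle")
    action = fields.get("action")
    if obstacle not in OBSTACLE_TYPES:
        raise ValueError(f"unknown obstacle type: {obstacle!r}")
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return Decision(description, obstacle, action)


if __name__ == "__main__":
    reply = '"A person is standing in the context zone, not the path." · obstacle: none · action: SLOW'
    print(parse_decision(reply))
```

Rejecting anything outside the known label sets keeps a malformed reply from being silently treated as a valid command. How the real system reacts to an unparsable reply is not described here.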

05 · Method ablation

Dual-frame motion

Dual-frame variant outperforms single-frame on moving-subject scenarios.

Showing two consecutive frames gives the VLM enough cue to distinguish movement from static configuration.
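One way to picture the dual-frame input is a small buffer that pairs each new camera frame with the one before it. The sketch below is illustrative only: `FramePairer`, `build_request`, the `gap` parameter, and the payload layout are assumptions made for this example, not the project's actual interface or the request format of any particular VLM server.

```python
import base64
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple, Union


class FramePairer:
    """Buffers recent frames and returns (earlier, later) pairs.

    gap=1 pairs consecutive frames; a larger gap is a hypothetical knob
    for widening the time between the two views.
    """

    def __init__(self, gap: int = 1) -> None:
        if gap < 1:
            raise ValueError("gap must be >= 1")
        self._size = gap + 1
        self._frames: Deque[bytes] = deque(maxlen=self._size)

    def push(self, frame: bytes) -> Optional[Tuple[bytes, bytes]]:
        # The deque drops the oldest frame automatically once it is full.
        self._frames.append(frame)
        if len(self._frames) < self._size:
            return None  # not enough history yet
        return self._frames[0], self._frames[-1]


def build_request(earlier: bytes, later: bytes, prompt: str) -> Dict[str, Union[str, List[str]]]:
    """Bundle two encoded frames and a prompt (hypothetical payload layout)."""
    return {
        "prompt": prompt,
        # Order matters: the model sees the earlier frame first.
        "images": [
            base64.b64encode(earlier).decode("ascii"),
            base64.b64encode(later).decode("ascii"),
        ],
    }


if __name__ == "__main__":
    pairer = FramePairer(gap=1)
    for raw in (b"frame-0", b"frame-1", b"frame-2"):
        pair = pairer.push(raw)
        if pair is not None:
            request = build_request(*pair, prompt="Describe what changed between the two frames.")
            print(len(request["images"]), "frames in request")
```

Sending the frames in a fixed order lets the prompt refer to "the first" and "the second" image without ambiguity.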

06 · Prompt design

Prompt v1 / v2 / v3

Three prompt variants benchmarked on an 18-image curated test set.

Errors became directly attributable to prompt design choices, a diagnostic capability unavailable in segmentation-only systems.
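A prompt comparison of this kind can be sketched as a loop that runs every prompt variant over the same labelled images and records which cases each variant gets wrong. Everything named below (`benchmark`, the `predict` callable, the case format) is a hypothetical stand-in; the actual harness, labels, and scoring used for the 18-image set are not specified here.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: each test case is (image_id, expected_action).
TestCase = Tuple[str, str]
# Assumption: a predictor maps (prompt_text, image_id) to an action string.
Predictor = Callable[[str, str], str]
Failure = Tuple[str, str, str]  # (image_id, expected, got)


def benchmark(
    prompts: Dict[str, str],
    cases: List[TestCase],
    predict: Predictor,
) -> Dict[str, Dict[str, object]]:
    """Score each prompt variant on the same cases and keep its failures."""
    if not cases:
        raise ValueError("need at least one test case")

    results: Dict[str, Dict[str, object]] = {}
    for name, prompt in prompts.items():
        failures: List[Failure] = []
        for image_id, expected in cases:
            got = predict(prompt, image_id)
            if got != expected:
                failures.append((image_id, expected, got))
        results[name] = {
            "accuracy": (len(cases) - len(failures)) / len(cases),
            "failures": failures,
        }
    return results


def failures_unique_to(results: Dict[str, Dict[str, object]], name: str) -> List[str]:
    """Image ids that fail under `name` but under no other variant."""
    mine = {f[0] for f in results[name]["failures"]}
    others = {
        f[0]
        for other, r in results.items()
        if other != name
        for f in r["failures"]
    }
    return sorted(mine - others)
```

Cases that fail under only one variant are the ones most plausibly caused by that variant's wording, which is the kind of attribution described above.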

Headline findings

Matched or beat

the segmentation-only baseline across all tested conditions.

Unlocked

open-vocabulary reasoning that is architecturally out of reach for fixed-vocabulary models.

Improved

diagnosability: failures are now attributable to model behaviour or prompt design.