Building a Touchless Flight Tracker with Hand Gestures and AI

Computer Vision

Explore the creation of a touchless flight tracker, 'The Homecoming Board,' using hand gestures. This article details the tech stack, MediaPipe for computer vision, real-time flight data integration, adaptive gesture detection, and custom React hooks for a robust, user-friendly experience.

This article details the development of a touchless flight tracker, an entry for Day 5 of the Advent of AI series. The series utilizes Goose, an open-source AI agent, to rapidly develop various AI-driven applications.

Goose is a local, extensible, open-source AI agent designed to automate complex development tasks. It goes beyond code suggestions, capable of building entire projects, writing and executing code, debugging failures, orchestrating workflows, and interacting with external APIs autonomously. Designed for maximum flexibility, Goose works with any Large Language Model (LLM) and supports multi-model configurations for optimized performance and cost. It seamlessly integrates with MCP servers and is available as both a desktop application and a CLI, making it a powerful AI assistant for developers.

For Advent of AI's Day 5, the challenge was to develop "The Homecoming Board," a gesture-controlled flight arrival display. This innovative solution allows users wearing gloves or mittens to navigate with hand gestures, eliminating the need for physical screen contact in cold environments. The challenge required at least two distinct navigation gestures, real-time flight data, and optionally, audio feedback for gesture recognition.

For those eager to experience the application, it is available at flightboard.nickyt.co.

The Tech Stack

The application was built using TanStack Start (React + TypeScript with server-side rendering), MediaPipe for gesture recognition, and the OpenSky Network API for real-time flight data. This project was the author's first foray into computer vision development, and it highlighted how accessible the field has become with tools like MediaPipe. TanStack Start was chosen based on prior experience with a significant project, the Pomerium MCP app demo.

Key Features Implemented

The developed application includes:

  • Real-time hand tracking utilizing MediaPipe's WASM runtime.
  • Support for four distinct gesture types: closed fist, open palm, thumbs up, and thumbs down.
  • Independent gesture detection for both left and right hands.
  • Live flight data from OpenSky Network, enhanced with smart caching via TanStack Query.
  • Audio feedback for each gesture, with an option to mute sounds.
  • A gesture training system that dynamically adapts to individual hand movements.
  • Light and dark winter themes, compliant with WCAG AAA contrast accessibility standards.
  • The ability to select a preferred camera if multiple devices are available.
  • Responsive design, ensuring smooth operation on mobile devices.

Starting with a Product Requirements Document (PRD)

Development commenced with generating a Product Requirements Document (PRD). This approach, adopted for consistency across the challenges, translates challenge specifications into a structured implementation plan.

Hand Tracking with MediaPipe

The browser-based WASM build of MediaPipe was chosen because it suits Netlify deployment well: running entirely in the browser removes any need for a Python backend.

// useMediaPipe.ts - Custom hook for MediaPipe integration
const hands = new Hands({
  locateFile: (file) => `/mediapipe/${file}`,
});
hands.setOptions({
  maxNumHands: 2,
  modelComplexity: 1,
  minDetectionConfidence: 0.7,
  minTrackingConfidence: 0.5,
});

The hand tracking operates at 30-60 FPS, complete with landmark visualization. Both the video feed and landmarks are mirrored for a natural user experience. A specific refinement addressed an issue where MediaPipe detected facial features as hand-like shapes; the logic was updated to render the hand skeleton overlay only when at least one hand is genuinely detected. Exceeding the challenge's requirement of two gestures, four distinct gestures were implemented, along with independent detection for each hand.
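The mirroring and "only draw when a hand is detected" logic can be sketched as follows; the helper names (`mirrorLandmarks`, `shouldDrawOverlay`) are illustrative, not the project's actual code:

```typescript
// Sketch of the mirroring and conditional-overlay logic; names assumed.
type Landmark = { x: number; y: number; z: number };

interface HandResults {
  multiHandLandmarks?: Landmark[][];
}

// MediaPipe emits normalized coordinates (0..1). Flipping x keeps the
// skeleton aligned with a horizontally mirrored <video> element.
function mirrorLandmarks(landmarks: Landmark[]): Landmark[] {
  return landmarks.map((p) => ({ ...p, x: 1 - p.x }));
}

// Draw the skeleton overlay only when at least one hand was detected,
// which avoids the "face detected as a hand" ghost overlay.
function shouldDrawOverlay(results: HandResults): boolean {
  return (results.multiHandLandmarks?.length ?? 0) > 0;
}
```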

Gesture Detection: The Hard Part

Gesture detection proved to be more complex than anticipated. An initial approach based on fixed thresholds for finger curl ratios, calculated by comparing fingertip-to-wrist distance against knuckle-to-wrist distance, showed promise during early development.

// Finger curl ratio: distance(tip, wrist) / distance(knuckle, wrist)
const fingerCurl = (finger: FingerLandmarks) => {
  const tipDist = distance(finger.tip, wrist);
  const knuckleDist = distance(finger.knuckle, wrist);
  return tipDist / knuckleDist;
};

// Gesture classification (fingersCurled and fingersExtended are counts
// derived from the per-finger curl ratios above)
const isClosedFist = fingersCurled >= 4 && avgCurl > fistThreshold;
const isOpenPalm = fingersExtended >= 4 && avgCurl < palmThreshold;

However, these fixed thresholds failed under varying lighting conditions and camera distances, a classic case of overfitting to the development environment. The solution was a gesture training mode: users perform each gesture multiple times, and the system calculates personalized, variance-aware thresholds from those samples. This adaptive approach significantly improved the robustness and usability of the gesture detection, showing that adaptive margins are crucial for production-ready ML systems.
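A minimal sketch of the variance-aware calibration idea, assuming training collects a list of curl-ratio samples per gesture (function names are hypothetical):

```typescript
// Variance-aware threshold calibration sketch; names are illustrative.
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function stdDev(xs: number[]): number {
  const m = mean(xs);
  return Math.sqrt(xs.reduce((a, x) => a + (x - m) ** 2, 0) / xs.length);
}

// During training the user repeats a gesture; the recorded curl ratios
// define a personalized band instead of one fixed global cutoff.
function calibrateThreshold(samples: number[], margin = 2): { min: number; max: number } {
  const m = mean(samples);
  const s = stdDev(samples);
  // A few standard deviations of slack absorbs lighting/distance jitter
  return { min: m - margin * s, max: m + margin * s };
}

function matchesGesture(curl: number, t: { min: number; max: number }): boolean {
  return curl >= t.min && curl <= t.max;
}
```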

Flight Data Integration

Flight data is sourced from OpenSky Network's free API, which requires no authentication. The API provides real-time flight positions, which are then filtered for arrivals near a specified airport using a bounding box.
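A hedged sketch of such a bounding-box query against OpenSky's public /states/all endpoint (the coordinates and box size below are illustrative, not the app's actual values):

```typescript
// Querying OpenSky's /states/all with a bounding box; box size assumed.
const OPENSKY_BASE = "https://opensky-network.org/api/states/all";

interface BoundingBox {
  lamin: number; // min latitude
  lomin: number; // min longitude
  lamax: number; // max latitude
  lomax: number; // max longitude
}

function buildStatesUrl(box: BoundingBox): string {
  const params = new URLSearchParams({
    lamin: String(box.lamin),
    lomin: String(box.lomin),
    lamax: String(box.lamax),
    lomax: String(box.lomax),
  });
  return `${OPENSKY_BASE}?${params.toString()}`;
}

// Roughly ±0.5° around an airport (about 55 km of latitude)
function boxAround(lat: number, lon: number, delta = 0.5): BoundingBox {
  return { lamin: lat - delta, lomin: lon - delta, lamax: lat + delta, lomax: lon + delta };
}

async function fetchNearbyFlights(lat: number, lon: number): Promise<unknown[][]> {
  const res = await fetch(buildStatesUrl(boxAround(lat, lon)));
  if (!res.ok) throw new Error(`OpenSky error: ${res.status}`);
  // "states" is an array of state vectors; see the OpenSky REST docs
  const { states } = (await res.json()) as { states: unknown[][] | null };
  return states ?? [];
}
```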

TanStack Query manages caching and auto-refresh logic:

// useFlightData.ts
const {
  data: flights,
  isLoading,
  error
} = useQuery({
  queryKey: ['flights', 'arrivals'],
  queryFn: fetchFlights,
  // Data is considered fresh for 5 minutes in dev, 20 seconds in prod
  staleTime: import.meta.env.DEV ? 300_000 : 20_000,
  gcTime: 5 * 60 * 1000, // Unused cache entries are garbage-collected after 5 min
  refetchInterval: 30_000, // Auto-refresh every 30s
  retry: 3, // Up to 3 retries (TanStack Query backs off exponentially by default)
});

During development, OpenSky Network's strict rate limits (minimum 10-second interval) were encountered, leading to an increased staleTime of 5 minutes in development mode. In production, a staleTime of 20 seconds with a 30-second refetchInterval ensures data currency while respecting API limits.

Audio Feedback

Audio feedback for gesture recognition, initially an optional feature, became essential for a polished user experience. Distinct sounds were implemented for each gesture (whoosh for closed fist, chime for open palm, ding for thumbs up, buzz for thumbs down). Sounds are pre-cached and triggered only when a gesture changes, preventing repetitive playback on every frame.

// gestureAudio.ts - Audio caching and playback
const audioCache = new Map<GestureType, HTMLAudioElement>();
export const playGestureSound = (gesture: GestureType) => {
  let audio = audioCache.get(gesture);
  if (!audio) {
    audio = new Audio(GESTURE_SOUNDS[gesture]);
    audio.volume = currentVolume;
    audioCache.set(gesture, audio);
  }
  audio.currentTime = 0; // Reset for quick replay
  audio.play().catch(() => {}); // Ignore autoplay errors
};
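The "only on change" guard can be sketched as a small state machine; the names here are assumptions, not the project's actual code:

```typescript
// Play a sound only when the detected gesture transitions; names assumed.
type GestureType = "closed_fist" | "open_palm" | "thumbs_up" | "thumbs_down";

let lastGesture: GestureType | null = null;

// Called on every processed video frame
function onGestureFrame(
  gesture: GestureType | null,
  play: (g: GestureType) => void
): void {
  if (gesture !== lastGesture) {
    lastGesture = gesture;
    if (gesture !== null) play(gesture); // fire only on a real transition
  }
}
```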

A toggle in the settings allows users to enable or disable sound. This feature also offers potential accessibility benefits, providing auditory cues for navigation.

The Flight Detail Modal

A thumbs-up gesture opens a detailed modal displaying comprehensive flight information, including country flag, callsign, position, altitude, speed, heading, and last contact time. For the modal UI, ShadCN components (Dialog and Drawer) were integrated, offering a desktop modal experience and a mobile-friendly slide-up drawer, both with built-in accessibility.

<div className="fixed inset-0 z-50">
  {/* Backdrop */}
  <div className="absolute inset-0 bg-black/70 backdrop-blur-sm z-0" />
  {/* Content rendered on top */}
  <div className="relative z-10">
    {/* Modal content */}
  </div>
</div>

On mobile, the modal dynamically transforms into a drawer component based on window width detection.
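One way to sketch that width check (the 768px breakpoint is an assumption, not necessarily the app's value):

```typescript
// Responsive modal/drawer selection sketch; breakpoint is assumed.
const MOBILE_BREAKPOINT = 768;

type DetailVariant = "dialog" | "drawer";

function variantForWidth(width: number): DetailVariant {
  return width < MOBILE_BREAKPOINT ? "drawer" : "dialog";
}

// In React, pair this with a resize listener so the variant updates live:
// useEffect(() => {
//   const onResize = () => setVariant(variantForWidth(window.innerWidth));
//   window.addEventListener("resize", onResize);
//   return () => window.removeEventListener("resize", onResize);
// }, []);
```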

Theme System

The application features light and dark winter themes, similar to previous Advent of AI projects. The light mode uses pure white backgrounds with deep ice blue primary colors, achieving a 13:1 contrast ratio for WCAG AAA compliance. The dark mode presents a deep blue-purple night sky with bright glowing blues. Theme preferences are persisted to localStorage and respect system preferences on initial load.
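The initial-theme resolution described above might look like this, with the stored value and system preference passed in so the logic stays testable (a sketch, not the app's actual code):

```typescript
// Resolve the initial theme: a saved choice wins, else follow the OS.
type Theme = "light" | "dark";

function resolveInitialTheme(
  stored: string | null,     // e.g. localStorage.getItem("theme")
  systemPrefersDark: boolean // e.g. matchMedia("(prefers-color-scheme: dark)").matches
): Theme {
  if (stored === "light" || stored === "dark") return stored; // persisted preference
  return systemPrefersDark ? "dark" : "light";                // system preference
}
```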

Custom Hooks: Architectural Highlights

The project leverages eight custom React hooks, significantly enhancing component clarity and maintainability:

  • useMediaPipe: Encapsulates MediaPipe initialization, configuration, and cleanup, abstracting WASM loading logic from components.
  • useWebcam: Manages camera access, device selection, permission requests, and persists camera choices to localStorage.
  • useGestures: Houses the core gesture detection logic, receiving hand landmarks from MediaPipe and returning the current gesture type. It incorporates debouncing and variance-aware threshold calculations from training data.
  • useFlightData: Integrates TanStack Query for flight data fetching, handling OpenSky Network API calls, response parsing, and managing cache/refetch intervals.
  • useLocalStorage: A utility hook for synchronizing state with localStorage, used for camera selection, theme preferences, and volume settings.
  • useWindowFocus: Detects when the browser tab loses focus so the camera can pause, conserving battery life and preventing unintended gesture detection.
  • useGestureTraining: Manages the gesture training workflow, collecting finger curl data, calculating mean and standard deviation, and generating personalized thresholds.
  • useAudio: Handles all sound effects, pre-caching audio files, managing playback, and respecting volume settings and mute toggles, ensuring sounds play only when gestures change.

Key Learnings

Developing this computer vision application provided valuable insights:

  • Adaptive Thresholds for Gestures: Gesture recognition requires real-world testing and adaptive thresholds. Fixed thresholds are brittle under varying lighting and distances, underscoring the necessity of a training system to account for natural variance.
  • Window Focus for Battery Optimization: Implementing window focus detection is crucial. MediaPipe's continuous processing in unfocused tabs can drain CPU. Pausing camera processing when the window loses focus conserves battery and prevents accidental gesture detection.
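A minimal sketch of that focus-pause wiring, assuming the camera loop exposes pause()/resume() (both names are hypothetical):

```typescript
// Pause camera processing when the window loses focus; names assumed.
function bindFocusPause(
  target: { addEventListener(type: string, cb: () => void): void },
  camera: { pause(): void; resume(): void }
): void {
  // Stop MediaPipe processing in unfocused tabs to save CPU and battery
  target.addEventListener("blur", () => camera.pause());
  target.addEventListener("focus", () => camera.resume());
}

// In the browser this would be called as: bindFocusPause(window, camera)
```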

What's Next

The touchless flight tracker is deployed and accessible at flightboard.nickyt.co. Users are encouraged to explore its functionality, including personalized gesture training. The source code is available in the Advent of AI 2025 repository.

While the application demonstrates significant functionality developed within a short timeframe, opportunities for refinement exist. Future improvements could include multi-airport support, flight trajectory visualization, two-handed gestures, and PartyKit integration for multi-user control.

The code for the project is located at: https://github.com/nickytonline/advent-of-ai-2025