The Multimodal AI Guide: Vision, Voice, Text, and Beyond