However, the next stage of AR extends beyond visual guidance. Researchers are now developing systems that integrate multiple sources of information simultaneously, a concept known as multi-modal information fusion. These systems combine intraoperative video with radiological imaging, procedural knowledge, and patient-specific clinical data to provide context-aware surgical guidance. Powered by advanced vision-language models (VLMs), they can interpret both visual scenes and medical language. Rather than merely recognizing structures, such systems can answer procedural questions, identify surgical phases, and highlight potential risks during an operation.
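To make the idea of multi-modal fusion concrete, the sketch below shows one plausible way such a system might assemble a single query for a vision-language model: the live video frame plus a text prompt carrying imaging, procedural, and patient context. All names and data structures here (SurgicalContext, build_vlm_query, the example fields) are illustrative assumptions, not a description of any specific system presented at the congress.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SurgicalContext:
    """Bundles the information sources a multi-modal guidance system might fuse."""
    video_frame: bytes              # current intraoperative video frame
    radiology_findings: List[str]   # e.g. pre-operative MRI annotations
    procedural_step: str            # current step in the procedure template
    patient_data: Dict[str, str]    # relevant clinical history


def build_vlm_query(ctx: SurgicalContext, question: str) -> dict:
    """Assemble one multi-modal request: the live frame plus a text prompt
    that carries imaging, procedural, and patient-specific context."""
    prompt = (
        f"Current procedural step: {ctx.procedural_step}\n"
        f"Pre-operative imaging: {'; '.join(ctx.radiology_findings)}\n"
        f"Patient factors: {', '.join(f'{k}={v}' for k, v in ctx.patient_data.items())}\n"
        f"Question: {question}"
    )
    return {"image": ctx.video_frame, "text": prompt}


if __name__ == "__main__":
    ctx = SurgicalContext(
        video_frame=b"<jpeg bytes of the current laparoscopic frame>",
        radiology_findings=["left-sided lesion, no seminal vesicle invasion"],
        procedural_step="posterior dissection",
        patient_data={"prior_surgery": "none", "BMI": "27"},
    )
    query = build_vlm_query(ctx, "Which structures are at risk in this view?")
    # The query would be sent to a vision-language model; its answer could be
    # surfaced as an on-screen annotation or a spoken response.
    print(query["text"])
```

The point of the sketch is the fusion step itself: heterogeneous inputs are normalized into a single context object before the model is asked a procedural question, which is what distinguishes these systems from single-task recognition algorithms.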
One practical implementation discussed at the congress is the Surgical RARP Copilot, a real-time AI assistant designed for robot-assisted radical prostatectomy. Unlike earlier "narrow" algorithms trained for single tasks, the copilot is built on a foundation-model architecture trained on extensive surgical video, anatomical data, and medical literature. By analyzing the live operative feed, the system can recognize instruments, monitor surgical progress, and provide contextual insights during the procedure. Experts suggest that such AI copilots could function as a "cognitive safety net," helping standardize surgical performance, accelerate training, and convert surgical data into measurable improvements in patient outcomes and healthcare efficiency.
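The "live operative feed" aspect can be pictured as a simple per-frame loop: each frame is analyzed for the current phase and visible instruments, and contextual alerts are surfaced as they arise. The following is a minimal sketch under that assumption; analyze_frame is a hypothetical stand-in for the copilot's perception models, not the actual RARP Copilot interface.

```python
import time
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class FrameAnalysis:
    phase: str               # recognized surgical phase
    instruments: List[str]   # instruments detected in the current frame
    alert: Optional[str]     # contextual warning, if any


def analyze_frame(frame: bytes) -> FrameAnalysis:
    """Hypothetical perception step: a real system would run phase recognition
    and instrument detection models on the incoming frame."""
    return FrameAnalysis(
        phase="neurovascular bundle dissection",
        instruments=["monopolar scissors", "fenestrated grasper"],
        alert="proximity to neurovascular bundle",
    )


def copilot_loop(video_stream: Iterable[bytes], max_frames: int = 3) -> None:
    """Continuously analyze the operative feed and surface contextual insights."""
    for i, frame in enumerate(video_stream):
        if i >= max_frames:
            break
        result = analyze_frame(frame)
        print(f"[{time.strftime('%H:%M:%S')}] phase={result.phase} "
              f"instruments={result.instruments}")
        if result.alert:
            print(f"  !! contextual alert: {result.alert}")


if __name__ == "__main__":
    # Stand-in for a live video feed: an iterator of frame buffers.
    fake_stream = (b"frame" for _ in range(10))
    copilot_loop(fake_stream)
```

In practice the interesting engineering lies in keeping this loop fast enough for real-time use and in deciding when an alert is worth interrupting the surgeon, but the structure above captures the "cognitive safety net" idea of continuous, context-aware monitoring.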