67 entries · category=academic
Key-Value (KV) cache has become a de facto component of modern Large Vision-Language Models (LVLMs) for inference. While it enhances decoding efficien
With the development of deep learning, medical image processing has been widely used to assist clinical research. This paper focuses on the denoising
Background: Patient-facing medical chatbots based on retrieval-augmented generation (RAG) are increasingly promoted to deliver accessible, grounded he
Large language models are increasingly deployed as autonomous coding agents and have achieved remarkably strong performance on software engineering be
While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a "Visual Signal Dilution"
This paper presents an expert-guided active-inference-inspired framework for adaptive UAV swarm trajectory planning. The proposed method converts mult
Learned driving agents often degrade when deployed in unseen environments. This paper studies a deliberately bounded instance of that problem in the C
As autonomous vehicles slowly deploy into urban roads for limited use cases with significant edge case issues, closed facilities like marshaling yards
End-to-end (E2E) autonomous driving presents a promising approach for translating perceptual inputs directly into driving actions. However, prohibitiv
Existing learning-based occupancy prediction methods rely on large-scale 3D annotations and generalize poorly across environments. We present FreeOcc,
This work presents ThermoMesh, a passive thin-film thermoelectric mesh sensor designed to detect and characterize spatio-temporally sparse heat source
The robotic manipulation of Deformable Linear Objects (DLOs) is a fundamental challenge due to the high-dimensional, non-linear dynamics of flexible s
Effective human behavior modeling requires a representation of the human body movement that capitalizes on its compositionality. We propose a hierarch
We introduce AEGIS, A holistic benchmark for Evaluating forensic analysis of AI-Generated academic ImageS. Compared to existing benchmarks, AEGIS feat
Bronchoscopic navigation relies on registering endoscopic video to a preoperative CT scan, but respiratory motion deforms the airway by 5-20 mm, creat
Recent visual generation models have made major progress in photorealism, typography, instruction following, and interactive editing, yet they still s
We show that Fréchet Distance (FD), long considered impractical as a training objective, can in fact be effectively optimized in the representation sp
Vision-Language-Action (VLA) models have increasingly incorporated reasoning mechanisms for complex robotic manipulation. However, existing approaches
Reconstructing 3D scenes from sparse, unposed images remains challenging under real-world conditions with varying illumination and transient occlusion
Human-robot collaboration has been studied primarily in dyadic or sequential settings. However, real homes require multiadic collaboration, where mult
Driving world models serve as a pivotal technology for autonomous driving by simulating environmental dynamics. However, existing approaches predomina
The proliferation of capable and efficient machine learning (ML) models marks one of the strongest methodological shifts in signal processing (SP) in
In this study, we use machine learning to classify and interpolate the phase structure of the Vicsek flocking model across the three-dimensional param
Machine learning (ML) inference serving systems host deep neural network (DNN) models and schedule incoming inference requests across deployed GPUs. H
Machine learning models can learn from data samples to carry out various tasks efficiently. When data samples are adversarially manipulated, such as b
In recent years, physics-informed neural networks (PINNs) have gained significant attention for solving differential equations, although they suffer f
Reinforcement learning (RL) has become essential to the post-training of large language models (LLMs) for reasoning, agentic capabilities and alignmen
Sign languages, of any geographical or accentual variation, understandably face continuous scrutiny under the ever present popularity of verbal dictat
Multi-turn prompt injection follows a known attack path (trust-building, pivoting, escalation), but text-level defenses miss covert attacks where indi
Autonomous agents act through sandboxed containers and microVMs whose state spans filesystems, processes, and runtime artifacts. Checkpoint and restor