67 items total · category=academic
arXiv cs.AI 1 day ago academic en
Key-Value (KV) cache has become a de facto component of modern Large Vision-Language Models (LVLMs) for inference. While it enhances decoding efficien
arXiv cs.AI 1 day ago academic en
With the development of deep learning, medical image processing has been widely used to assist clinical research. This paper focuses on the denoising
arXiv cs.AI 1 day ago academic en
Background: Patient-facing medical chatbots based on retrieval-augmented generation (RAG) are increasingly promoted to deliver accessible, grounded he
arXiv cs.AI 1 day ago academic en
Large language models are increasingly deployed as autonomous coding agents and have achieved remarkably strong performance on software engineering be
arXiv cs.AI 1 day ago academic en
While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a "Visual Signal Dilution"
arXiv cs.RO 4 days ago academic en
This paper presents an expert-guided active-inference-inspired framework for adaptive UAV swarm trajectory planning. The proposed method converts mult
arXiv cs.RO 4 days ago academic en
Learned driving agents often degrade when deployed in unseen environments. This paper studies a deliberately bounded instance of that problem in the C
arXiv cs.RO 4 days ago academic en
As autonomous vehicles slowly deploy into urban roads for limited use cases with significant edge case issues, closed facilities like marshaling yards
arXiv cs.RO 4 days ago academic en
End-to-end (E2E) autonomous driving presents a promising approach for translating perceptual inputs directly into driving actions. However, prohibitiv
arXiv cs.RO 4 days ago academic en
Existing learning-based occupancy prediction methods rely on large-scale 3D annotations and generalize poorly across environments. We present FreeOcc,
arXiv cs.RO 4 days ago academic en
This work presents ThermoMesh, a passive thin-film thermoelectric mesh sensor designed to detect and characterize spatio-temporally sparse heat source
arXiv cs.RO 4 days ago academic en
The robotic manipulation of Deformable Linear Objects (DLOs) is a fundamental challenge due to the high-dimensional, non-linear dynamics of flexible s
arXiv cs.CV 4 days ago academic en
Effective human behavior modeling requires a representation of the human body movement that capitalizes on its compositionality. We propose a hierarch
arXiv cs.CV 4 days ago academic en
We introduce AEGIS, A holistic benchmark for Evaluating forensic analysis of AI-Generated academic ImageS. Compared to existing benchmarks, AEGIS feat
arXiv cs.CV 4 days ago academic en
Bronchoscopic navigation relies on registering endoscopic video to a preoperative CT scan, but respiratory motion deforms the airway by 5-20 mm, creat
arXiv cs.CV 4 days ago academic en
Recent visual generation models have made major progress in photorealism, typography, instruction following, and interactive editing, yet they still s
arXiv cs.CV 4 days ago academic en
We show that Fréchet Distance (FD), long considered impractical as a training objective, can in fact be effectively optimized in the representation sp
arXiv cs.CV 4 days ago academic en
Vision-Language-Action (VLA) models have increasingly incorporated reasoning mechanisms for complex robotic manipulation. However, existing approaches
arXiv cs.CV 4 days ago academic en
Reconstructing 3D scenes from sparse, unposed images remains challenging under real-world conditions with varying illumination and transient occlusion
arXiv cs.CV 4 days ago academic en
Human-robot collaboration has been studied primarily in dyadic or sequential settings. However, real homes require multiadic collaboration, where mult
arXiv cs.CV 4 days ago academic en
Driving world models serve as a pivotal technology for autonomous driving by simulating environmental dynamics. However, existing approaches predomina
arXiv cs.LG 4 days ago academic en
The proliferation of capable and efficient machine learning (ML) models marks one of the strongest methodological shifts in signal processing (SP) in
arXiv cs.LG 4 days ago academic en
In this study, we use machine learning to classify and interpolate the phase structure of the Vicsek flocking model across the three-dimensional param
arXiv cs.LG 4 days ago academic en
Machine learning (ML) inference serving systems host deep neural network (DNN) models and schedule incoming inference requests across deployed GPUs. H
arXiv cs.LG 4 days ago academic en
Machine learning models can learn from data samples to carry out various tasks efficiently. When data samples are adversarially manipulated, such as b
arXiv cs.LG 4 days ago academic en
In recent years, physics-informed neural networks (PINNs) have gained significant attention for solving differential equations, although they suffer f
arXiv cs.LG 4 days ago academic en
Reinforcement learning (RL) has become essential to the post-training of large language models (LLMs) for reasoning, agentic capabilities and alignmen
arXiv cs.AI 4 days ago academic en
Sign languages, of any geographical or accentual variation, understandably face continuous scrutiny under the ever-present popularity of verbal dictat
arXiv cs.AI 4 days ago academic en
Multi-turn prompt injection follows a known attack path (trust-building, pivoting, escalation), but text-level defenses miss covert attacks where indi
arXiv cs.AI 4 days ago academic en
Autonomous agents act through sandboxed containers and microVMs whose state spans filesystems, processes, and runtime artifacts. Checkpoint and restor