67 items total · category=academic
arXiv cs.AI 05-05 academic en
Key-Value (KV) cache has become a de facto component of modern Large Vision-Language Models (LVLMs) for inference. While it enhances decoding efficien
arXiv cs.AI 05-05 academic en
With the development of deep learning, medical image processing has been widely used to assist clinical research. This paper focuses on the denoising
arXiv cs.AI 05-05 academic en
Background: Patient-facing medical chatbots based on retrieval-augmented generation (RAG) are increasingly promoted to deliver accessible, grounded he
arXiv cs.AI 05-05 academic en
Large language models are increasingly deployed as autonomous coding agents and have achieved remarkably strong performance on software engineering be
arXiv cs.AI 05-05 academic en
While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a "Visual Signal Dilution"
arXiv cs.RO 05-02 academic en
This paper presents an expert-guided active-inference-inspired framework for adaptive UAV swarm trajectory planning. The proposed method converts mult
arXiv cs.RO 05-02 academic en
Learned driving agents often degrade when deployed in unseen environments. This paper studies a deliberately bounded instance of that problem in the C
arXiv cs.RO 05-02 academic en
As autonomous vehicles slowly deploy into urban roads for limited use cases with significant edge case issues, closed facilities like marshaling yards
arXiv cs.RO 05-02 academic en
End-to-end (E2E) autonomous driving presents a promising approach for translating perceptual inputs directly into driving actions. However, prohibitiv
arXiv cs.RO 05-02 academic en
Existing learning-based occupancy prediction methods rely on large-scale 3D annotations and generalize poorly across environments. We present FreeOcc,
arXiv cs.RO 05-02 academic en
This work presents ThermoMesh, a passive thin-film thermoelectric mesh sensor designed to detect and characterize spatio-temporally sparse heat source
arXiv cs.RO 05-02 academic en
The robotic manipulation of Deformable Linear Objects (DLOs) is a fundamental challenge due to the high-dimensional, non-linear dynamics of flexible s
arXiv cs.CV 05-02 academic en
Effective human behavior modeling requires a representation of the human body movement that capitalizes on its compositionality. We propose a hierarch
arXiv cs.CV 05-02 academic en
We introduce AEGIS, A holistic benchmark for Evaluating forensic analysis of AI-Generated academic ImageS. Compared to existing benchmarks, AEGIS feat
arXiv cs.CV 05-02 academic en
Bronchoscopic navigation relies on registering endoscopic video to a preoperative CT scan, but respiratory motion deforms the airway by 5-20 mm, creat
arXiv cs.CV 05-02 academic en
Recent visual generation models have made major progress in photorealism, typography, instruction following, and interactive editing, yet they still s
arXiv cs.CV 05-02 academic en
We show that Fréchet Distance (FD), long considered impractical as a training objective, can in fact be effectively optimized in the representation sp
arXiv cs.CV 05-02 academic en
Vision-Language-Action (VLA) models have increasingly incorporated reasoning mechanisms for complex robotic manipulation. However, existing approaches
arXiv cs.CV 05-02 academic en
Reconstructing 3D scenes from sparse, unposed images remains challenging under real-world conditions with varying illumination and transient occlusion
arXiv cs.CV 05-02 academic en
Human-robot collaboration has been studied primarily in dyadic or sequential settings. However, real homes require multiadic collaboration, where mult
arXiv cs.CV 05-02 academic en
Driving world models serve as a pivotal technology for autonomous driving by simulating environmental dynamics. However, existing approaches predomina
arXiv cs.LG 05-02 academic en
The proliferation of capable and efficient machine learning (ML) models marks one of the strongest methodological shifts in signal processing (SP) in
arXiv cs.LG 05-02 academic en
In this study, we use machine learning to classify and interpolate the phase structure of the Vicsek flocking model across the three-dimensional param
arXiv cs.LG 05-02 academic en
Machine learning (ML) inference serving systems host deep neural network (DNN) models and schedule incoming inference requests across deployed GPUs. H
arXiv cs.LG 05-02 academic en
Machine learning models can learn from data samples to carry out various tasks efficiently. When data samples are adversarially manipulated, such as b
arXiv cs.LG 05-02 academic en
In recent years, physics-informed neural networks (PINNs) have gained significant attention for solving differential equations, although they suffer f
arXiv cs.LG 05-02 academic en
Reinforcement learning (RL) has become essential to the post-training of large language models (LLMs) for reasoning, agentic capabilities and alignmen
arXiv cs.AI 05-02 academic en
Sign languages, of any geographical or accentual variation, understandably face continuous scrutiny under the ever present popularity of verbal dictat
arXiv cs.AI 05-02 academic en
Multi-turn prompt injection follows a known attack path -- trust-building, pivoting, escalation -- but text-level defenses miss covert attacks where indi
arXiv cs.AI 05-02 academic en
Autonomous agents act through sandboxed containers and microVMs whose state spans filesystems, processes, and runtime artifacts. Checkpoint and restor