News Feed - intelligence-db

arXiv cs.AI 05-05 academic en

Make Your LVLM KV Cache More Lightweight

Key-Value (KV) cache has become a de facto component of modern Large Vision-Language Models (LVLMs) for inference. While it enhances decoding efficien

arXiv cs.AI 05-05 academic en

Unsupervised Denoising of Real Clinical Low Dose Liver CT with Perceptual Attention Networks

With the development of deep learning, medical image processing has been widely used to assist clinical research. This paper focuses on the denoising

arXiv cs.AI 05-05 academic en

When RAG Chatbots Expose Their Backend: An Anonymized Case Study of Privacy and Security Risks in Patient-Facing Medical AI

Background: Patient-facing medical chatbots based on retrieval-augmented generation (RAG) are increasingly promoted to deliver accessible, grounded he

arXiv cs.AI 05-05 academic en

Can Coding Agents Reproduce Findings in Computational Materials Science?

Large language models are increasingly deployed as autonomous coding agents and have achieved remarkably strong performance on software engineering be

arXiv cs.AI 05-05 academic en

Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a "Visual Signal Dilution"

arXiv cs.RO 05-02 academic en

Flying by Inference: Active Inference World Models for Adaptive UAV Swarms

This paper presents an expert-guided active-inference-inspired framework for adaptive UAV swarm trajectory planning. The proposed method converts mult

arXiv cs.RO 05-02 academic en

Dreaming Across Towns: Semantic Rollout and Town-Adversarial Regularization for Zero-Shot Held-Out-Town Fixed-Route Driving in CARLA

Learned driving agents often degrade when deployed in unseen environments. This paper studies a deliberately bounded instance of that problem in the C

arXiv cs.RO 05-02 academic en

Framework for Collaborative Operation of Autonomous Delivery Vehicles Within a Marshaling Yard

As autonomous vehicles slowly deploy into urban roads for limited use cases with significant edge case issues, closed facilities like marshaling yards

arXiv cs.RO 05-02 academic en

GSDrive: Reinforcing Driving Policies by Multi-mode Trajectory Probing with 3D Gaussian Splatting Environment

End-to-end (E2E) autonomous driving presents a promising approach for translating perceptual inputs directly into driving actions. However, prohibitiv

arXiv cs.RO 05-02 academic en

FreeOcc: Training-Free Embodied Open-Vocabulary Occupancy Prediction

Existing learning-based occupancy prediction methods rely on large-scale 3D annotations and generalize poorly across environments. We present FreeOcc,

arXiv cs.RO 05-02 academic en

Design and Characteristics of a Thin-Film ThermoMesh for the Efficient Embedded Sensing of a Spatio-Temporally Sparse Heat Source

This work presents ThermoMesh, a passive thin-film thermoelectric mesh sensor designed to detect and characterize spatio-temporally sparse heat source

arXiv cs.RO 05-02 academic en

RopeDreamer: A Kinematic Recurrent State Space Model for Dynamics of Flexible Deformable Linear Objects

The robotic manipulation of Deformable Linear Objects (DLOs) is a fundamental challenge due to the high-dimensional, non-linear dynamics of flexible s

arXiv cs.CV 05-02 academic en

Action Motifs: Self-Supervised Hierarchical Representation of Human Body Movements

Effective human behavior modeling requires a representation of the human body movement that capitalizes on its compositionality. We propose a hierarch

arXiv cs.CV 05-02 academic en

AEGIS: A Holistic Benchmark for Evaluating Forensic Analysis of AI-Generated Academic Images

We introduce AEGIS, A holistic benchmark for Evaluating forensic analysis of AI-Generated academic ImageS. Compared to existing benchmarks, AEGIS feat

arXiv cs.CV 05-02 academic en

Stop Holding Your Breath: CT-Informed Gaussian Splatting for Dynamic Bronchoscopy

Bronchoscopic navigation relies on registering endoscopic video to a preoperative CT scan, but respiratory motion deforms the airway by 5-20 mm, creat

arXiv cs.CV 05-02 academic en

Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

Recent visual generation models have made major progress in photorealism, typography, instruction following, and interactive editing, yet they still s

arXiv cs.CV 05-02 academic en

Representation Fréchet Loss for Visual Generation

We show that Fréchet Distance (FD), long considered impractical as a training objective, can in fact be effectively optimized in the representation sp

arXiv cs.CV 05-02 academic en

LaST-R1: Reinforcing Action via Adaptive Physical Latent Reasoning for VLA Models

Vision-Language-Action (VLA) models have increasingly incorporated reasoning mechanisms for complex robotic manipulation. However, existing approaches

arXiv cs.CV 05-02 academic en

Generalizable Sparse-View 3D Reconstruction from Unconstrained Images

Reconstructing 3D scenes from sparse, unposed images remains challenging under real-world conditions with varying illumination and transient occlusion

arXiv cs.CV 05-02 academic en

OmniRobotHome: A Multi-Camera Platform for Real-Time Multiadic Human-Robot Interaction

Human-robot collaboration has been studied primarily in dyadic or sequential settings. However, real homes require multiadic collaboration, where mult

arXiv cs.CV 05-02 academic en

HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation

Driving world models serve as a pivotal technology for autonomous driving by simulating environmental dynamics. However, existing approaches predomina

arXiv cs.LG 05-02 academic en

Sequential Inference for Gaussian Processes: A Signal Processing Perspective

The proliferation of capable and efficient machine learning (ML) models marks one of the strongest methodological shifts in signal processing (SP) in

arXiv cs.LG 05-02 academic en

Mapping the Phase Diagram of the Vicsek Model with Machine Learning

In this study, we use machine learning to classify and interpolate the phase structure of the Vicsek flocking model across the three-dimensional param

arXiv cs.LG 05-02 academic en

Strait: Perceiving Priority and Interference in ML Inference Serving

Machine learning (ML) inference serving systems host deep neural network (DNN) models and schedule incoming inference requests across deployed GPUs. H

arXiv cs.LG 05-02 academic en

Defending Quantum Classifiers against Adversarial Perturbations through Quantum Autoencoders

Machine learning models can learn from data samples to carry out various tasks efficiently. When data samples are adversarially manipulated, such as b

arXiv cs.LG 05-02 academic en

An adaptive wavelet-based PINN for problems with localized high-magnitude source

In recent years, physics-informed neural networks (PINNs) have gained significant attention for solving differential equations, although they suffer f

arXiv cs.LG 05-02 academic en

Exploration Hacking: Can LLMs Learn to Resist RL Training?

Reinforcement learning (RL) has become essential to the post-training of large language models (LLMs) for reasoning, agentic capabilities and alignmen

arXiv cs.AI 05-02 academic en

Normativity and Productivism: Ableist Intelligence? A Degrowth Analysis of AI Sign Language Translation Tools for Deaf People

Sign languages, of any geographical or accentual variation, understandably face continuous scrutiny under the ever present popularity of verbal dictat

arXiv cs.AI 05-02 academic en

Latent Adversarial Detection: Adaptive Probing of LLM Activations for Multi-Turn Attack Detection

Multi-turn prompt injection follows a known attack path (trust-building, pivoting, escalation), but text-level defenses miss covert attacks where indi

arXiv cs.AI 05-02 academic en

Crab: A Semantics-Aware Checkpoint/Restore Runtime for Agent Sandboxes

Autonomous agents act through sandboxed containers and microVMs whose state spans filesystems, processes, and runtime artifacts. Checkpoint and restor