News Hub - intelligence-db

arXiv cs.RO 05-05 academic en

Thinking in Text and Images: Interleaved Vision-Language Reasoning Traces for Long-Horizon Robot Manipulation

Long-horizon robotic manipulation requires plans that are both logically coherent and geometrically grounded. Existing Vision-Language-Action policies

arXiv cs.RO 05-05 academic en

Stereo Multistage Spatial Attention for Real-Time Mobile Manipulation Under Visual Scale Variation and Disturbances

Robots operating in open, unstructured real-world environments must rely on onboard visual perception while autonomously moving across different locat

arXiv cs.RO 05-05 academic en

MSACT: Multistage Spatial Alignment for Stable Low-Latency Fine Manipulation

Real-world fine manipulation, particularly in bimanual manipulation, typically requires low-latency control and stable visual localization, while coll

arXiv cs.RO 05-05 academic en

High-Speed Vision Improves Zero-Shot Semantic Understanding of Human Actions

Understanding human actions from visual observations is essential for human-robot interaction, particularly when semantic interpretation of unfamilia

arXiv cs.RO 05-05 academic en

Tempus: A Temporally Scalable Resource-Invariant GEMM Streaming Framework for Versal AI Edge

Scaling laws for Large Language Models (LLMs) establish that model quality improves with computational scale, yet edge deployment imposes strict const

arXiv cs.RO 05-05 academic en

Linking Behaviour and Perception to Evaluate Meaningful Human Control over Partially Automated Driving

Partial driving automation creates a tension: drivers remain legally responsible for vehicle behaviour, yet their active control is significantly redu

arXiv cs.RO 05-05 academic en

Robust Fusion of Object-Level V2X for Learned 3D Object Detection

Perception for automated driving is largely based on onboard environmental sensors, such as cameras and radar, which are cost-effective but limited by

arXiv cs.RO 05-05 academic en

Recovering Hidden Reward in Diffusion-Based Policies

This paper introduces EnergyFlow, a framework that unifies generative action modeling with inverse reinforcement learning by parameterizing a scalar e

arXiv cs.RO 05-05 academic en

Paired-CSLiDAR: Height-Stratified Registration for Cross-Source Aerial-Ground LiDAR Pose Refinement

We introduce Paired-CSLiDAR (CSLiDAR), a cross-source aerial-ground LiDAR benchmark for single-scan pose refinement: refining a ground-scan pose withi

arXiv cs.RO 05-05 academic en

Affordance Agent Harness: Verification-Gated Skill Orchestration

Affordance grounding requires identifying where and how an agent should interact in open-world scenes, where actionable regions are often small, occlu

arXiv cs.CV 05-05 academic en

Exploring the Limits of End-to-End Feature-Affinity Propagation for Single-Point Supervised Infrared Small Target Detection

Single-point supervised infrared small target detection (IRSTD) drastically reduces dense annotation costs. Current state-of-the-art (SOTA) methods ac

arXiv cs.CV 05-05 academic en

Quantum Gradient-Based Approach for Edge and Corner Detection Using Sobel Kernels

Edge detection refers to identifying points in a digital image where intensity changes sharply, indicating object boundaries or structural features. C

arXiv cs.CV 05-05 academic en

Modeling Subjective Urban Perception with Human Gaze

Urban perception describes how people subjectively evaluate urban environments, shaping how cities are experienced and understood. Existing computatio

arXiv cs.CV 05-05 academic en

Map2World: Segment Map Conditioned Text to 3D World Generation

3D world generation is essential for applications such as immersive content creation or autonomous driving simulation. Recent advances in 3D world gen

arXiv cs.CV 05-05 academic en

GMGaze: MoE-Based Context-Aware Gaze Estimation with CLIP and Multiscale Transformer

Gaze estimation methods commonly use facial appearances to predict the direction of a person's gaze. However, previous studies show three major challeng

arXiv cs.CV 05-05 academic en

Let ViT Speak: Generative Language-Image Pre-training

In this paper, we present Generative Language-Image Pre-training (GenLIP), a minimalist generative pretraining fra

arXiv cs.CV 05-05 academic en

Posterior Augmented Flow Matching

Flow matching (FM) trains a time-dependent vector field that transports samples from a simple prior to a complex data distribution. However, for high-

arXiv cs.LG 05-05 academic en

NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search

Monte Carlo Tree Search (MCTS) scales poorly in cooperative multi-agent domains because expansion must consider an exponentially large set of joint ac

arXiv cs.LG 05-05 academic en

Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring

Reward models (RMs) have become an indispensable fixture of the language model (LM) post-training playbook, enabling policy alignment and test-time sc

arXiv cs.LG 05-05 academic en

Learning the Helmholtz equation operator with DeepONet for non-parametric 2D geometries

This paper deals with solving the 2D Helmholtz equation on non-parametric domains, leveraging a physics-informed neural operator network based on the

arXiv cs.LG 05-05 academic en

Observable Performance Does Not Fully Reflect System Organization: A Multi-Level Analysis of Gait Dynamics Under Occlusal Constraint

In biomechanical systems, observable performance is often used as a proxy for underlying system organization. However, this assumption implicitly pres

arXiv cs.LG 05-05 academic en

SAVGO: Learning State-Action Value Geometry with Cosine Similarity for Continuous Control

While representation and similarity learning have improved the sample efficiency of Reinforcement Learning (RL), they are rarely used to shape policy

arXiv cs.LG 05-05 academic en

RunAgent: Interpreting Natural-Language Plans with Constraint-Guided Execution

Humans solve problems by executing targeted plans, yet large language models (LLMs) remain unreliable for structured workflow execution. We propose Ru

arXiv cs.LG 05-05 academic en

Generating Statistical Charts with Validation-Driven LLM Workflows

Generating diverse, readable statistical charts from tabular data remains challenging for LLMs, as many failures become apparent after rendering and a

arXiv cs.LG 05-05 academic en

HyCOP: Hybrid Composition Operators for Interpretable Learning of PDEs

We introduce HyCOP, a modular framework that learns parametric PDE solution operators by composing simple modules (advection, diffusion, learned closu

arXiv cs.AI 05-05 academic en

To Call or Not to Call: A Framework to Assess and Optimize LLM Tool Calling

Agentic AI architectures augment LLMs with external tools, unlocking strong capabilities. However, tool use is not always beneficial; some calls may b

arXiv cs.AI 05-05 academic en

Position: agentic AI orchestration should be Bayes-consistent

LLMs excel at predictive tasks and complex reasoning tasks, but many high-value deployments rely on decisions under uncertainty, for example, which to

arXiv cs.AI 05-05 academic en

Meritocratic Fairness in Budgeted Combinatorial Multi-armed Bandits via Shapley Values

We propose a new framework for meritocratic fairness in budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF). Unlike semi-

arXiv cs.AI 05-05 academic en

Directed Social Regard: Surfacing Targeted Advocacy, Opposition, Aid, Harms, and Victimization in Online Media

The language in online platforms, influence operations, and political rhetoric frequently directs a mix of pro-social sentiment (e.g., advocacy, helpf

arXiv cs.AI 05-05 academic en

GeoContra: From Fluent GIS Code to Verifiable Spatial Analysis with Geography-Grounded Repair

Reliable spatial analysis in GIScience requires preserving coordinate semantics, topology, units, and geographic plausibility. Current LLM-based GIS s