67 items total · category=academic
arXiv cs.RO 05-05 academic en
Long-horizon robotic manipulation requires plans that are both logically coherent and geometrically grounded. Existing Vision-Language-Action policies
arXiv cs.RO 05-05 academic en
Robots operating in open, unstructured real-world environments must rely on onboard visual perception while autonomously moving across different locat
arXiv cs.RO 05-05 academic en
Real-world fine manipulation, particularly in bimanual manipulation, typically requires low-latency control and stable visual localization, while coll
arXiv cs.RO 05-05 academic en
Understanding human actions from visual observations is essential for human-robot interaction, particularly when semantic interpretation of unfamilia
arXiv cs.RO 05-05 academic en
Scaling laws for Large Language Models (LLMs) establish that model quality improves with computational scale, yet edge deployment imposes strict const
arXiv cs.RO 05-05 academic en
Partial driving automation creates a tension: drivers remain legally responsible for vehicle behaviour, yet their active control is significantly redu
arXiv cs.RO 05-05 academic en
Perception for automated driving is largely based on onboard environmental sensors, such as cameras and radar, which are cost-effective but limited by
arXiv cs.RO 05-05 academic en
This paper introduces EnergyFlow, a framework that unifies generative action modeling with inverse reinforcement learning by parameterizing a scalar e
arXiv cs.RO 05-05 academic en
We introduce Paired-CSLiDAR (CSLiDAR), a cross-source aerial-ground LiDAR benchmark for single-scan pose refinement: refining a ground-scan pose withi
arXiv cs.RO 05-05 academic en
Affordance grounding requires identifying where and how an agent should interact in open-world scenes, where actionable regions are often small, occlu
arXiv cs.CV 05-05 academic en
Single-point supervised infrared small target detection (IRSTD) drastically reduces dense annotation costs. Current state-of-the-art (SOTA) methods ac
arXiv cs.CV 05-05 academic en
Edge detection refers to identifying points in a digital image where intensity changes sharply, indicating object boundaries or structural features. C
arXiv cs.CV 05-05 academic en
Urban perception describes how people subjectively evaluate urban environments, shaping how cities are experienced and understood. Existing computatio
arXiv cs.CV 05-05 academic en
3D world generation is essential for applications such as immersive content creation or autonomous driving simulation. Recent advances in 3D world gen
arXiv cs.CV 05-05 academic en
Gaze estimation methods commonly use facial appearances to predict the direction of a person's gaze. However, previous studies show three major challeng
arXiv cs.CV 05-05 academic en
In this paper, we present Generative Language-Image Pre-training (GenLIP), a minimalist generative pretraining fra
arXiv cs.CV 05-05 academic en
Flow matching (FM) trains a time-dependent vector field that transports samples from a simple prior to a complex data distribution. However, for high-
arXiv cs.LG 05-05 academic en
Monte Carlo Tree Search (MCTS) scales poorly in cooperative multi-agent domains because expansion must consider an exponentially large set of joint ac
arXiv cs.LG 05-05 academic en
Reward models (RMs) have become an indispensable fixture of the language model (LM) post-training playbook, enabling policy alignment and test-time sc
arXiv cs.LG 05-05 academic en
This paper deals with solving the 2D Helmholtz equation on non-parametric domains, leveraging a physics-informed neural operator network based on the
arXiv cs.LG 05-05 academic en
In biomechanical systems, observable performance is often used as a proxy for underlying system organization. However, this assumption implicitly pres
arXiv cs.LG 05-05 academic en
While representation and similarity learning have improved the sample efficiency of Reinforcement Learning (RL), they are rarely used to shape policy
arXiv cs.LG 05-05 academic en
Humans solve problems by executing targeted plans, yet large language models (LLMs) remain unreliable for structured workflow execution. We propose Ru
arXiv cs.LG 05-05 academic en
Generating diverse, readable statistical charts from tabular data remains challenging for LLMs, as many failures become apparent after rendering and a
arXiv cs.LG 05-05 academic en
We introduce HyCOP, a modular framework that learns parametric PDE solution operators by composing simple modules (advection, diffusion, learned closu
arXiv cs.AI 05-05 academic en
Agentic AI architectures augment LLMs with external tools, unlocking strong capabilities. However, tool use is not always beneficial; some calls may b
arXiv cs.AI 05-05 academic en
LLMs excel at predictive tasks and complex reasoning tasks, but many high-value deployments rely on decisions under uncertainty, for example, which to
arXiv cs.AI 05-05 academic en
We propose a new framework for meritocratic fairness in budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF). Unlike semi-
arXiv cs.AI 05-05 academic en
The language in online platforms, influence operations, and political rhetoric frequently directs a mix of pro-social sentiment (e.g., advocacy, helpf
arXiv cs.AI 05-05 academic en
Reliable spatial analysis in GIScience requires preserving coordinate semantics, topology, units, and geographic plausibility. Current LLM-based GIS s