67 items in total · category=academic
arXiv cs.RO 1 day ago academic en
Long-horizon robotic manipulation requires plans that are both logically coherent and geometrically grounded. Existing Vision-Language-Action policies
arXiv cs.RO 1 day ago academic en
Robots operating in open, unstructured real-world environments must rely on onboard visual perception while autonomously moving across different locat
arXiv cs.RO 1 day ago academic en
Real-world fine manipulation, particularly in bimanual manipulation, typically requires low-latency control and stable visual localization, while coll
arXiv cs.RO 1 day ago academic en
Understanding human actions from visual observations is essential for human-robot interaction, particularly when semantic interpretation of unfamilia
arXiv cs.RO 1 day ago academic en
Scaling laws for Large Language Models (LLMs) establish that model quality improves with computational scale, yet edge deployment imposes strict const
arXiv cs.RO 1 day ago academic en
Partial driving automation creates a tension: drivers remain legally responsible for vehicle behaviour, yet their active control is significantly redu
arXiv cs.RO 1 day ago academic en
Perception for automated driving is largely based on onboard environmental sensors, such as cameras and radar, which are cost-effective but limited by
arXiv cs.RO 1 day ago academic en
This paper introduces EnergyFlow, a framework that unifies generative action modeling with inverse reinforcement learning by parameterizing a scalar e
arXiv cs.RO 1 day ago academic en
We introduce Paired-CSLiDAR (CSLiDAR), a cross-source aerial-ground LiDAR benchmark for single-scan pose refinement: refining a ground-scan pose withi
arXiv cs.RO 1 day ago academic en
Affordance grounding requires identifying where and how an agent should interact in open-world scenes, where actionable regions are often small, occlu
arXiv cs.CV 1 day ago academic en
Single-point supervised infrared small target detection (IRSTD) drastically reduces dense annotation costs. Current state-of-the-art (SOTA) methods ac
arXiv cs.CV 1 day ago academic en
Edge detection refers to identifying points in a digital image where intensity changes sharply, indicating object boundaries or structural features. C
arXiv cs.CV 1 day ago academic en
Urban perception describes how people subjectively evaluate urban environments, shaping how cities are experienced and understood. Existing computatio
arXiv cs.CV 1 day ago academic en
3D world generation is essential for applications such as immersive content creation or autonomous driving simulation. Recent advances in 3D world gen
arXiv cs.CV 1 day ago academic en
Gaze estimation methods commonly use facial appearances to predict the direction of a person's gaze. However, previous studies show three major challeng
arXiv cs.CV 1 day ago academic en
In this paper, we present Generative Language-Image Pre-training (GenLIP), a minimalist generative pretraining fra
arXiv cs.CV 1 day ago academic en
Flow matching (FM) trains a time-dependent vector field that transports samples from a simple prior to a complex data distribution. However, for high-
arXiv cs.LG 1 day ago academic en
Monte Carlo Tree Search (MCTS) scales poorly in cooperative multi-agent domains because expansion must consider an exponentially large set of joint ac
arXiv cs.LG 1 day ago academic en
Reward models (RMs) have become an indispensable fixture of the language model (LM) post-training playbook, enabling policy alignment and test-time sc
arXiv cs.LG 1 day ago academic en
This paper deals with solving the 2D Helmholtz equation on non-parametric domains, leveraging a physics-informed neural operator network based on the
arXiv cs.LG 1 day ago academic en
In biomechanical systems, observable performance is often used as a proxy for underlying system organization. However, this assumption implicitly pres
arXiv cs.LG 1 day ago academic en
While representation and similarity learning have improved the sample efficiency of Reinforcement Learning (RL), they are rarely used to shape policy
arXiv cs.LG 1 day ago academic en
Humans solve problems by executing targeted plans, yet large language models (LLMs) remain unreliable for structured workflow execution. We propose Ru
arXiv cs.LG 1 day ago academic en
Generating diverse, readable statistical charts from tabular data remains challenging for LLMs, as many failures become apparent after rendering and a
arXiv cs.LG 1 day ago academic en
We introduce HyCOP, a modular framework that learns parametric PDE solution operators by composing simple modules (advection, diffusion, learned closu
arXiv cs.AI 1 day ago academic en
Agentic AI architectures augment LLMs with external tools, unlocking strong capabilities. However, tool use is not always beneficial; some calls may b
arXiv cs.AI 1 day ago academic en
LLMs excel at predictive tasks and complex reasoning tasks, but many high-value deployments rely on decisions under uncertainty, for example, which to
arXiv cs.AI 1 day ago academic en
We propose a new framework for meritocratic fairness in budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF). Unlike semi-
arXiv cs.AI 1 day ago academic en
The language in online platforms, influence operations, and political rhetoric frequently directs a mix of pro-social sentiment (e.g., advocacy, helpf
arXiv cs.AI 1 day ago academic en
Reliable spatial analysis in GIScience requires preserving coordinate semantics, topology, units, and geographic plausibility. Current LLM-based GIS s