67 entries total · category=academic
Long-horizon robotic manipulation requires plans that are both logically coherent and geometrically grounded. Existing Vision-Language-Action policies
Robots operating in open, unstructured real-world environments must rely on onboard visual perception while autonomously moving across different locat
Real-world fine manipulation, particularly in bimanual manipulation, typically requires low-latency control and stable visual localization, while coll
Understanding human actions from visual observations is essential for human-robot interaction, particularly when semantic interpretation of unfamilia
Scaling laws for Large Language Models (LLMs) establish that model quality improves with computational scale, yet edge deployment imposes strict const
Partial driving automation creates a tension: drivers remain legally responsible for vehicle behaviour, yet their active control is significantly redu
Perception for automated driving is largely based on onboard environmental sensors, such as cameras and radar, which are cost-effective but limited by
This paper introduces EnergyFlow, a framework that unifies generative action modeling with inverse reinforcement learning by parameterizing a scalar e
We introduce Paired-CSLiDAR (CSLiDAR), a cross-source aerial-ground LiDAR benchmark for single-scan pose refinement: refining a ground-scan pose withi
Affordance grounding requires identifying where and how an agent should interact in open-world scenes, where actionable regions are often small, occlu
Single-point supervised infrared small target detection (IRSTD) drastically reduces dense annotation costs. Current state-of-the-art (SOTA) methods ac
Edge detection refers to identifying points in a digital image where intensity changes sharply, indicating object boundaries or structural features. C
Urban perception describes how people subjectively evaluate urban environments, shaping how cities are experienced and understood. Existing computatio
3D world generation is essential for applications such as immersive content creation or autonomous driving simulation. Recent advances in 3D world gen
Gaze estimation methods commonly use facial appearances to predict the direction of a person's gaze. However, previous studies show three major challeng
In this paper, we present Generative Language-Image Pre-training (GenLIP), a minimalist generative pretraining fra
Flow matching (FM) trains a time-dependent vector field that transports samples from a simple prior to a complex data distribution. However, for high-
Monte Carlo Tree Search (MCTS) scales poorly in cooperative multi-agent domains because expansion must consider an exponentially large set of joint ac
Reward models (RMs) have become an indispensable fixture of the language model (LM) post-training playbook, enabling policy alignment and test-time sc
This paper deals with solving the 2D Helmholtz equation on non-parametric domains, leveraging a physics-informed neural operator network based on the
In biomechanical systems, observable performance is often used as a proxy for underlying system organization. However, this assumption implicitly pres
While representation and similarity learning have improved the sample efficiency of Reinforcement Learning (RL), they are rarely used to shape policy
Humans solve problems by executing targeted plans, yet large language models (LLMs) remain unreliable for structured workflow execution. We propose Ru
Generating diverse, readable statistical charts from tabular data remains challenging for LLMs, as many failures become apparent after rendering and a
We introduce HyCOP, a modular framework that learns parametric PDE solution operators by composing simple modules (advection, diffusion, learned closu
Agentic AI architectures augment LLMs with external tools, unlocking strong capabilities. However, tool use is not always beneficial; some calls may b
LLMs excel at predictive and complex reasoning tasks, but many high-value deployments rely on decisions under uncertainty, for example, which to
We propose a new framework for meritocratic fairness in budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF). Unlike semi-
The language in online platforms, influence operations, and political rhetoric frequently directs a mix of pro-social sentiment (e.g., advocacy, helpf
Reliable spatial analysis in GIScience requires preserving coordinate semantics, topology, units, and geographic plausibility. Current LLM-based GIS s