Learning geometry from vision for robotic manipulation

Download links: Thesis | Slides

Fig. 1: Two types of task structures: (i) the geometric task structure that we use (Fig. 1A); (ii) the semantic task structure [3] (Fig. 1B), which encodes a task's semantic meaning as a tree or graph and has been intensively studied in the literature [13, 14].

This thesis studies how to enable a real-world robot to learn a new task efficiently by watching human demonstration videos. Learning by watching offers a more intuitive task-teaching interface than methods that require coordinate programming, reward/cost design, kinesthetic teaching, or teleoperation. However, the need for massive numbers of human demonstrations, tedious data annotation, and heavy training of a robot controller impedes its adoption in real-world applications. To overcome these challenges, we introduce a geometric task structure into the solution.


What is a geometric task structure?

A geometric task structure uses geometric features observed in image planes [1, 2] to specify a task, either by explicitly forming geometric constraints or by implicitly extracting task-relevant keypoints. For example, in Fig. 1A, a screwing task can be specified by point and line constraints.
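To make this concrete, below is a minimal sketch of how a point-on-line constraint in the image plane can be written down with homogeneous coordinates. The pixel coordinates, object names, and helper functions are hypothetical and purely illustrative, not the thesis's actual implementation.

```python
import numpy as np

def to_homogeneous(p):
    """Lift a 2-D image point (u, v) to homogeneous coordinates (u, v, 1)."""
    return np.array([p[0], p[1], 1.0])

def line_through(p1, p2):
    """Image line through two points, as homogeneous coefficients (a, b, c)."""
    return np.cross(to_homogeneous(p1), to_homogeneous(p2))

def point_on_line_error(p, line):
    """Signed pixel distance of point p from the line; zero when the
    point-on-line constraint is satisfied."""
    a, b, c = line
    return (a * p[0] + b * p[1] + c) / np.hypot(a, b)

# A screwing task in the spirit of Fig. 1A: the screwdriver-tip point
# should lie on the image line through the screw's axis endpoints.
# All detections below are made-up numbers.
screw_axis = line_through((320.0, 180.0), (320.0, 260.0))
tip = (321.5, 220.0)
print(point_on_line_error(tip, screw_axis))  # near zero => constraint met
```

A controller can treat such constraint errors as residuals to drive toward zero, which is the explicit-constraint flavor of the geometric task structure.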

What new insights can it bring to robot learning?

Fig. 2: The same geometric constraint applied to categorical objects defines the same task; we call this property task specification correspondence. The affordance of an object part alone does not define any task. Beyond affordance, the interconnections between multiple object parts provide the task specification; they thus represent the task and can be used to build task specification correspondence.
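One way to see why the same constraint set transfers across object instances is to write the task as relations between semantic keypoints of object categories rather than of specific objects. The sketch below is a hypothetical data structure under that assumption; the keypoint names and relation labels are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Constraint:
    """A geometric relation between two categorical keypoints."""
    kp_a: str      # semantic keypoint on object A, e.g. "hammer_head"
    kp_b: str      # semantic keypoint on object B, e.g. "nail_top"
    relation: str  # e.g. "coincide", "aligned"

# The same constraint set specifies the same task for *any* hammer and
# nail instance, which is what yields task specification correspondence.
hammering_task = {
    Constraint("hammer_head", "nail_top", "coincide"),
    Constraint("hammer_handle_axis", "nail_axis", "aligned"),
}
```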


How does this work relate to robot learning literature?

In addition to "learning by watching" approaches, also known as third-person visual imitation learning [12], our work closely relates to methods that use task-relevant geometric features to enable sample-efficient robot learning and task generalization (Fig. 3).

Fig. 3: Examples of recent research using keypoint structures in robot learning. The top row names each method; keypoint structures are extracted by the modules marked with blue blocks.
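The common pattern across these methods is to replace raw pixels with a handful of detected keypoints as the policy's state, which is where the sample efficiency comes from. The sketch below illustrates that idea only; the detector output, image size, and normalization are assumptions, not any particular method from Fig. 3.

```python
import numpy as np

def keypoint_state(keypoints_uv: np.ndarray) -> np.ndarray:
    """Flatten K x 2 keypoint image coordinates into a compact state
    vector for a downstream policy, instead of feeding raw pixels."""
    return keypoints_uv.reshape(-1)

# Hypothetical detector output: 4 task-relevant keypoints in a 640x480
# image, normalized to [0, 1] so the policy input is resolution invariant.
kps = np.array([[320, 240], [100, 60], [500, 400], [320, 100]], dtype=float)
kps /= np.array([640.0, 480.0])
state = keypoint_state(kps)  # shape (8,), versus 640*480*3 raw pixel values
```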


These studies motivate us to push beyond the simple intuition that "task-relevant keypoints should be good for robot learning" to a complete investigation of the idea.

This 8-minute video uses a hammering task as an introductory demo of this research.

References