Redefining Open Minds: How to Find a Research Topic (1): Looking for a Different Dimension

How to Find a Research Topic?

(How to come up with new research ideas?)

Jia-Bin Huang

jbhuang0604@gmail.com

Latest update: April 1st, 2010

The only difference between a rut and a grave is their dimensions. - Ellen Glasgow

1. Look for a different dimension     neXt = X^d

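B. Abstract dimensions

Next come the more abstract dimensions: generalizing the concept within a single problem. This is easiest to understand by going straight to examples.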

EX 1. Alignment/Registration

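Alignment, or registration, has long been a basic problem in computer vision. Classical optical flow handles alignment at the pixel level: the input is usually two or more temporally consecutive images, and the goal is to find the correspondence of each pixel across those images. Here X is the task of alignment/registration itself, so the abstract dimension becomes "the thing being aligned or registered".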
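To make pixel-level alignment concrete, here is a minimal sketch in plain Python. It is a deliberate simplification, not optical flow as such: instead of estimating a separate motion for every pixel, it assumes the whole image moves by one integer translation and finds it by exhaustive search over a sum-of-squared-differences cost. The function names (`ssd`, `shift_image`, `best_translation`) and the tiny test image are made up for illustration.

```python
def ssd(a, b):
    """Sum of squared differences between two equally sized 2D lists."""
    return sum((pa - pb) ** 2
               for ra, rb in zip(a, b)
               for pa, pb in zip(ra, rb))


def shift_image(img, dx, dy, fill=0):
    """Shift img by (dx, dy): dx > 0 moves content right, dy > 0 moves it down."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x - dx, y - dy  # source pixel that lands on (x, y)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def best_translation(src, dst, max_shift=2):
    """Try every integer shift in the window and keep the one with lowest SSD."""
    candidates = [(dx, dy)
                  for dy in range(-max_shift, max_shift + 1)
                  for dx in range(-max_shift, max_shift + 1)]
    return min(candidates, key=lambda s: ssd(shift_image(src, s[0], s[1]), dst))


# Hypothetical 5x5 frame with two bright pixels, and a second frame
# in which the content has moved 1 pixel right and 2 pixels down.
src = [[0] * 5 for _ in range(5)]
src[1][1] = 9
src[1][2] = 5
dst = shift_image(src, 1, 2)

print(best_translation(src, dst))  # prints (1, 2)
```

Real optical flow methods estimate a dense, sub-pixel motion field rather than a single global shift, but the underlying question is the same: which displacement makes the two images agree best?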

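Generalizing upward from the pixel, we get a set of pixels (a segment). If the segment lies on a single plane, plane-to-plane alignment becomes homography estimation. Aligning bare segments is often not meaningful in itself, though; we also want the segment to represent something with real meaning (e.g., an object). The most common object category is the human face, so face alignment remains an important problem even today [1]. A face, however, is a very regular, rigid object. If the object can deform (e.g., clothing or a flag), alignment becomes more complicated, which gives rise to the non-rigid registration problem [2]. One more step up is the broader notion of an object category: chairs, for instance, come in all kinds of shapes, so how do we align them? This idea drove the field of shape matching [3-4]. Going up yet another level: if generic objects can be aligned, what about whole images? Images often share similar structure (forests, beaches, city streets, and so on), and so scene alignment appeared [5], which looks for correspondences between one image and another.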
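For the planar case mentioned above, a homography is a 3x3 matrix that maps points on one plane to points on another in homogeneous coordinates. The sketch below only applies a given homography to a point; estimating the matrix from correspondences is the harder step and is not shown. The function name `apply_homography` and the two example matrices are assumptions chosen for illustration.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography H (nested lists, row-major)."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    # Divide by the homogeneous coordinate to return to pixel coordinates.
    return (xh / w, yh / w)


# Pure translation by (2, 3): the simplest special case of a homography.
H_translate = [[1, 0, 2],
               [0, 1, 3],
               [0, 0, 1]]

# A hypothetical projective warp: the last row makes w depend on x,
# so points farther to the right are scaled down more.
H_projective = [[1, 0, 0],
                [0, 1, 0],
                [0.5, 0, 1]]

print(apply_homography(H_translate, (1, 1)))   # prints (3.0, 4.0)
print(apply_homography(H_projective, (2, 4)))  # prints (1.0, 2.0)
```

Homography estimation then amounts to finding the matrix whose mapped points best agree with observed correspondences between the two views of the plane.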

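To summarize the alignment problem briefly: Pixel -> Plane -> Rigid object (e.g., face) -> Non-rigid object (e.g., clothes) -> Object categories (e.g., chair, building) -> Scene (e.g., photos). What lies one level further up? The most intuitive extensions are alignment between actions, or between videos. No research has appeared on either of these so far, but I believe both would make good topics: action alignment would be very useful for action recognition, and alignment between videos could provide knowledge transfer for video analysis.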

[1] RASL: Robust Alignment by Sparse and Low-Rank Decomposition for Linearly Correlated Images, CVPR 2010

[2] A new point matching algorithm for non-rigid registration, CVIU 2003

[3] Shape Matching and Object Recognition, UC Berkeley Vision Group Project Page

[4] Shape Matching and Object Recognition using Low Distortion Correspondence, CVPR 2005

[5] SIFT flow: dense correspondence across different scenes, ECCV 2008

EX 2. Shape from X

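One of the oldest problems in vision is 3D shape reconstruction. The 2D image we see is a projection of the 3D world, and the question here is how to recover 3D information from a set of 2D images. This is the well-known Shape from X problem. The earliest work used shading for the reconstruction [1], but that requires assuming the object has Lambertian reflectance, which, put simply, means it does not reflect light specularly the way metal does. This assumption places a heavy restriction on the original shape-from-shading methods. Swapping out the dimension X then yields a whole series of problems, such as Shape from Specular [2-3] and Shape from Texture [4].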
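The Lambertian assumption can be written down in a few lines. Under the standard Lambertian model, the brightness of a surface point depends only on the angle between its surface normal and the light direction, not on where the viewer stands; that is what allows shading to constrain shape. The sketch below is a minimal illustration of this model, with a hypothetical function name `lambertian_intensity` and an `albedo` parameter introduced here for the example; it is not the method of any of the cited papers.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def lambertian_intensity(normal, light_dir, albedo=1.0):
    """Lambertian shading: albedo * max(0, n . l), independent of viewpoint."""
    n = normalize(normal)
    l = normalize(light_dir)
    cos_angle = sum(a * b for a, b in zip(n, l))
    # Surfaces facing away from the light receive no direct illumination.
    return albedo * max(0.0, cos_angle)


# Light straight along the normal: full brightness.
print(lambertian_intensity((0, 0, 1), (0, 0, 1)))

# Light 60 degrees off the normal: brightness drops to about half.
tilted = (math.sin(math.radians(60)), 0.0, math.cos(math.radians(60)))
print(lambertian_intensity((0, 0, 1), tilted))
```

A specular surface breaks this model because its highlights move with the viewer, which is why shape from shading needs the assumption and why Shape from Specular is a separate problem.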

[1] Shape-from-shading: a survey, PAMI 1999

[2] Toward a theory of shape from specular flow, ICCV 2007

[3] Specular reflections and the perception of shape, Journal of Vision 2004

[4] Shape from texture, Biological Cybernetics 1988

EX 3. Depth from X

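Human vision can often pick up depth information from a single image, but for a computer, recovering depth from one image is quite difficult. This family of problems can be written as Depth from X, where X is whatever information you want to exploit. It runs from the earlier stereo [1], focus [2], and defocus [3], through the extra information obtained with projectors [4-5] or coded apertures [6-7], to occlusion in the image [8], diffusion [9], and the use of object recognition to estimate image depth [10]. All of these are applications of the same approach.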
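As one worked instance of Depth from X with X = stereo: for a rectified stereo pair, depth follows from disparity by the textbook relation Z = f * B / d, where f is the focal length in pixels, B the baseline between the two cameras, and d the horizontal disparity in pixels. The sketch below assumes that idealized rectified setup; the function name and the numbers are hypothetical and are not taken from the cited papers.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in meters for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Hypothetical camera: 700 px focal length, 10 cm baseline.
near = depth_from_disparity(700, 0.1, 35)  # large disparity, close point
far = depth_from_disparity(700, 0.1, 7)    # small disparity, distant point
print(near, far)
```

Depth is inversely proportional to disparity, so nearby points shift a lot between the two views and distant points barely move. The other choices of X in this section replace disparity with a different measurable cue.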

[1] An iterative image registration technique with an application to stereo vision, IJCAI 1981

[2] Depth from focus, Pattern Recognition Letters 1987

[3] Depth from defocus: a spatial domain approach, IJCV 1994

[4] High-accuracy stereo depth maps using structured light, CVPR 2003

[5] Depth from Sliding Projections, CVPR 2009

[6] Image and Depth from a Conventional Camera with a Coded Aperture, SIGGRAPH 2007

[7] Coded Aperture Pairs for Depth From Defocus, ICCV 2009

[8] Recovering Occlusion Boundaries from a Single Image, ICCV 2007

[9] Depth from Diffusion, CVPR 2010

[10] Single Image Depth Estimation From Predicted Semantic Labels, CVPR 2010

EX 4. Qualitative Reasoning for Image Understanding

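We all know that a 2D image is a projection of the 3D world, yet when we look at a single 2D picture we can still get a great deal of information out of it. Problems of this kind can be written as Infer X from one single image. Starting from this central idea, we can explore each property in turn [1]: Geometric Layout [2-4], Occlusion Relationships [5], Camera Viewpoint [6], Illumination [7-8], and Geographic Properties [9].

What else could X be here? Perhaps temperature? Emotion? Whether the image is beautiful? This example shows how many interesting works can come from changing the abstract dimension of a problem.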