{"id":9046,"date":"2021-07-26T13:23:59","date_gmt":"2021-07-26T05:23:59","guid":{"rendered":"https:\/\/cde.nus.edu.sg\/ece\/?page_id=9046"},"modified":"2023-03-01T13:31:50","modified_gmt":"2023-03-01T05:31:50","slug":"vision-machine-learning-lab","status":"publish","type":"page","link":"https:\/\/cde.nus.edu.sg\/ece\/vision-machine-learning-lab\/","title":{"rendered":"Vision and Machine Learning Laboratory"},"content":{"rendered":"\n<h2>\n\t\tVision and Machine Learning Laboratory\n\t<\/h2>\n\tWelcome to Vision and Machine Learning Lab at National University of Singapore!<br \/>\nMore up-to-date information can be found at:\u00a0<a href=\"https:\/\/sites.google.com\/view\/showlab\" target=\"_blank\" rel=\"noopener\">https:\/\/sites.google.com\/view\/showlab<\/a>\nWe aim to build multimodal AI Assistant on various platforms, such as social media app, AR glass, robot, video\/audio editing tool, with the ability of understanding video, audio, language collectively. This involves techniques like:\n<ul>\n<li dir=\"ltr\">\n<strong>Video Understanding<\/strong> e.g. action detection, video pre-training, object detection &amp; tracking, person re-ID, segmentation in space and time.\n<\/li>\n<li dir=\"ltr\">\n<strong>Multi-modal<\/strong> e.g. video+language, video+audio.\n<\/li>\n<li dir=\"ltr\">\n<strong>AI-Human <\/strong>cooperation and interaction\n<\/li>\n<\/ul>\n<table>\n<tbody>\n<tr>\n<td>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/cde.nus.edu.sg\/ece\/wp-content\/uploads\/sites\/3\/2021\/05\/zhengshou_stanford_photo-300x300.jpg\" alt=\"Zhengshou Stanford Photo\" width=\"190\" height=\"190\" \/><\/p>\n<\/td>\n<td>\n<strong>Lab Supervisor and PI<\/strong><strong>: Assistant Professor\u00a0Mike Shou<\/strong>\n<strong>Bio: <\/strong>Prof. Shou is a tenure-track Assistant Professor at National University of Singapore. He was a Research Scientist at Facebook AI in Bay Area. He obtained his Ph.D. degree at Columbia University in the City of New York, working with Prof. Shih-Fu Chang. He was awarded Wei Family Private Foundation Fellowship. He received the best student paper nomination at CVPR&#8217;17. His team won the first place in the International Challenge on Activity Recognition (ActivityNet) 2017. He is a Fellow of National Research Foundation (NRF) Singapore, Class of 2021.\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h3 dir=\"ltr\"><strong>News<\/strong>:<\/h3>\n<ul>\n<li dir=\"ltr\">\nOpenings for PhD, PostDoc, Visiting PhD, etc. More details at:\u00a0<a href=\"https:\/\/sites.google.com\/view\/showlab\/join-us\" target=\"_blank\" rel=\"noopener\">https:\/\/sites.google.com\/view\/showlab\/join-us<\/a>.\n<\/li>\n<li dir=\"ltr\">\nFor sponsoring research project, please contact Prof. 
Activities:

Workshop Organizer:

- CVPR 2021, LOVEU: Long-form Video Understanding Workshop & International Challenge (https://sites.google.com/view/loveucvpr21/home)
- ICCV 2021, SSLL: Share Stories and Lessons Learned (https://sites.google.com/view/1st-ssll-workshop-iccv21)
- ICCV 2021, SRVU: Structured Representations for Video Understanding (https://sites.google.com/view/srvu-iccv21-workshop)
- ICCV 2021, EPIC: Ninth International Workshop on Egocentric Perception, Interaction and Computing: Introducing Ego4D, a massive first-person dataset and challenge (https://eyewear-computing.org/EPIC_ICCV21/)

Recent Professional Services:

- Area Chair / Meta-reviewer / Senior Program Committee:
  - CVPR 2022
  - ACM MM 2021, IJCAI 2021
  - ACM MM 2020, ACM MM Asia 2020
- Reviewer:
  - Conferences: CVPR, ICCV, ECCV, AAAI, IJCAI, ICLR, NeurIPS, ICML, ACM MM, etc.
  - Journals: International Journal of Computer Vision, Transactions on Pattern Analysis and Machine Intelligence, etc.

Recent Talks:

- April 2021, invited talk at the University of Bristol, "Generic Event Boundary Detection: A Benchmark for Event Segmentation"

Selected Publications

Full list of publications: Google Scholar (https://scholar.google.com/citations?user=h1-3lSoAAAAJ&hl)

- Generic Event Boundary Detection: A Benchmark for Event Segmentation.
  Mike Zheng Shou, Stan W. Lei, Deepti Ghadiyaram, Weiyao Wang, Matt Feiszli.
  International Conference on Computer Vision (ICCV), 2021. [arxiv: http://arxiv.org/abs/2101.10511]
  The first large-scale taxonomy-free event segmentation benchmark; a stepping stone to addressing long-form video understanding.
  We organised the LOVEU workshop at CVPR'21, along with competitions built on this dataset; the competitions attracted 20+ participants.

- Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection.
  Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li.
  ACM Multimedia, 2021. [AVA challenge report: https://static.googleusercontent.com/media/research.google.com/en//ava/2021/S3_NUS_Report_AVA_ActiveSpeaker_2021.pdf]
  Leverages video + audio to detect active speakers; secured 3rd place in the AVA challenge (https://research.google.com/ava/challenge.html) organised by Google Research at CVPR'21 ActivityNet.

- On Pursuit of Designing Multi-modal Transformer for Video Grounding.
  Meng Cao, Long Chen, Mike Zheng Shou, Can Zhang, Yuexian Zou.
  Preprint, 2021.

- Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization.
  Junting Pan, Siyu Chen, Mike Zheng Shou, Yu Liu, Jing Shao, Hongsheng Li.
  IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021. [arxiv: https://arxiv.org/abs/2006.07976]
  Localizes actions in space and time; the core technique behind the 1st-place entry in the AVA challenge (https://research.google.com/ava/challenge.html) organised by Google Research at CVPR'20 ActivityNet.

- SF-Net: Single-Frame Supervision for Temporal Action Localization.
  Fan Ma, Linchao Zhu, Yi Yang, Shengxin Zha, Gourab Kundu, Matt Feiszli, Zheng Shou.
  European Conference on Computer Vision (ECCV), 2020. Spotlight, acceptance rate top 5%. [arxiv: https://arxiv.org/abs/2003.06845]
  A new form of weak supervision that achieves results comparable to its fully-supervised counterpart at a much lower annotation cost.

- DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition.
  Zheng Shou, Xudong Lin, Yannis Kalantidis, Laura Sevilla-Lara, Marcus Rohrbach, Shih-Fu Chang, Zhicheng Yan.
  IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019. [arxiv: https://arxiv.org/abs/1901.03460]
  A video model that learns discriminative motion cues directly from compressed video: fast and accurate.

- AutoLoc: Weakly-supervised Temporal Action Localization in Untrimmed Videos.
  Zheng Shou, Hang Gao, Lei Zhang, Kazuyuki Miyazawa, Shih-Fu Chang.
  European Conference on Computer Vision (ECCV), 2018.
  [arxiv: https://arxiv.org/abs/1807.08333]

- Online Detection of Action Start in Untrimmed, Streaming Videos.
  Zheng Shou*, Junting Pan*, Jonathan Chan, Kazuyuki Miyazawa, Hassan Mansour, Anthony Vetro, Xavi Giró-i-Nieto, Shih-Fu Chang.
  European Conference on Computer Vision (ECCV), 2018. [arxiv: https://arxiv.org/abs/1802.06822]

- CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos.
  Zheng Shou, Jonathan Chan, Alireza Zareian, Kazuyuki Miyazawa, Shih-Fu Chang.
  IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. [arxiv: https://arxiv.org/abs/1703.01515] Oral presentation, acceptance rate 2.6%, best student paper nomination.

- ConvNet Architecture Search for Spatiotemporal Feature Learning.
  Du Tran, Jamie Ray, Zheng Shou, Shih-Fu Chang, Manohar Paluri.
  Technical Report, 2017. [arxiv: https://arxiv.org/abs/1708.05038] [github: https://github.com/facebookarchive/C3D/tree/master/C3D-v1.1]
  An open-source Res3D video backbone model that supports many video applications.

- Single Shot Temporal Action Detection.
  Tianwei Lin, Xu Zhao, Zheng Shou.
  ACM Multimedia, 2017. [paper: http://wzmsltw.github.io/acmmm_370.pdf] [challenge report: https://arxiv.org/abs/1707.06750]
  Won first place in both the Temporal Action Proposal and Temporal Action Localization tracks at the ActivityNet Challenge.

- Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs.
  Zheng Shou, Dongang Wang, and Shih-Fu Chang.
  IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. [arxiv: https://arxiv.org/abs/1703.01515]
  A pioneering work that proposes the first deep learning framework for temporal action localization in video.