Publications

By Don Kimber

2016
Publication Details
  • UIST 2016 (Demo)
  • Oct 16, 2016

Abstract

We propose a robust pointing detection method with a virtual shadow representation for interacting with a public display. Using a depth camera, the user's shadow is generated from a model lit by an angled virtual sun light, and the nearest point of the shadow is detected as the pointer. The shadow's position rises as the user walks closer, which conveys the correct distance for controlling the pointer and offers access to higher areas of the display.
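
A minimal sketch of the shadow-projection idea described above, assuming depth-camera points in metres with the display on the plane z = 0 and y pointing up; the virtual light direction, coordinate conventions, and function name are illustrative and not taken from the paper:

    import numpy as np

    # Sketch (not the paper's implementation): body points come from a depth
    # camera, the display is the plane z = 0, and the user stands at z > 0.
    def shadow_pointer(points_xyz, light_dir=(0.0, -1.0, -1.0)):
        d = np.asarray(light_dir, dtype=float)
        d /= np.linalg.norm(d)
        # Cast every body point along the light direction onto the display
        # plane z = 0:  p + t * d  with  t = -p_z / d_z.
        t = -points_xyz[:, 2] / d[2]
        shadow = points_xyz + t[:, None] * d
        # Interpretation of "nearest point as a pointer": the body point
        # closest to the display (e.g. an outstretched hand) defines the pointer.
        pointer = shadow[np.argmin(points_xyz[:, 2])]
        # Because the light is angled, the whole shadow (and the pointer) rises
        # as the user walks closer to the display, giving distance feedback.
        return pointer[:2]   # (x, y) on the display plane

    # usage: pointer_xy = shadow_pointer(depth_points)  where depth_points is (N, 3)
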
Publication Details
  • Ro-Man 2016
  • Aug 26, 2016

Abstract

Two related challenges with current teleoperated robotic systems are a lack of peripheral vision and awareness, and the difficulty or tedium of navigating through remote spaces. We address these challenges with an interface that provides a focus plus context (F+C) view of the robot's location, in which the user can navigate simply by looking where they want to go and clicking or drawing a path on the view to indicate the desired trajectory or destination. The F+C view provides an undistorted, perspectively correct central region surrounded by a wide field of view peripheral portion, and avoids the need for separate views. The navigation method is direct and intuitive in comparison to keyboard or joystick based navigation, which requires the user to stay in the control loop as the robot moves. Both the F+C views and the direct click navigation were evaluated in a preliminary user study.
Publication Details
  • International Workshop on Interactive Content Consumption
  • Jun 22, 2016

Abstract

The confluence of technologies such as telepresence, immersive imaging, model based virtual mirror worlds, mobile live streaming, etc. gives rise to a capability for people anywhere to view and connect with present or past events nearly anywhere on earth. This capability properly belongs to a public commons, available as a birthright of all humans, and can be seen as part of an evolutionary transition supporting a global collective mind. We describe examples and elements of this capability, and suggest how they can be better integrated through a tool we call TeleViewer and a framework called WorldViews, which supports easy sharing of views as well as connecting providers and consumers of views all around the world.
2015
Publication Details
  • MobileHCI 2015
  • Aug 24, 2015

Abstract

In this paper we report findings from two user studies that explore the problem of establishing common viewpoint in the context of a wearable telepresence system. In our first study, we assessed the ability of a local person (the guide) to identify the view orientation of the remote person by looking at the physical pose of the telepresence device. In the follow-up study, we explored visual feedback methods for communicating the relative viewpoints of the remote user and the guide via a head-mounted display. Our results show that actively observing the pose of the device is useful for viewpoint estimation. However, in the case of telepresence devices without physical directional affordances, a live video feed may yield comparable results. Lastly, more abstract visualizations lead to significantly longer recognition times, but may be necessary in more complex environments.
Publication Details
  • Presented in "Everyday Telepresence" workshop at CHI 2015
  • Apr 18, 2015

Abstract

As video-mediated communication reaches broad adoption, improving immersion and social interaction are important areas of focus in the design of tools for exploration and work-based communication. Here we present three threads of research focused on developing new ways of enabling exploration of a remote environment and interacting with the people and artifacts therein.
Publication Details
  • Human-Robot Interaction (HRI) 2015
  • Mar 2, 2015

Abstract

Our research focuses on improving the effectiveness and usability of driving mobile telepresence robots by increasing the user's sense of immersion during the navigation task. To this end we developed a robot platform that allows immersive navigation using head-tracked stereoscopic video and an HMD. We present the results of an initial user study that compares System Usability Scale (SUS) ratings of a robot teleoperation task using head-tracked stereo vision with a baseline fixed video feed, and the effect of a low or high placement of the camera(s). Our results show significantly higher ratings for the fixed video condition and no effect of camera placement. Future work will focus on examining the reasons for the lower ratings of stereo video and on exploring further visual navigation interfaces.
2014
Publication Details
  • MobileHCI 2014 (Industrial Case Study)
  • Sep 23, 2014

Abstract

Telepresence systems usually lack mobility. Polly, a wearable telepresence device, allows users to explore remote locations or experience events remotely by means of a person that serves as a mobile "guide". We built a series of hardware prototypes and our current, most promising embodiment consists of a smartphone mounted on a stabilized gimbal that is wearable. The gimbal enables remote control of the viewing angle as well as providing active image stabilization while the guide is walking. We present qualitative findings from a series of 8 field tests using either Polly or only a mobile phone. We found that guides felt more physical comfort when using Polly vs. a phone and that Polly was accepted by other persons at the remote location. Remote participants appreciated the stabilized video and ability to control camera view. Connection and bandwidth issues appear to be the most challenging issues for Polly-like systems.

Polly: Telepresence from a Guide's Shoulder

Publication Details
  • Assistive Computer Vision and Robotics Workshop of ECCV
  • Sep 12, 2014

Abstract

Polly is an inexpensive, portable telepresence device based on the metaphor of a parrot riding a guide's shoulder and acting as a proxy for remote participants. Although remote users may be anyone with a desire for 'tele-visits', we focus on users with limited mobility. We present a series of prototypes and field tests that informed design iterations. Our current implementations utilize a smartphone on a stabilized, remotely controlled gimbal that can be hand held, placed on perches, or carried by a wearable frame. We describe findings from trials at campus, museum and faire tours with remote users, including quadriplegics. We found guides were more comfortable using Polly than a phone and that Polly was accepted by other people. Remote participants appreciated the stabilized video and having control of the camera. One challenge is the negotiation of movement and view control. Our tests suggest Polly is an effective alternative to telepresence robots, phones or fixed cameras.
2012
Publication Details
  • IPIN2012
  • Nov 13, 2012

Abstract

We describe Explorer, a system utilizing mirror worlds - dynamic 3D virtual models of physical spaces that reflect the structure and activities of those spaces to help support navigation, context awareness and tasks such as planning and recollection of events. A rich sensor network dynamically updates the models, determining the position of people, the status of rooms, or updating textures to reflect displays or bulletin boards. Through views on web pages, portable devices, or on 'magic window' displays located in the physical space, remote people may 'look in' to the space, while people within the space are provided with augmented views showing information not physically apparent. For example, by looking at a mirror display, people can learn how long others have been present, or where they have been. People in one part of a building can get a sense of activities in the rest of the building, know who is present in their office, and look in to presentations in other rooms. A spatial graph derived from the 3D models is used both for navigation paths and for fusion of the acoustic, WiFi, motion and image sensors used for positioning. We describe usage scenarios for the system as deployed in two research labs and a conference venue.
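
As a toy illustration of the navigation-path use of such a spatial graph, the sketch below runs a breadth-first search over an invented room adjacency graph; the node names and graph are hypothetical, not from the deployed system:

    from collections import deque

    # Illustrative only: a spatial graph over rooms/corridors (nodes invented here),
    # used to compute a navigation path between two locations.
    SPATIAL_GRAPH = {
        "lobby":      ["hallway_1"],
        "hallway_1":  ["lobby", "office_a", "conference"],
        "office_a":   ["hallway_1"],
        "conference": ["hallway_1"],
    }

    def navigation_path(graph, start, goal):
        # Breadth-first search: shortest hop-count path through the spatial graph.
        queue, visited = deque([[start]]), {start}
        while queue:
            path = queue.popleft()
            if path[-1] == goal:
                return path
            for nxt in graph[path[-1]]:
                if nxt not in visited:
                    visited.add(nxt)
                    queue.append(path + [nxt])
        return None

    print(navigation_path(SPATIAL_GRAPH, "lobby", "conference"))
    # ['lobby', 'hallway_1', 'conference']
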
Publication Details
  • IPIN2012
  • Nov 13, 2012

Abstract

Audio-based receiver localization in indoor environments has multiple applications including indoor navigation, location tagging, and tracking. Public places like shopping malls and consumer stores often have loudspeakers installed to play music for public entertainment. Similarly, office spaces may have sound conditioning speakers installed to soften other environmental noises. We discuss an approach to leverage this infrastructure to perform audio-based localization of devices requesting localization in such environments, by playing barely audible controlled sounds from multiple speakers at known positions. Our approach can be used to localize devices such as smartphones, tablets and laptops to sub-meter accuracy. The user does not need to carry any specialized hardware. Unlike acoustic approaches which use high-energy ultrasound waves, the use of barely audible (low energy) signals in our approach poses very different challenges. We discuss these challenges, how we addressed those, and experimental results on two prototypical implementations: a request-play-record localizer, and a continuous tracker. We evaluated our approach in a real world meeting room and report promising initial results with localization accuracy within half a meter 94% of the time. The system has been deployed in multiple zones of our office building and is now part of a location service in constant operation in our lab.
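
The geometric core of such a system can be illustrated with a small multilateration sketch: given the known speaker positions and range estimates recovered from the audio signals, a linear least-squares solve yields the receiver position. This is an assumed simplification for illustration, not the paper's algorithm:

    import numpy as np

    def multilaterate(speaker_pos, ranges):
        # speaker_pos: (N, 2) known speaker coordinates; ranges: (N,) distances.
        p, r = np.asarray(speaker_pos, float), np.asarray(ranges, float)
        # Subtract the first equation to linearise ||x - p_i||^2 = r_i^2.
        A = 2.0 * (p[1:] - p[0])
        b = (r[0] ** 2 - r[1:] ** 2) + np.sum(p[1:] ** 2, axis=1) - np.sum(p[0] ** 2)
        x, *_ = np.linalg.lstsq(A, b, rcond=None)
        return x

    # toy check: four speakers at the corners of a 5 m room, receiver at (1.5, 2.0)
    speakers = [(0, 0), (5, 0), (0, 5), (5, 5)]
    true_pos = np.array([1.5, 2.0])
    dists = [np.linalg.norm(true_pos - np.array(s)) for s in speakers]
    print(multilaterate(speakers, dists))   # ~ [1.5, 2.0]
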

Through the Looking-Glass: Mirror Worlds for Augmented Awareness & Capability

Publication Details
  • ACM MM 2012
  • Oct 29, 2012

Abstract

We describe a system for supporting mirror worlds - 3D virtual models of physical spaces that reflect the structure and activities of those spaces to help support context awareness and tasks such as planning and recollection of events. Through views on web pages, portable devices, or on 'magic window' displays located in the physical space, remote people may 'look in' to the space, while people within the space are provided information not apparent through unaided perception. For example, by looking at a mirror display, people can learn how long others have been present, or where they have been. People in one part of a building can get a sense of activities in the rest of the building, know who is present in their office, and look in to presentations in other rooms. The system can be used to bridge across sites and help provide different parts of an organization with a shared awareness of each other's space and activities. We describe deployments of our mirror world system at several locations.
2011

Augmented Perception through Mirror Worlds

Publication Details
  • Augmented Human 2011
  • Mar 12, 2011

Abstract

We describe a system that mirrors a public physical space into cyberspace to provide people with augmented awareness of that space. Through views on web pages, portable devices, or on `Magic Window' displays located in the physical space, remote people may `look in' to the space, while people within the space are provided information not apparent through unaided perception. For example, by looking at a mirror display, people can learn how long others have been present, where they have been, etc. People in one part of a building can get a sense of the activities in the rest of the building, who is present in their office, look in to a talk in another room, etc. We describe a prototype for such a system developed in our research lab and office space.
2010

The Virtual Chocolate Factory: Mixed Reality Industrial Collaboration and Control

Publication Details
  • ACM Multimedia 2010 - Industrial Exhibits
  • Oct 25, 2010

Abstract

We will exhibit several aspects of a complex mixed reality system that we have built and deployed in a real-world factory setting. In our system, virtual worlds, augmented realities, and mobile applications are all fed from the same infrastructure. In collaboration with TCHO, a chocolate maker in San Francisco, we built a virtual “mirror” world of a real-world chocolate factory and its processes. Sensor data is imported into the multi-user 3D environment from hundreds of sensors on the factory floor. The resulting virtual factory is used for simulation, visualization, and collaboration, using a set of interlinked, real-time layers of information. Another part of our infrastructure is designed to support appropriate industrial uses for mobile devices such as cell phones and tablet computers. We deployed this system at the real-world factory in 2009, and it is now in daily use there. By simultaneously developing mobile, virtual, and web-based display and collaboration environments, we aimed to create an infrastructure that did not skew toward one type of application but that could serve many at once, interchangeably. Through this mixture of mobile, social, mixed and virtual technologies, we hope to create systems for enhanced collaboration in industrial settings between physically remote people and places, such as factories in China with managers in the US.

Camera Pose Navigation using Augmented Reality

Publication Details
  • ISMAR 2010
  • Oct 13, 2010

Abstract

We propose an Augmented Reality (AR) system that helps users take a picture from a designated pose, such as the position and camera angle of an earlier photo. Repeat photography is frequently used to observe and document changes in an object. Our system uses AR technology to estimate camera poses in real time. When a user takes a photo, the camera pose is saved as a 'view bookmark.' To support a user in taking a repeat photo, two simple graphics are rendered in an AR viewer on the camera's screen to guide the user to this bookmarked view. The system then uses image adjustment techniques to create an image based on the user's repeat photo that is even closer to the original.
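
A hedged sketch of the 'view bookmark' idea: a bookmark stores an estimated camera pose, and a later pose is compared against it to produce coarse guidance hints. The pose representation (position plus yaw/pitch) and the tolerances are assumptions for illustration, not the paper's implementation:

    import numpy as np

    def save_bookmark(position, yaw_deg, pitch_deg):
        # A 'view bookmark': the camera pose captured when the original photo was taken.
        return {"pos": np.asarray(position, float), "yaw": yaw_deg, "pitch": pitch_deg}

    def guidance(bookmark, position, yaw_deg, pitch_deg, pos_tol=0.05, ang_tol=2.0):
        # Return coarse hints telling the user how to reach the bookmarked view.
        hints = []
        delta = bookmark["pos"] - np.asarray(position, float)
        if np.linalg.norm(delta) > pos_tol:
            hints.append(f"move by {np.round(delta, 2).tolist()} m")
        if abs(bookmark["yaw"] - yaw_deg) > ang_tol:
            hints.append("turn left" if bookmark["yaw"] > yaw_deg else "turn right")
        if abs(bookmark["pitch"] - pitch_deg) > ang_tol:
            hints.append("tilt up" if bookmark["pitch"] > pitch_deg else "tilt down")
        return hints or ["pose matched: take the repeat photo"]

    bm = save_bookmark([0.0, 1.6, 2.0], yaw_deg=10.0, pitch_deg=-5.0)
    print(guidance(bm, [0.3, 1.6, 2.0], yaw_deg=4.0, pitch_deg=-5.0))
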
Publication Details
  • ICME 2010, Singapore, July 19-23 2010
  • Jul 19, 2010

Abstract

Virtual, mobile, and mixed reality systems have diverse uses for data visualization and remote collaboration in industrial settings, especially factories. We report our experiences in designing complex mixed-reality collaboration, control, and display systems for a real-world factory, for delivering real-time factory information to multiple users. In collaboration with (blank for review), a chocolate maker in San Francisco, our research group is building a virtual “mirror” world of a real-world chocolate factory and its processes. Real-world sensor data (such as temperature and machine state) is imported into the 3D environment from hundreds of sensors on the factory floor. Multi-camera imagery from the factory is also available in the multi-user 3D factory environment. The resulting "virtual factory" is designed for simulation, visualization, and collaboration, using a set of interlinked, real-time 3D and 2D layers of information about the factory and its processes. We are also looking at appropriate industrial uses for mobile devices such as cell phones and tablet computers, and how they intersect with virtual worlds and mixed realities. For example, an experimental iPhone web app provides mobile laboratory monitoring and control. The app allows a real-time view into the lab via steerable camera and remote control of lab machines. The mobile system is integrated with the database underlying the virtual factory world. These systems were deployed at the real-world factory and lab in 2009, and are now in beta development. Through this mashup of mobile, social, mixed and virtual technologies, we hope to create industrial systems for enhanced collaboration between physically remote people and places – for example, factories in China with managers in Japan or the US.

Geometric reconstruction from point-normal data

Publication Details
  • SIAM MI'09 monograph. Related talks: SIAM GPM'09, SIAM MI'09, and BAMA (Bay Area Mathematical Adventures)
  • May 1, 2010

Abstract

Creating virtual models of real spaces and objects is cumbersome and time consuming. This paper focuses on the problem of geometric reconstruction from sparse data obtained from certain image-based modeling approaches. A number of elegant and simple-to-state problems arise concerning when the geometry can be reconstructed. We describe results and counterexamples, and list open problems.
Publication Details
  • IEEE Virtual Reality 2010 conference
  • Mar 19, 2010

Abstract

This project investigates practical uses of virtual, mobile, and mixed reality systems in industrial settings, in particular control and collaboration applications for factories. In collaboration with TCHO, a chocolate maker start-up in San Francisco, we have built virtual mirror-world representations of a real-world chocolate factory and are importing its data and modeling its processes. The system integrates mobile devices such as cell phones and tablet computers. The resulting "virtual factory" is a cross-reality environment designed for simulation, visualization, and collaboration, using a set of interlinked, real-time 3D and 2D layers of information about the factory and its processes.
2009

Marking up a World: Physical Markup for Virtual Content Creation (Video)

Publication Details
  • ACM Multimedia
  • Oct 21, 2009

Abstract

The Pantheia system enables users to create virtual models by 'marking up' the real world with pre-printed markers. The markers have predefined meanings that guide the system as it creates models. Pantheia takes as input user captured images or video of the marked up space. This video illustrates the workings of the system and shows it being used to create three models, one of a cabinet, one of a lab, and one of a conference room. As part of the Pantheia system, we also developed a 3D viewer that spatially integrates a model with images of the model.

Interactive Models from Images of a Static Scene

Publication Details
  • Computer Graphics and Virtual Reality (CGVR '09)
  • Jul 13, 2009

Abstract

FXPAL's Pantheia system enables users to create virtual models by 'marking up' a physical space with pre-printed visual markers. The meanings associated with the markers come from a markup language that enables the system to create models from a relatively sparse set of markers. This paper describes extensions to our markup language and system that support the creation of interactive virtual objects. Users place markers to define components such as doors and drawers with which an end user of the model can interact. Other interactive elements, such as controls for color changes or lighting choices, are also supported. Pantheia produced a model of a room with hinged doors, a cabinet with drawers, doors, and color options, and a railroad track.
Publication Details
  • 2009 IEEE International Conference on Multimedia and Expo (ICME)
  • Jun 30, 2009

Abstract

This paper presents a tool and a novel Fast Invariant Transform (FIT) algorithm for language independent e-documents access. The tool enables a person to access an e-document through an informal camera capture of a document hardcopy. It can save people from remembering/exploring numerous directories and file names, or even going through many pages/paragraphs in one document. It can also facilitate people’s manipulation of a document or people’s interactions through documents. Additionally, the algorithm is useful for binding multimedia data to language independent paper documents. Our document recognition algorithm is inspired by the widely known SIFT descriptor [4] but can be computed much more efficiently for both descriptor construction and search. It also uses much less storage space than the SIFT approach. By testing our algorithm with randomly scaled and rotated document pages, we can achieve a 99.73% page recognition rate on the 2188-page ICME06 proceedings and 99.9% page recognition rate on a 504-page Japanese math book.
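
Since the FIT descriptor itself is not reproduced here, the sketch below uses OpenCV's ORB as a stand-in local descriptor simply to show the capture-and-recognize pipeline (extract descriptors per page, match a camera capture against the database, vote for the best page); the image paths and thresholds are hypothetical:

    import cv2

    # Stand-in sketch: ORB replaces the paper's FIT descriptor for illustration only.
    orb = cv2.ORB_create(nfeatures=500)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

    def page_descriptors(image_path):
        gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        _, desc = orb.detectAndCompute(gray, None)
        return desc

    def recognize_page(capture_path, page_db):
        # page_db maps page id -> precomputed descriptors; return the best-matching page.
        query = page_descriptors(capture_path)
        best_page, best_score = None, 0
        for page_id, desc in page_db.items():
            matches = matcher.match(query, desc)
            score = sum(1 for m in matches if m.distance < 40)   # crude inlier count
            if score > best_score:
                best_page, best_score = page_id, score
        return best_page

    # db = {n: page_descriptors(f"page_{n}.png") for n in range(1, 11)}
    # print(recognize_page("camera_capture.jpg", db))
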

Image-based Lighting Adjustment Method for Browsing Object Images

Publication Details
  • 2009 IEEE International Conference on Multimedia and Expo (ICME)
  • Jun 30, 2009

Abstract

In this paper, we describe an automatic lighting adjustment method for browsing object images. From a set of images of an object, taken under different lighting conditions, we generate two types of illuminated images: a textural image which eliminates unwanted specular reflections of the object, and a highlight image in which specularities of the object are highly preserved. Our user interface allows viewers to digitally zoom into any region of the image, and the lighting adjusted images are automatically generated for the selected region and displayed. Switching between the textural and the highlight images helps viewers to understand characteristics of the object surface.
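
A minimal sketch of one way such textural and highlight images could be formed from a registered image stack (per-pixel median versus per-pixel maximum); this is an assumed simplification, not necessarily the paper's exact method:

    import numpy as np

    def lighting_adjusted(stack):
        # stack: (K, H, W, 3) float array of K registered images of the same object
        # taken under different lighting conditions.
        textural = np.median(stack, axis=0)   # view-dependent specularities mostly removed
        highlight = np.max(stack, axis=0)     # specularities preserved
        return textural, highlight

    def zoom_region(image, x0, y0, x1, y1):
        # The adjusted images can be recomputed or cropped per zoomed-in region.
        return image[y0:y1, x0:x1]
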
Publication Details
  • Immerscom 2009
  • May 27, 2009

Abstract

We describe Pantheia, a system that constructs virtual models of real spaces from collections of images, through the use of visual markers that guide and constrain model construction. To create a model, users simply 'mark up' the real world scene by placing pre-printed markers that describe scene elements or impose semantic constraints. Users then collect still images or video of the scene. From this input, Pantheia automatically and quickly produces a model. The Pantheia system was used to produce models of two rooms that demonstrate the effectiveness of the approach.
2008

Rethinking the Podium

Publication Details
  • Chapter in "Interactive Artifacts and Furniture Supporting Collaborative Work and Learning", ed. P. Dillenbourg, J. Huang, and M. Cherubini. Published Nov. 28, 2008, Springer. Computer Supported Collaborative learning Series Vol 10.
  • Nov 28, 2008

Abstract

As the use of rich media in mobile devices and smart environments becomes more sophisticated, so must the design of the everyday objects used as controllers and interfaces. Many new interfaces simply tack electronic systems onto existing forms. However, an original physical design for a smart artefact, that integrates new systems as part of the form of the device, can enhance the end-use experience. The Convertible Podium is an experiment in the design of a smart artefact with complex integrated systems for the use of rich media in meeting rooms. It combines the highly designed look and feel of a modern lectern with systems that allow it to serve as a central control station for rich media manipulation. The interface emphasizes tangibility and ease of use in controlling multiple screens, multiple media sources (including mobile devices) and multiple distribution channels, and managing both data and personal representation in remote telepresence.

Virtual Physics Circus (video)

Publication Details
  • ACM Multimedia 2008
  • Oct 27, 2008

Abstract

This video shows the Virtual Physics Circus, a kind of playground for experimenting with simple physical models. The system makes it easy to create worlds with common physical objects such as swings, vehicles, ramps, and walls, and interactively play with those worlds. The system can be used as a creative art medium as well as to gain understanding and intuition about physical systems. The system can be controlled by a number of UI devices such as mouse, keyboard, joystick, and tags which are tracked in 6 degrees of freedom.
Publication Details
  • IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2008
  • Jun 24, 2008

Abstract

Current approaches to pose estimation and tracking can be classified into two categories: generative and discriminative. While generative approaches can accurately determine human pose from image observations, they are computationally intractable due to search in the high dimensional human pose space. On the other hand, discriminative approaches do not generalize well, but are computationally efficient. We present a hybrid model that combines the strengths of the two in an integrated learning and inference framework. We extend the Gaussian process latent variable model (GPLVM) to include an embedding from observation space (the space of image features) to the latent space. GPLVM is a generative model, but the inclusion of this mapping provides a discriminative component, making the model observation driven. Observation Driven GPLVM (OD-GPLVM) not only provides a faster inference approach, but also more accurate estimates (compared to GPLVM) in cases where dynamics are not sufficient for the initialization of search in the latent space. We also extend OD-GPLVM to learn and estimate poses from parameterized actions/gestures. Parameterized gestures are actions which exhibit large systematic variation in joint angle space for different instances due to differences in contextual variables. For example, the joint angles in a forehand tennis shot are a function of the height of the ball (Figure 2). We learn these systematic variations as a function of the contextual variables. We then present an approach to use information from the scene/object to provide context for human pose estimation for such parameterized actions.
2007
Publication Details
  • The 3rd International Conference on Collaborative Computing: Networking, Applications and Worksharing
  • Nov 12, 2007

Abstract

This paper summarizes our environment-image/video-supported collaboration technologies developed in the past several years. These technologies use environment images and videos as active interfaces and use visual cues in these images and videos to orient device controls, annotations and other information access. By using visual cues in various interfaces, we expect to make the control interface more intuitive than button-based control interfaces and command-based interfaces. These technologies can be used to facilitate high-quality audio/video capture with limited cameras and microphones. They can also facilitate multi-screen presentation authoring and playback, teleinteraction, environment manipulation with cell phones, and environment manipulation with digital pens.

DOTS: Support for Effective Video Surveillance

Publication Details
  • Fuji Xerox Technical Report No. 17, pp. 83-100
  • Nov 1, 2007

Abstract

DOTS (Dynamic Object Tracking System) is an indoor, real-time, multi-camera surveillance system, deployed in a real office setting. DOTS combines video analysis and user interface components to enable security personnel to effectively monitor views of interest and to perform tasks such as tracking a person. The video analysis component performs feature-level foreground segmentation with reliable results even under complex conditions. It incorporates an efficient greedy-search approach for tracking multiple people through occlusion and combines results from individual cameras into multi-camera trajectories. The user interface draws the users' attention to important events that are indexed for easy reference. Different views within the user interface provide spatial information for easier navigation. DOTS, with over twenty video cameras installed in hallways and other public spaces in our office building, has been in constant use for a year. Our experiences led to many changes that improved performance in all system components.
Publication Details
  • ICDSC 2007, pp. 132-139
  • Sep 25, 2007

Abstract

Our analysis and visualization tools use 3D building geometry to support surveillance tasks. These tools are part of DOTS, our multicamera surveillance system; a system with over 20 cameras spread throughout the public spaces of our building. The geometric input to DOTS is a floor plan and information such as cubicle wall heights. From this input we construct a 3D model and an enhanced 2D floor plan that are the bases for more specific visualization and analysis tools. Foreground objects of interest can be placed within these models and dynamically updated in real time across camera views. Alternatively, a virtual first-person view suggests what a tracked person can see as she moves about. Interactive visualization tools support complex camera-placement tasks. Extrinsic camera calibration is supported both by visualizations of parameter adjustment results and by methods for establishing correspondences between image features and the 3D model.

DOTS: Support for Effective Video Surveillance

Publication Details
  • ACM Multimedia 2007, pp. 423-432
  • Sep 24, 2007

Abstract

DOTS (Dynamic Object Tracking System) is an indoor, real-time, multi-camera surveillance system, deployed in a real office setting. DOTS combines video analysis and user interface components to enable security personnel to effectively monitor views of interest and to perform tasks such as tracking a person. The video analysis component performs feature-level foreground segmentation with reliable results even under complex conditions. It incorporates an efficient greedy-search approach for tracking multiple people through occlusion and combines results from individual cameras into multi-camera trajectories. The user interface draws the users' attention to important events that are indexed for easy reference. Different views within the user interface provide spatial information for easier navigation. DOTS, with over twenty video cameras installed in hallways and other public spaces in our office building, has been in constant use for a year. Our experiences led to many changes that improved performance in all system components.
Publication Details
  • ICME 2007, pp. 1015-1018
  • Jul 2, 2007

Abstract

We describe a new interaction technique that allows users to control nonlinear video playback by directly manipulating objects seen in the video. This interaction technique is similar to video "scrubbing" where the user adjusts the playback time by moving the mouse along a slider. Our approach is superior to variable-scale scrubbing in that the user can concentrate on interesting objects and does not have to guess how long the objects will stay in view. Our method relies on a video tracking system that tracks objects in fixed cameras, maps them into 3D space, and handles hand-offs between cameras. In addition to dragging objects visible in video windows, users may also drag iconic object representations on a floor plan. In that case, the best video views are selected for the dragged objects.
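
The core mapping can be sketched as follows: given a tracked object's trajectory, dragging the object to a point seeks playback to the time at which the object was nearest that point. This is a simplified, assumed version of the interaction:

    import numpy as np

    def seek_time(trajectory, drag_xy):
        # trajectory: (N, 3) samples of (time, x, y) for a tracked object, in
        # floor-plan or image coordinates; drag_xy: where the user dropped it.
        traj = np.asarray(trajectory, float)
        d2 = np.sum((traj[:, 1:] - np.asarray(drag_xy, float)) ** 2, axis=1)
        return traj[np.argmin(d2), 0]    # playback time to jump to

    track = [(0.0, 10, 10), (1.0, 20, 12), (2.0, 35, 15), (3.0, 50, 18)]
    print(seek_time(track, (33, 14)))    # -> 2.0
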
Publication Details
  • ICME 2007, pp. 675-678
  • Jul 2, 2007

Abstract

In this paper we describe the analysis component of an indoor, real-time, multi-camera surveillance system. The analysis includes: (1) a novel feature-level foreground segmentation method which achieves efficient and reliable segmentation results even under complex conditions, (2) an efficient greedy search based approach for tracking multiple people through occlusion, and (3) a method for multi-camera handoff that associates individual trajectories in adjacent cameras. The analysis is used for an 18 camera surveillance system that has been running continuously in an indoor business over the past several months. Our experiments demonstrate that the processing method for people detection and tracking across multiple cameras is fast and robust.
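
The greedy association step can be illustrated with a small sketch that repeatedly links the closest remaining track/detection pair until a distance gate is exceeded; the gate value is an assumption and the real tracker is considerably more elaborate:

    import numpy as np

    def greedy_associate(track_pos, det_pos, gate=50.0):
        # track_pos: (T, 2) predicted track positions; det_pos: (D, 2) detections.
        # Returns a list of (track_index, detection_index) pairs.
        tracks, dets = np.asarray(track_pos, float), np.asarray(det_pos, float)
        if len(tracks) == 0 or len(dets) == 0:
            return []
        cost = np.linalg.norm(tracks[:, None, :] - dets[None, :, :], axis=2)
        pairs = []
        for _ in range(min(len(tracks), len(dets))):
            i, j = np.unravel_index(np.argmin(cost), cost.shape)
            if cost[i, j] > gate:
                break                      # remaining pairs are too far apart
            pairs.append((int(i), int(j)))
            cost[i, :] = np.inf            # each track and detection used once
            cost[:, j] = np.inf
        return pairs

    print(greedy_associate([(0, 0), (100, 100)], [(95, 98), (5, 2)]))
    # [(1, 0), (0, 1)]
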

Featured Wand for 3D Interaction

Publication Details
  • ICME 2007
  • Jul 2, 2007

Abstract

Our featured wand, automatically tracked by video cameras, provides an inexpensive and natural way for users to interact with devices such as large displays. The wand supports six degrees of freedom for manipulation of 3D applications like Google Earth. Our system uses a 'line scan' approach to estimate the wand pose for tracking, which simplifies processing. Several applications are demonstrated.
2006
Publication Details
  • Proceedings of IEEE Multimedia Signal Processing 2006
  • Oct 3, 2006

Abstract

This paper presents a method for facilitating document redirection in a physical environment via a mobile camera. With this method, a user is able to move documents among electronic devices, post a paper document to a selected public display, or make a printout of a white board with simple point-and-capture operations. More specifically, the user can move a document from its source to a destination by capturing a source image and a destination image in consecutive order. The system uses SIFT (Scale Invariant Feature Transform) features of captured images to identify the devices a user is pointing to, and issues corresponding commands associated with the identified devices. Unlike RF/IR based remote controls, this method uses object visual features as an always-on 'transmitter' for many tasks, and therefore is easy to deploy. We present experiments on identifying three public displays and a document scanner in a conference room for evaluation.
Publication Details
  • International Conference on Pattern Recognition
  • Aug 20, 2006

Abstract

This paper describes a framework for detecting unusual events in surveillance videos. Most surveillance systems consist of multiple video streams, but traditional event detection systems treat individual video streams independently or combine them at the feature extraction level through geometric reconstruction. Our framework combines multiple video streams at the inference level, with a coupled hidden Markov Model (CHMM). We use two-stage training to bootstrap a set of usual events, and train a CHMM over the set. By thresholding the likelihood of a test segment being generated by the model, we build an unusual event detector. We evaluate the performance of our detector through qualitative and quantitative experiments on two sets of real world videos.
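
The decision rule can be illustrated with a single discrete HMM in place of the coupled HMM: a test segment is flagged as unusual when its log-likelihood under the model of usual events falls below a threshold. The toy model parameters and threshold below are invented for illustration:

    import numpy as np

    def forward_loglik(obs, start, trans, emit):
        # Scaled forward algorithm for a discrete HMM.
        # obs: sequence of symbol indices; start: (S,); trans: (S, S); emit: (S, V).
        alpha = start * emit[:, obs[0]]
        loglik = np.log(alpha.sum())
        alpha = alpha / alpha.sum()
        for o in obs[1:]:
            alpha = (alpha @ trans) * emit[:, o]
            loglik += np.log(alpha.sum())
            alpha = alpha / alpha.sum()
        return loglik

    def is_unusual(obs, model, threshold):
        # Flag segments that the "usual event" model explains poorly.
        return forward_loglik(obs, *model) < threshold

    # toy model with 2 hidden states and 3 observation symbols (numbers invented)
    model = (np.array([0.6, 0.4]),
             np.array([[0.7, 0.3], [0.4, 0.6]]),
             np.array([[0.8, 0.1, 0.1], [0.1, 0.2, 0.7]]))
    print(is_unusual([0, 0, 2, 1], model, threshold=-8.0))
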
2005
Publication Details
  • Proceedings of SPIE International Symposium ITCom 2005 on Multimedia Systems and Applications VIII, Boston, Massachusetts, USA, October 2005.
  • Dec 7, 2005

Abstract

Meeting environments, such as conference rooms, executive briefing centers, and exhibition spaces, are now commonly equipped with multiple displays, and will become increasingly display-rich in the future. Existing authoring / presentation tools such as PowerPoint, however, provide little support for effective utilization of multiple displays. Even using advanced multi-display enabled multimedia presentation tools, the task of assigning material to displays is tedious and distracts presenters from focusing on content. This paper describes a framework for automatically assigning presentation material to displays, based on a model of the quality of views of audience members. The framework is based on a model of visual fidelity which takes into account presentation content, audience members' locations, the limited resolution of human eyes, and display location, orientation, size, resolution, and frame rate. The model can be used to determine presentation material placement based on average or worst case audience member view quality, and to warn about material that would be illegible. By integrating this framework with a previous system for multi-display presentation [PreAuthor, others], we created a tool that accepts PowerPoint and/or other media input files, and automatically generates a layout of material onto displays for each state of the presentation. The tool also provides an interface allowing the presenter to modify the automatically generated layout before or during the actual presentation. This paper discusses the framework, possible application scenarios, examples of the system behavior, and our experience with system use.
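
A back-of-the-envelope version of the legibility check is sketched below: text is flagged illegible for a viewer when the visual angle subtended by a character falls under a minimum angle. The acuity constant and threshold are assumptions, not the paper's calibrated model:

    import math

    ARCMIN = math.radians(1.0 / 60.0)   # roughly the limit of human visual acuity

    def char_visual_angle(char_height_m, viewer_distance_m):
        # Visual angle subtended by a character of the given height at this distance.
        return 2.0 * math.atan(char_height_m / (2.0 * viewer_distance_m))

    def legible(char_height_m, viewer_distance_m, min_angle=5 * ARCMIN):
        return char_visual_angle(char_height_m, viewer_distance_m) >= min_angle

    # 2 cm characters viewed from 6 m and from 20 m
    print(legible(0.02, 6.0), legible(0.02, 20.0))   # True False
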
Publication Details
  • IEEE Trans. Multimedia, Vol. 7 No. 5, pp. 981-990
  • Oct 11, 2005

Abstract

We present a system for automatically extracting the region of interest and controlling virtual cameras based on panoramic video. It targets applications such as classroom lectures and video conferencing. For capturing panoramic video, we use the FlyCam system that produces high resolution, wide-angle video by stitching video images from multiple stationary cameras. To generate conventional video, a region of interest (ROI) can be cropped from the panoramic video. We propose methods for ROI detection, tracking, and virtual camera control that work in both the uncompressed and compressed domains. The ROI is located from motion and color information in the uncompressed domain and macroblock information in the compressed domain, and tracked using a Kalman filter. This results in virtual camera control that simulates human controlled video recording. The system has no physical camera motion and the virtual camera parameters are readily available for video indexing.
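
A generic constant-velocity Kalman filter over the ROI centre, as sketched below, conveys the tracking step; the noise parameters are illustrative and not taken from the paper:

    import numpy as np

    class RoiTracker:
        def __init__(self, dt=1 / 30, q=1.0, r=25.0):
            self.F = np.array([[1, 0, dt, 0],
                               [0, 1, 0, dt],
                               [0, 0, 1, 0],
                               [0, 0, 0, 1]], float)   # state: x, y, vx, vy
            self.H = np.array([[1, 0, 0, 0],
                               [0, 1, 0, 0]], float)   # we observe x, y only
            self.Q = q * np.eye(4)                     # process noise (assumed)
            self.R = r * np.eye(2)                     # measurement noise (assumed)
            self.x = np.zeros(4)
            self.P = np.eye(4) * 1e3

        def update(self, z):
            # predict with the constant-velocity model
            self.x = self.F @ self.x
            self.P = self.F @ self.P @ self.F.T + self.Q
            # correct with the measured ROI centre z = (x, y)
            S = self.H @ self.P @ self.H.T + self.R
            K = self.P @ self.H.T @ np.linalg.inv(S)
            self.x = self.x + K @ (np.asarray(z, float) - self.H @ self.x)
            self.P = (np.eye(4) - K @ self.H) @ self.P
            return self.x[:2]   # smoothed virtual-camera target

    tracker = RoiTracker()
    for meas in [(100, 80), (104, 82), (109, 85)]:
        print(tracker.update(meas))
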
Publication Details
  • Paper presented at SIGGRAPH 2005, Los Angeles.
  • Sep 29, 2005

Abstract

The Convertible Podium is a central control station for rich media in next-generation classrooms. It integrates flexible control systems for multimedia software and hardware, and is designed for use in classrooms with multiple screens, multiple media sources and multiple distribution channels. The built-in custom electronics and unique convertible podium frame allow intuitive conversion between use modes (either manual or automatic). The at-a-touch sound and light control system gives control over the classroom environment. Presentations can be pre-authored for effective performance, and quickly altered on the fly. The counter-weighted and motorized conversion system allows one person to change modes simply by lifting the top of the Podium to the correct position for each mode. The Podium is lightweight, mobile, and wireless, and features an onboard 21" LCD display, document cameras and other capture devices, tangible controls for hardware and software, and embedded RFID sensing for automatic data retrieval and file management. It is designed to ease the tasks involved in authoring and presenting in a rich media classroom, as well as supporting remote telepresence and integration with other mobile devices.
Publication Details
  • Short presentation in UbiComp 2005 workshop in Tokyo, Japan.
  • Sep 11, 2005

Abstract

As the use of rich media in mobile devices and smart environments becomes more sophisticated, so must the design of the everyday objects used as containers or controllers. Rather than simply tacking electronics onto existing furniture or other objects, the design of a smart object can enhance existing applications in unexpected ways. The Convertible Podium is an experiment in the design of a smart object with complex integrated systems, combining the highly designed look and feel of a modern lectern with systems that allow it to serve as a central control station for rich media manipulation in next-generation conference rooms. It enables easy control of multiple independent screens, multiple media sources (including mobile devices) and multiple distribution channels. The Podium is designed to ease the tasks involved in authoring and presenting in a rich media meeting room, as well as supporting remote telepresence and integration with mobile devices.
Publication Details
  • ICME 2005
  • Jul 20, 2005

Abstract

A common problem with teleconferences is awkward turn-taking - particularly 'collisions,' whereby multiple parties inadvertently speak over each other due to communication delays. We propose a model for teleconference discussions including the effects of delays, and describe tools that can improve the quality of those interactions. We describe an interface to gently provide latency awareness, and to give advanced notice of 'incoming speech' to help participants avoid collisions. This is possible when codec latencies are significant, or when a low bandwidth side channel or out-of-band signaling is available with lower latency than the primary video channel. We report on results of simulations, and of experiments carried out with transpacific meetings, that demonstrate these tools can improve the quality of teleconference discussions.

An Online Video Composition System

Publication Details
  • IEEE International Conference on Multimedia & Expo July 6-8, 2005, Amsterdam, The Netherlands
  • Jul 6, 2005

Abstract

This paper presents an information-driven online video composition system. The composition work handled by the system includes dynamically setting multiple pan/tilt/zoom (PTZ) cameras to proper poses and selecting the best close-up view for passive viewers. The main idea of the composition system is to maximize captured video information with limited cameras. Unlike video composition based on heuristic rules, our video composition is formulated as a process of minimizing distortions between ideal signals (i.e. signals with infinite spatial-temporal resolution) and displayed signals. The formulation is consistent with many well-known empirical approaches widely used in previous systems and may provide analytical explanations to those approaches. Moreover, it provides a novel approach for studying video composition tasks systematically. The composition system allows each user to select a personal close-up view. It manages PTZ cameras and a video switcher based on both signal characteristics and users' view selections. Additionally, it can automate the video composition process based on past users' view-selections when immediate selections are not available. We demonstrate the performance of this system with real meetings.
2004
Publication Details
  • Springer Lecture Notes in Computer Science - Advances in Multimedia Information Processing, Proc. PCM 2004 5th Pacific Rim Conference on Multimedia, Tokyo, Japan
  • Dec 1, 2004

Abstract

For some years, our group at FX Palo Alto Laboratory has been developing technologies to support meeting recording, collaboration, and videoconferencing. This paper presents several systems that use video as an active interface, allowing remote devices and information to be accessed "through the screen." For example, SPEC enables collaborative and automatic camera control through an active video window. The NoteLook system allows a user to grab an image from a computer display, annotate it with digital ink, then drag it to that or a different display. The ePIC system facilitates natural control of multi-display and multi-device presentation spaces, while the iLight system allows remote users to "draw" with light on a local object. All our systems serve as platforms for researching more sophisticated algorithms to support additional functionality and ease of use.

Remote Interactive Graffiti

Publication Details
  • Proc. ACM Multimedia 2004
  • Oct 12, 2004

Abstract

We present an installation that allows distributed internet participants to "draw" on a public scene using light. The iLight system is a camera/projector system designed for remote collaboration. Using a familiar digital drawing interface, remote users "draw" on a live video image of a real-life object or scene. Graphics drawn by the user are then projected onto the scene, where they are visible in the camera image. Because camera distortions are corrected and the video is aligned with the image canvas, drawn graphics appear exactly where desired. Thus the remote users may harmlessly mark a physical object to serve their own artistic and/or expressive needs. We also describe how local participants may interact with remote users through the projected images. Besides the intrinsic "neat factor" of action at a distance, this installation serves as an experiment in how multiple users from different locales and cultures can create a social space that interacts with a physical one, as well as raising issues of free expression in a non-destructive context.
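
The alignment step can be sketched with a planar homography fitted from a few canvas-to-projector correspondences, so that strokes drawn on the video canvas map into projector coordinates; the correspondence values below are made-up calibration numbers and the approach is an assumption about the implementation:

    import numpy as np

    def fit_homography(src, dst):
        # src, dst: (N >= 4, 2) corresponding points; returns 3x3 H with H @ src ~ dst (DLT).
        A = []
        for (x, y), (u, v) in zip(src, dst):
            A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
            A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
        _, _, Vt = np.linalg.svd(np.asarray(A, float))
        return Vt[-1].reshape(3, 3)

    def map_point(H, xy):
        p = H @ np.array([xy[0], xy[1], 1.0])
        return p[:2] / p[2]

    # calibration: canvas corners -> measured projector coordinates (example numbers)
    H = fit_homography([(0, 0), (640, 0), (640, 480), (0, 480)],
                       [(12, 8), (630, 15), (622, 470), (5, 462)])
    print(map_point(H, (320, 240)))   # where a stroke at the canvas centre lands
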
Publication Details
  • Proceedings of 2004 IEEE International Conference on Multimedia and Expo (ICME 2004)
  • Jun 27, 2004

Abstract

Using a machine to assist remote environment management can save people's time, effort, and traveling cost. This paper proposes a trainable mobile robot system, which allows people to watch a remote site through a set of cameras installed on the robot, drive the platform around, and control remote devices using mouse or pen based gestures performed in video windows. Furthermore, the robot can learn device operations when it is being used by humans. After being used for a while, the robot can automatically select device control interfaces, or launch a pre-defined operation sequence based on its sensory inputs.
Publication Details
  • Proceedings of 2004 IEEE International Conference on Multimedia and Expo (ICME 2004)
  • Jun 27, 2004

Abstract

Many conference rooms are now equipped with multiple multimedia devices, such as plasma displays and surround speakers, to enhance presentation quality. However, most existing presentation authoring tools are based on the one-display-and-one-speaker assumption, which makes it difficult to organize and play back a presentation dispatched to multiple devices, and thus hinders users from taking full advantage of additional multimedia devices. In this paper, we propose and implement a tool to facilitate authoring and playback of a multi-channel presentation in an environment with distributed media devices. The tool, named PreAuthor, provides an intuitive and visual way to author a multi-channel presentation by dragging and dropping "hyper-slides" onto corresponding visual representations of various devices. PreAuthor supports "hyper-slide" synchronization among various output devices during preview and playback. It also offers multiple options for the presenter to view the presentation as a rendered image sequence, live video, a 3D VRML model, or in the real environment.
Publication Details
  • JOINT AMI/PASCAL/IM2/M4 Workshop on Multimodal Interaction and Related Machine Learning Algorithms
  • Jun 22, 2004

Abstract

For some years, our group at FX Palo Alto Laboratory has been developing technologies to support meeting recording, collaboration, and videoconferencing. This paper presents a few of our more interesting research directions. Many of our systems use a video image as an interface, allowing devices and information to be accessed "through the screen." For example, SPEC enables hybrid collaborative and automatic camera control through an active video window. The NoteLook system allows a user to grab an image from a computer display, annotate it with digital ink, then drag it to that or a different display, while automatically generating timestamps for later video review. The ePIC system allows natural use and control of multi-display and multi-device presentation spaces, and the iLight system allows remote users to "draw" with light on a local object. All our systems serve as platforms for researching more sophisticated algorithms that will hopefully support additional advanced functions and ease of use.
2003
Publication Details
  • Proc. ACM Multimedia 2003, pp. 546-554
  • Nov 1, 2003

Abstract

We present a system that allows remote and local participants to control devices in a meeting environment using mouse or pen based gestures "through" video windows. Unlike state-of-the-art device control interfaces that require interaction with text commands, buttons, or other artificial symbols, our approach allows users to interact with devices through live video of the environment. This naturally extends our video supported pan/tilt/zoom (PTZ) camera control system, by allowing gestures in video windows to control not only PTZ cameras, but also other devices visible in video images. For example, an authorized meeting participant can show a presentation on a screen by dragging the file on a personal laptop and dropping it on the video image of the presentation screen. This paper presents the system architecture, implementation tradeoffs, and various meeting control scenarios.
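
One simple way to route such drag-and-drop events, shown below as an illustrative sketch (the device regions are invented), is to register each controllable device with a polygonal region in the video image and hit-test the drop point against those regions:

    # Illustrative only: each controllable device is registered with a polygonal
    # region in the video image; a drop is routed to the device whose region
    # contains the drop point.
    DEVICE_REGIONS = {
        "presentation_screen": [(120, 40), (480, 40), (480, 260), (120, 260)],
        "printer":             [(520, 300), (600, 300), (600, 380), (520, 380)],
    }

    def point_in_polygon(pt, poly):
        # Standard ray-casting test.
        x, y = pt
        inside = False
        for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
            if (y1 > y) != (y2 > y):
                if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                    inside = not inside
        return inside

    def device_at(drop_xy):
        for name, region in DEVICE_REGIONS.items():
            if point_in_polygon(drop_xy, region):
                return name
        return None

    print(device_at((300, 150)))   # -> 'presentation_screen'
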
Publication Details
  • Proc. IEEE Intl. Conf. on Image Processing
  • Sep 14, 2003

Abstract

This paper presents a video acquisition system that can learn automatic video capture from a human operator's camera operations. Unlike a predefined camera control system, this system can easily adapt to changes in its environment with users' help. By collecting users' camera-control operations under various environments, the control system can learn video capture from humans, and use these learned skills to operate its cameras when remote viewers don't, won't, or can't operate the system. Moreover, this system allows remote viewers to control their own virtual cameras instead of watching the same video produced by a human operator or a fully automatic system. The online learning algorithm and the camera management algorithm are demonstrated using field data.
Publication Details
  • Proceedings of INTERACT '03, pp. 583-590.
  • Sep 1, 2003

Abstract

In a meeting room environment with multiple public wall displays and personal notebook computers, it is possible to design a highly interactive experience for manipulating and annotating slides. For the public displays, we present the ModSlideShow system with a discrete modular model for linking the displays into groups, along with a gestural interface for manipulating the flow of slides within a display group. For the applications on personal devices, an augmented reality widget with panoramic video supports interaction among the various displays. This widget is integrated into our NoteLook 3.0 application for annotating, capturing and beaming slides on pen-based notebook computers.
Publication Details
  • 2003 International Conference on Multimedia and Expo
  • Jul 6, 2003

Abstract

This paper presents an information-driven audiovisual signal acquisition approach. This approach has several advantages: users are encouraged to assist in signal acquisition; available sensors are managed based on both signal characteristics and users' suggestions. The problem formulation is consistent with many well-known empirical approaches widely used in previous systems and may provide analytical explanations to these approaches. We demonstrate the use of this approach to pan/tilt/zoom (PTZ) camera management with field data.