Publications

By Lynn Wilcox

2016

Abstract

We previously created the HyperMeeting system to support a chain of geographically and temporally distributed meetings in the form of a hypervideo. This paper focuses on playback plans that guide users through the recorded meeting content by automatically following available hyperlinks. Our system generates playback plans based on users' interests or prior meeting attendance and presents a dialog that lets users select the most appropriate plan. Prior experience with playback plans revealed users' confusion with automatic link following within a sequence of meetings. To address this issue, we designed three timeline visualizations of playback plans. A user study comparing the timeline designs indicated that different visualizations are preferred for different tasks, making switching among them important. The study also provided insights that will guide research on personalized hypervideo, both inside and outside a meeting context.
2015

Abstract

New technology comes about in a number of different ways. It may come from advances in scientific research, through new combinations of existing technology, or simply by imagining what might be possible in the future. This video describes the evolution of Tabletop Telepresence, a system for remote collaboration through desktop videoconferencing combined with a digital desk. Tabletop Telepresence provides a means to share paper documents between remote desktops, interact with documents and request services (such as translation), and communicate with a remote person through a teleconference. It was made possible by combining advances in camera/projector technology that enable a fully functional digital desk, embodied telepresence in video conferencing, and concept art that imagines future workstyles.
Publication Details
  • ACM Multimedia
  • Oct 18, 2015

Abstract

While synchronous meetings are an important part of collaboration, it is not always possible for all stakeholders to meet at the same time. We created the concept of hypermeetings to support meetings with asynchronous attendance. Hypermeetings consist of a chain of video-recorded meetings with hyperlinks for navigating through the video content. HyperMeeting supports the synchronized viewing of prior meetings during a videoconference. Natural viewing behavior such as pausing generates hyperlinks between the previously recorded meetings and the current video recording. During playback, automatic link-following guided by playback plans presents the relevant content to users. Playback plans take into account the user's meeting attendance and viewing history and match them with features such as speaker segmentation. A user study showed that participants found hyperlinks useful but did not always understand where they would take them. The study results provide a good basis for future system improvements.
2014
Publication Details
  • International Journal of Multimedia Information Retrieval Special Issue on Cross-Media Analysis
  • Sep 4, 2014

Abstract

Media Embedded Target, or MET, is an iconic mark printed in a blank margin of a page that indicates a media link is associated with a nearby region of the page. It guides the user to capture the region and thus retrieve the associated link through visual search within indexed content. The target also serves to separate page regions with media links from other regions of the page. The capture application on the cell phone displays a sight having the same shape as the target near the edge of a camera-view display. The user moves the phone to align the sight with the target printed on the page. Once the system detects correct sight-target alignment, the region in the camera view is captured and sent to the recognition engine which identifies the image and causes the associated media to be displayed on the phone. Since target and sight alignment defines a capture region, this approach saves storage by only indexing visual features in the predefined capture region, rather than indexing the entire page. Target-sight alignment assures that the indexed region is fully captured. We compare the use of MET for guiding capture with two standard methods: one that uses a logo to indicate that media content is available and text to define the capture region and another that explicitly indicates the capture region using a visible boundary mark.
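The sight-target alignment trigger described above can be sketched as a simple geometric test; this is an illustrative assumption, not the paper's implementation, and the pixel tolerance is invented for the example.

```python
# Hypothetical sketch of the MET capture trigger: the capture fires when
# the detected target center falls within a small tolerance of the
# on-screen sight. The tolerance value is an assumption.

def is_aligned(sight_center, target_center, tolerance=12.0):
    """sight_center, target_center: (x, y) pixel coordinates."""
    dx = target_center[0] - sight_center[0]
    dy = target_center[1] - sight_center[1]
    return (dx * dx + dy * dy) ** 0.5 <= tolerance

def should_capture(sight_center, detected_targets, tolerance=12.0):
    """Capture the camera-view region once any detected target aligns."""
    return any(is_aligned(sight_center, t, tolerance) for t in detected_targets)
```

Once `should_capture` returns true, the region framed by the sight would be sent to the recognition engine.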
Publication Details
  • ACM SIGIR International Workshop on Social Media Retrieval and Analysis
  • Jul 11, 2014

Abstract

We examine the use of clustering to identify selfies in a social media user's photos for use in estimating demographic information such as age, gender, and race. Faces are first detected within a user's photos followed by clustering using visual similarity. We define a cluster scoring scheme that uses a combination of within-cluster visual similarity and average face size in a cluster to rank potential selfie-clusters. Finally, we evaluate this ranking approach over a collection of Twitter users and discuss methods that can be used for improving performance in the future.
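The cluster-scoring scheme described above can be illustrated as follows. This is a minimal sketch under stated assumptions: the combination weight `alpha` and the exact similarity and size features are inventions for the example, not the paper's parameters.

```python
# Hypothetical sketch of selfie-cluster ranking: score each face cluster
# by combining within-cluster visual similarity with average face size,
# then rank clusters by score. Weights and features are assumptions.

def score_cluster(similarities, face_sizes, alpha=0.5):
    """similarities: pairwise visual similarities within the cluster;
    face_sizes: face bounding-box areas normalized to [0, 1]."""
    if not similarities or not face_sizes:
        return 0.0
    cohesion = sum(similarities) / len(similarities)
    avg_size = sum(face_sizes) / len(face_sizes)
    return alpha * cohesion + (1 - alpha) * avg_size

def rank_selfie_clusters(clusters):
    """clusters: list of (similarities, face_sizes) tuples.
    Returns cluster indices, best selfie candidate first."""
    return sorted(range(len(clusters)),
                  key=lambda i: score_cluster(*clusters[i]),
                  reverse=True)
```

The intuition is that selfies form tight clusters (the same face, photographed similarly) with large faces relative to the frame.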
2011
Publication Details
  • ACM Multimedia 2011
  • Nov 28, 2011

Abstract

Embedded Media Markers (EMMs) are nearly transparent icons printed on paper documents that link to associated digital media. By using the document content for retrieval, EMMs are less visually intrusive than barcodes and other glyphs while still providing an indication for the presence of links. An initial implementation demonstrated good overall performance but exposed difficulties in guaranteeing the creation of unambiguous EMMs. We developed an EMM authoring tool that supports the interactive authoring of EMMs via visualizations that show the user which areas on a page may cause recognition errors and automatic feedback that moves the authored EMM away from those areas. The authoring tool and the techniques it relies on have been applied to corpora with different visual characteristics to explore the generality of our approach.
Publication Details
  • ACM International Conference on Multimedia Retrieval (ICMR)
  • Apr 17, 2011

Abstract

User-generated video from mobile phones, digital cameras, and other devices is increasing, yet people rarely want to watch all the captured video. More commonly, users want a single still image for printing or a short clip from the video for creating a panorama or for sharing. Our interface aims to help users search through video for these images or clips in a more efficient fashion than fast-forwarding or "scrubbing" through a video by dragging through locations on a slider. It is based on a hierarchical structure of keyframes in the video, and combines a novel user interface design for browsing a video segment tree with new algorithms for keyframe selection, segment identification, and clustering. These algorithms take into account the need for quality keyframes and balance the desire for short navigation paths and similarity-based clusters. Our user interface presents keyframe hierarchies and displays visual cues for keeping the user oriented while browsing the video. The system adapts to the task by using a non-temporal clustering algorithm when the user wants a single image. When the user wants a video clip, the system selects one of two temporal clustering algorithms based on a measure of the repetitiveness of the video. User feedback provided us with valuable suggestions for improvements to our system.
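The task-adaptive clustering choice described above can be sketched roughly as follows. This is not the paper's implementation: the repetitiveness measure (fraction of non-adjacent keyframe pairs that look alike) and the thresholds are assumptions made for illustration.

```python
# Illustrative sketch of the adaptive strategy: non-temporal clustering
# for single-image tasks; for clip tasks, pick one of two temporal
# algorithms based on a repetitiveness measure. Measure and thresholds
# are assumptions.

def similarity(a, b):
    """Cosine similarity between two keyframe feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def repetitiveness(keyframe_features, sim_threshold=0.8):
    """Fraction of non-adjacent keyframe pairs that look alike."""
    n, alike, total = len(keyframe_features), 0, 0
    for i in range(n):
        for j in range(i + 2, n):  # skip temporally adjacent frames
            total += 1
            if similarity(keyframe_features[i], keyframe_features[j]) > sim_threshold:
                alike += 1
    return alike / total if total else 0.0

def choose_clustering(task, keyframe_features):
    if task == "single_image":
        return "non_temporal"
    return ("temporal_repetitive"
            if repetitiveness(keyframe_features) > 0.3
            else "temporal_sequential")
```

A repetitive video (e.g. alternating camera angles) benefits from clustering that merges recurring shots, while a steadily progressing video is better split sequentially.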
Publication Details
  • IS&T and SPIE International Conference on Multimedia Content Access: Algorithms and Systems
  • Jan 23, 2011

Abstract

This paper describes research activities at FX Palo Alto Laboratory (FXPAL) in the area of multimedia browsing, search, and retrieval. We first consider interfaces for organization and management of personal photo collections. We then survey our work on interactive video search and retrieval. Throughout we discuss the evolution of both the research challenges in these areas and our proposed solutions.
Publication Details
  • Fuji Xerox Technical Report
  • Jan 1, 2011

Abstract

Embedded Media Markers, or simply EMMs, are nearly transparent iconic marks printed on paper documents that signify the existence of media associated with that part of the document. EMMs also guide users' camera operations for media retrieval. Users take a picture of an EMM-signified document patch using a cell phone, and the media associated with the EMM-signified document location is displayed on the phone. Unlike bar codes, EMMs are nearly transparent and thus do not interfere with the document appearance. Retrieval of media associated with an EMM is based on image local features of the captured EMM-signified document patch. This paper describes a technique for semi-automatically placing an EMM at a location in a document, in such a way that it encompasses sufficient identification features with minimal disturbance to the original document.
2010
Publication Details
  • ACM International Conference on Multimodal Interfaces
  • Nov 8, 2010

Abstract

Embedded Media Barcode Links, or simply EMBLs, are optimally blended iconic barcode marks, printed on paper documents, that signify the existence of multimedia associated with that part of the document content (Figure 1). EMBLs are used for multimedia retrieval with a camera phone. Users take a picture of an EMBL-signified document patch using a cell phone, and the multimedia associated with the EMBL-signified document location is displayed on the phone. Unlike a traditional barcode, which requires an exclusive space, the EMBL construction algorithm acts as an agent to negotiate with a barcode reader for maximum user and document benefits. Because of this negotiation, EMBLs are optimally blended with content and thus have less interference with the original document layout and can be moved closer to a media associated location. Retrieval of media associated with an EMBL is based on the barcode identification of a captured EMBL. Therefore, EMBL retains nearly all barcode identification advantages, such as accuracy, speed, and scalability. Moreover, EMBL takes advantage of users' knowledge of a traditional barcode. Unlike an Embedded Media Marker (EMM), which requires underlying document features for marker identification, EMBL has no requirement for the underlying features. This paper will discuss the procedures for EMBL construction and optimization. It will also give experimental results that strongly support the EMBL construction and optimization ideas.
Publication Details
  • ACM Multimedia 2010
  • Oct 25, 2010

Abstract

An Embedded Media Marker (EMM) is a transparent mark printed on a paper document that signifies the availability of additional media associated with that part of the document. Users take a picture of the EMM using a camera phone, and the media associated with that part of the document is displayed on the phone. Unlike bar codes, EMMs are nearly transparent and thus do not interfere with the document appearance. Retrieval of media associated with an EMM is based on image features of the document within the EMM boundary. Unlike other feature-based retrieval methods, the EMM clearly indicates to the user the existence and type of media associated with the document location. A semi-automatic authoring tool is used to place an EMM at a location in a document, in such a way that it encompasses sufficient identification features with minimal disturbance to the original document. We will demonstrate how to create an EMM-enhanced document, and how the EMM enables access to the associated media on a cell phone.
Publication Details
  • JCDL 2010
  • Jun 21, 2010

Abstract

Photo libraries are growing in quantity and size, requiring better support for locating desired photographs. MediaGLOW is an interactive visual workspace designed to address this concern. It uses attributes such as visual appearance, GPS locations, user-assigned tags, and dates to filter and group photos. An automatic layout algorithm positions photos with similar attributes near each other to support users in serendipitously finding multiple relevant photos. In addition, the system can explicitly select photos similar to specified photos. We conducted a user evaluation to determine the benefit provided by similarity layout and the relative advantages offered by the different layout similarity criteria and attribute filters. Study participants had to locate photos matching probe statements. In some tasks, participants were restricted to a single layout similarity criterion and filter option. Participants used multiple attributes to filter photos. Layout by similarity without additional filters turned out to be one of the most used strategies and was especially beneficial for geographical similarity. Lastly, the relative appropriateness of the single similarity criterion to the probe significantly affected retrieval performance.
Publication Details
  • In Proc. of CHI 2010
  • Apr 10, 2010

Abstract

PACER is a gesture-based interactive paper system that supports fine-grained paper document content manipulation through the touch screen of a cameraphone. Using the phone's camera, PACER links a paper document to its digital version based on visual features. It adopts camera-based phone motion detection for embodied gestures (e.g. marquees, underlines and lassos), with which users can flexibly select and interact with document details (e.g. individual words, symbols and pixels). The touch input is incorporated to facilitate target selection at fine granularity, and to address some limitations of the embodied interaction, such as hand jitter and low input sampling rate. This hybrid interaction is coupled with other techniques such as semi-real time document tracking and loose physical-digital document registration, offering a gesture-based command system. We demonstrate the use of PACER in various scenarios including work-related reading, maps and music score playing. A preliminary user study on the design has produced encouraging user feedback, and suggested future research for better understanding of embodied vs. touch interaction and one vs. two handed interaction.

Abstract

Browsing and searching for documents in large, online enterprise document repositories are common activities. While internet search produces satisfying results for most user queries, enterprise search has not been as successful because of differences in document types and user requirements. To support users in finding the information they need in their online enterprise repository, we created DocuBrowse, a faceted document browsing and search system. Search results are presented within the user-created document hierarchy, showing only directories and documents matching selected facets and containing text query terms. In addition to file properties such as date and file size, automatically detected document types, or genres, serve as one of the search facets. Highlighting draws the user’s attention to the most promising directories and documents while thumbnail images and automatically identified keyphrases help select appropriate documents. DocuBrowse utilizes document similarities, browsing histories, and recommender system techniques to suggest additional promising documents for the current facet and content filters.
Publication Details
  • IUI 2010 Best Paper Award
  • Feb 7, 2010

Abstract

Embedded Media Markers, or simply EMMs, are nearly transparent iconic marks printed on paper documents that signify the existence of media associated with that part of the document. EMMs also guide users' camera operations for media retrieval. Users take a picture of an EMM-signified document patch using a cell phone, and the media associated with the EMM-signified document location is displayed on the phone. Unlike bar codes, EMMs are nearly transparent and thus do not interfere with the document contents. Retrieval of media associated with an EMM is based on image local features of the captured EMM-signified document patch. This paper describes a technique for semi-automatically placing an EMM at a location in a document, in such a way that it encompasses sufficient identification features with minimal disturbance to the original document.
Publication Details
  • Fuji Xerox Technical Report No. 19, pp. 88-100
  • Jan 1, 2010

Abstract

Browsing and searching for documents in large, online enterprise document repositories is an increasingly common problem. While users are familiar and usually satisfied with Internet search results for information, enterprise search has not been as successful because of differences in data types and user requirements. To support users in finding the information they need from electronic and scanned documents in their online enterprise repository, we created an automatic detector for genres such as papers, slides, tables, and photos. Several of those genres correspond roughly to file name extensions but are identified automatically using features of the document. This genre identifier plays an important role in our faceted document browsing and search system. The system presents documents in a hierarchy as typically found in enterprise document collections. Documents and directories are filtered to show only documents matching selected facets and containing optional query terms and to highlight promising directories. Thumbnail images and automatically identified keyphrases help select desired documents.
2009
Publication Details
  • 2009 IEEE International Conference on Multimedia and Expo (ICME)
  • Jun 30, 2009

Abstract


This paper presents a tool and a novel Fast Invariant Transform (FIT) algorithm for language independent e-documents access. The tool enables a person to access an e-document through an informal camera capture of a document hardcopy. It can save people from remembering/exploring numerous directories and file names, or even going through many pages/paragraphs in one document. It can also facilitate people’s manipulation of a document or people’s interactions through documents. Additionally, the algorithm is useful for binding multimedia data to language independent paper documents. Our document recognition algorithm is inspired by the widely known SIFT descriptor [4] but can be computed much more efficiently for both descriptor construction and search. It also uses much less storage space than the SIFT approach. By testing our algorithm with randomly scaled and rotated document pages, we can achieve a 99.73% page recognition rate on the 2188-page ICME06 proceedings and 99.9% page recognition rate on a 504-page Japanese math book.

Publication Details
  • ACM Transactions on Multimedia Computing, Communications and Applications, Vol. 5, Issue 2
  • May 1, 2009

Abstract

Hyper-Hitchcock consists of three components for creating and viewing a form of interactive video called detail-on-demand video: a hypervideo editor, a hypervideo player, and algorithms for automatically generating hypervideo summaries. Detail-on-demand video is a form of hypervideo that supports one hyperlink at a time for navigating between video sequences. The Hyper-Hitchcock editor enables authoring of detail-on-demand video without programming and uses video processing to aid in the authoring process. The Hyper-Hitchcock player uses labels and keyframes to support navigation through and back hyperlinks. Hyper-Hitchcock includes techniques for automatically generating hypervideo summaries of one or more videos that take the form of multiple linear summaries of different lengths with links from the shorter to the longer summaries. User studies on authoring and viewing provided insight into the various roles of links in hypervideo and found that player interface design greatly affects people's understanding of hypervideo structure and the video they access.
Publication Details
  • IUI '09
  • Feb 8, 2009

Abstract

We designed an interactive visual workspace, MediaGLOW, that supports users in organizing personal and shared photo collections. The system interactively places photos with a spring layout algorithm using similarity measures based on visual, temporal, and geographic features. These similarity measures are also used for the retrieval of additional photos. Unlike traditional spring-based algorithms, our approach provides users with several means to adapt the layout to their tasks. Users can group photos in stacks that in turn attract neighborhoods of similar photos. Neighborhoods partition the workspace by severing connections outside the neighborhood. By placing photos into the same stack, users can express a desired organization that the system can use to learn a neighborhood-specific combination of distances.
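The spring layout described above can be sketched with a basic force-directed relaxation step, where spring rest lengths come from a weighted combination of visual, temporal, and geographic distances. This is a minimal sketch, not MediaGLOW's actual algorithm; the weights, spring constant, and single-step update are assumptions for illustration.

```python
# Minimal force-directed sketch: photos are connected by springs whose
# rest lengths come from a weighted combination of feature distances.
# Weights and constants are assumptions, not the paper's parameters.

def combined_distance(d_visual, d_temporal, d_geo, w=(0.4, 0.3, 0.3)):
    """Blend per-feature distances into one spring rest length."""
    return w[0] * d_visual + w[1] * d_temporal + w[2] * d_geo

def spring_step(positions, rest_lengths, k=0.1):
    """One relaxation step. positions: list of [x, y];
    rest_lengths[i][j]: desired distance between photos i and j."""
    n = len(positions)
    new_pos = [p[:] for p in positions]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = positions[j][0] - positions[i][0]
            dy = positions[j][1] - positions[i][1]
            dist = (dx * dx + dy * dy) ** 0.5 or 1e-9
            # Hooke's law: pull together if too far apart, push if too close
            f = k * (dist - rest_lengths[i][j]) / dist
            new_pos[i][0] += f * dx
            new_pos[i][1] += f * dy
    return new_pos
```

Learning a neighborhood-specific distance combination, as the abstract describes, would amount to adjusting the weights `w` per stack from the user's groupings.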
2008
Publication Details
  • ACM Multimedia 2008
  • Oct 27, 2008

Abstract

This demo introduces a tool for accessing an e-document by capturing one or more images of a real object or document hardcopy. This tool is useful when a file name or location of the file is unknown or unclear. It can save field workers and office workers from remembering/exploring numerous directories and file names. Frequently, it can convert tedious keyboard typing in a search box to a simple camera click. Additionally, when a remote collaborator cannot clearly see an object or a document hardcopy through remote collaboration cameras, this tool can be used to automatically retrieve and send the original e-document to a remote screen or printer.
Publication Details
  • ACM Multimedia
  • Oct 27, 2008

Abstract

Retail establishments want to know about traffic flow and patterns of activity in order to better arrange and staff their business. A large number of fixed video cameras are commonly installed at these locations. While they can be used to observe activity in the retail environment, assigning personnel to this is too time consuming to be valuable for retail analysis. We have developed video processing and visualization techniques that generate presentations appropriate for examining traffic flow and changes in activity at different times of the day. Taking the results of video tracking software as input, our system aggregates activity in different regions of the area being analyzed, determines the average speed of moving objects in the region, and segments time based on significant changes in the quantity and/or location of activity. Visualizations present the results as heat maps to show activity and object counts and average velocities overlaid on the map of the space.
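The aggregation step described above can be sketched as binning tracked positions into a grid, counting activity and averaging speed per cell, which would then be rendered as a heat map. The grid dimensions and track format here are assumptions for illustration, not the system's actual data model.

```python
# Hedged sketch of heat-map aggregation: bin tracked object positions
# into a spatial grid, accumulating activity counts and average speeds
# per cell. Grid size and cell scale are illustrative assumptions.

def aggregate_tracks(tracks, grid_w=10, grid_h=10, cell=1.0):
    """tracks: list of tracks, each a list of (x, y, t) observations."""
    counts = [[0] * grid_w for _ in range(grid_h)]
    speed_sums = [[0.0] * grid_w for _ in range(grid_h)]
    for track in tracks:
        for (x0, y0, t0), (x1, y1, t1) in zip(track, track[1:]):
            gx = min(int(x1 / cell), grid_w - 1)
            gy = min(int(y1 / cell), grid_h - 1)
            dt = (t1 - t0) or 1e-9
            speed = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 / dt
            counts[gy][gx] += 1
            speed_sums[gy][gx] += speed
    avg_speed = [[speed_sums[y][x] / counts[y][x] if counts[y][x] else 0.0
                  for x in range(grid_w)] for y in range(grid_h)]
    return counts, avg_speed
```

The `counts` grid corresponds to the activity heat map and `avg_speed` to the velocity overlay; time segmentation would then look for significant changes in these grids across intervals.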
2007

DOTS: Support for Effective Video Surveillance

Publication Details
  • Fuji Xerox Technical Report No. 17, pp. 83-100
  • Nov 1, 2007

Abstract

DOTS (Dynamic Object Tracking System) is an indoor, real-time, multi-camera surveillance system, deployed in a real office setting. DOTS combines video analysis and user interface components to enable security personnel to effectively monitor views of interest and to perform tasks such as tracking a person. The video analysis component performs feature-level foreground segmentation with reliable results even under complex conditions. It incorporates an efficient greedy-search approach for tracking multiple people through occlusion and combines results from individual cameras into multi-camera trajectories. The user interface draws the users' attention to important events that are indexed for easy reference. Different views within the user interface provide spatial information for easier navigation. DOTS, with over twenty video cameras installed in hallways and other public spaces in our office building, has been in constant use for a year. Our experiences led to many changes that improved performance in all system components.

DOTS: Support for Effective Video Surveillance

Publication Details
  • ACM Multimedia 2007, pp. 423-432
  • Sep 24, 2007

Abstract

DOTS (Dynamic Object Tracking System) is an indoor, real-time, multi-camera surveillance system, deployed in a real office setting. DOTS combines video analysis and user interface components to enable security personnel to effectively monitor views of interest and to perform tasks such as tracking a person. The video analysis component performs feature-level foreground segmentation with reliable results even under complex conditions. It incorporates an efficient greedy-search approach for tracking multiple people through occlusion and combines results from individual cameras into multi-camera trajectories. The user interface draws the users' attention to important events that are indexed for easy reference. Different views within the user interface provide spatial information for easier navigation. DOTS, with over twenty video cameras installed in hallways and other public spaces in our office building, has been in constant use for a year. Our experiences led to many changes that improved performance in all system components.
Publication Details
  • CHI 2007, pp. 1167-1176
  • Apr 28, 2007

Abstract

A common video surveillance task is to keep track of people moving around the space being monitored. It is often difficult to track activity between cameras because locations such as hallways in office buildings can look quite similar and do not indicate the spatial proximity of the cameras. We describe a spatial video player that orients nearby video feeds with the field of view of the main playing video to aid in tracking between cameras. This is compared with the traditional bank of cameras with and without interactive maps for identifying and selecting cameras. We additionally explore the value of static and rotating maps for tracking activity between cameras. The study results show that both the spatial video player and the map improve user performance when compared to the camera-bank interface. Also, subjects change cameras more often with the spatial player than either the camera bank or the map, when available.
2006
Publication Details
  • In Proceedings of the fourth ACM International Workshop on Video Surveillance & Sensor Networks VSSN '06, Santa Barbara, CA, pp. 19-26
  • Oct 27, 2006

Abstract

Video surveillance systems have become common across a wide number of environments. While these installations have included more video streams, they also have been placed in contexts with limited personnel for monitoring the video feeds. In such settings, limited human attention, combined with the quantity of video, makes it difficult for security personnel to identify activities of interest and determine interrelationships between activities in different video streams. We have developed applications to support security personnel both in analyzing previously recorded video and in monitoring live video streams. For recorded video, we created storyboard visualizations that emphasize the most important activity as heuristically determined by the system. We also developed an interactive multi-channel video player application that connects camera views to map locations, alerts users to unusual and suspicious video, and visualizes unusual events along a timeline for later replay. We use different analysis techniques to determine unusual events and to highlight them in video images. These tools aid security personnel by directing their attention to the most important activity within recorded video or among several live video streams.
Publication Details
  • UIST 2006 Companion
  • Oct 16, 2006

Abstract

Video surveillance requires keeping the human in the loop. Software can aid security personnel in monitoring and using video. We have developed a set of interface components designed to locate and follow important activity within security video. By recognizing and visualizing localized activity, presenting overviews of activity over time, and temporally and geographically contextualizing video playback, we aim to support security personnel in making use of the growing quantity of security video.
Publication Details
  • UIST 2006 Companion
  • Oct 16, 2006

Abstract

With the growing quantity of security video, it becomes vital that video surveillance software be able to support security personnel in monitoring and tracking activities. We have developed a multi-stream video player that plays recorded and live videos while drawing the users' attention to activity in the video. We will demonstrate the features of the video player and in particular, how it focuses on keeping the human in the loop and drawing their attention to activities in the video.
Publication Details
  • Interactive Video: Algorithms and Technologies, Hammoud, Riad (Ed.), 2006, XVI, 250 p., 109 illus., Hardcover
  • Jun 7, 2006

Abstract

This chapter describes tools for browsing and searching through video to enable users to quickly locate video passages of interest. Digital video databases containing large numbers of video programs ranging from several minutes to several hours in length are becoming increasingly common. In many cases, it is not sufficient to search for relevant videos, but rather to identify relevant clips, typically less than one minute in length, within the videos. We offer two approaches for finding information in videos. The first approach provides an automatically generated interactive multi-level summary in the form of a hypervideo. When viewing a sequence of short video clips, the user can obtain more detail on the clip being watched. For situations where browsing is impractical, we present a video search system with a flexible user interface that incorporates dynamic visualizations of the underlying multimedia objects. The system employs automatic story segmentation, and displays the results of text and image-based queries in ranked sets of story summaries. Both approaches help users to quickly drill down to potentially relevant video clips and to determine the relevance by visually inspecting the material.
2005
Publication Details
  • INTERACT 2005, LNCS 3585, pp. 781-794
  • Sep 12, 2005

Abstract

A video database can contain a large number of videos ranging from several minutes to several hours in length. Typically, it is not sufficient to search just for relevant videos, because the task still remains to find the relevant clip, typically less than one minute in length, within the video. This makes it important to direct the user's attention to the most promising material and to indicate what material they have already investigated. Based on this premise, we created a video search system with a powerful and flexible user interface that incorporates dynamic visualizations of the underlying multimedia objects. The system employs automatic story segmentation, combines text and visual search, and displays search results in ranked sets of story keyframe collages. By adapting the keyframe collages based on query relevance and indicating which portions of the video have already been explored, we enable users to quickly find relevant sections. We tested our system as part of the NIST TRECVID interactive search evaluation, and found that our user interface enabled users to find more relevant results within the allotted time than other systems employing more sophisticated analysis techniques but less helpful user interfaces.
Publication Details
  • Sixteenth ACM Conference on Hypertext and Hypermedia
  • Sep 6, 2005

Abstract

Hyper-Hitchcock is a hypervideo editor enabling the direct manipulation authoring of a particular form of hypervideo called "detail-on-demand video." This form of hypervideo allows a single link out of the currently playing video to provide more details on the content currently being presented. The editor includes a workspace to select, group, and arrange video clips into several linear sequences. Navigational links placed between the video elements are assigned labels and return behaviors appropriate to the goals of the hypervideo and the role of the destination video. Hyper-Hitchcock was used by students in a Computers and New Media class to author hypervideos on a variety of topics. The produced hypervideos provide examples of hypervideo structures and the link properties and behaviors needed to support them. Feedback from students identified additional link behaviors and features required to support new hypervideo genres. This feedback is valuable for the redesign of Hyper-Hitchcock and the design of hypervideo editors in general.
Publication Details
  • ACM Transactions on Multimedia Computing, Communications, and Applications
  • Aug 8, 2005

Abstract

Organizing digital photograph collections according to events such as holiday gatherings or vacations is a common practice among photographers. To support photographers in this task, we present similarity-based methods to cluster digital photos by time and image content. The approach is general, unsupervised, and makes minimal assumptions regarding the structure or statistics of the photo collection. We present several variants of an automatic unsupervised algorithm to partition a collection of digital photographs based either on temporal similarity alone, or on temporal and content-based similarity. First, inter-photo similarity is quantified at multiple temporal scales to identify likely event clusters. Second, the final clusters are determined according to one of three clustering goodness criteria. The clustering criteria trade off computational complexity and performance. We also describe a supervised clustering method based on learning vector quantization. Finally, we review the results of an experimental evaluation of the proposed algorithms and existing approaches on two test collections.
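The core intuition behind the temporal clustering above can be sketched in a few lines. This is a much-simplified, single-threshold version (the threshold value and function name are assumptions); the paper's algorithm instead quantifies inter-photo similarity at multiple temporal scales and scores candidate partitions with goodness criteria.

```python
def cluster_by_time(timestamps, gap_threshold=6 * 3600):
    """Much-simplified sketch of temporal event clustering: start a
    new cluster whenever the gap between consecutive photo timestamps
    exceeds a threshold (six hours here, an assumed default). This
    captures only the idea that large time gaps suggest event
    boundaries, not the paper's multi-scale similarity analysis.
    `timestamps` is a sorted list of POSIX times, one per photo."""
    if not timestamps:
        return []
    clusters = [[timestamps[0]]]
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr - prev > gap_threshold:
            clusters.append([])  # large gap: likely a new event
        clusters[-1].append(curr)
    return clusters
```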
Publication Details
  • International Conference on Image and Video Retrieval 2005
  • Jul 21, 2005

Abstract

Large video collections present a unique set of challenges to the search system designer. Text transcripts do not always provide an accurate index to the visual content, and the performance of visually based semantic extraction techniques is often inadequate for search tasks. The searcher must be relied upon to provide detailed judgment of the relevance of specific video segments. We describe a video search system that facilitates this user task by efficiently presenting search results in semantically meaningful units to simplify exploration of query results and query reformulation. We employ a story segmentation system and supporting user interface elements to effectively present query results at the story level. The system was tested in the 2004 TRECVID interactive search evaluations with very positive results.
Publication Details
  • CHI 2005 Extended Abstracts, ACM Press, pp. 1395-1398
  • Apr 1, 2005

Abstract

We present a search interface for large video collections with time-aligned text transcripts. The system is designed for users, such as intelligence analysts, who need to quickly find video clips relevant to a topic expressed in text and images. A key component of the system is a powerful and flexible user interface that incorporates dynamic visualizations of the underlying multimedia objects. The interface displays search results in ranked sets of story keyframe collages, and lets users explore the shots in a story. By adapting the keyframe collages based on query relevance and indicating which portions of the video have already been explored, we enable users to quickly find relevant sections. We tested our system as part of the NIST TRECVID interactive search evaluation, and found that our user interface enabled users to find more relevant results within the allotted time than those of many systems employing more sophisticated analysis techniques.
2004
Publication Details
  • UIST 2004 Companion, pp. 37-38
  • Oct 24, 2004

Abstract

As the size of the typical personal digital photo collection reaches well into the thousands of photos, advanced tools to manage these large collections become increasingly necessary. In this demonstration, we present a semi-automatic approach that opportunistically takes advantage of the current state-of-the-art technology in face detection and recognition and combines it with user interface techniques to facilitate the task of labeling people in photos. We show how we use an accurate face detector to automatically extract faces from photos. Instead of having a less accurate face recognizer classify faces, we use it to sort faces by their similarity to a face model. We demonstrate our photo application that uses the extracted faces as UI proxies for actions on the underlying photos, along with the sorting strategy to identify candidate faces for quick and easy face labeling.
Publication Details
  • Proceedings of the International Workshop on Multimedia Information Retrieval, ACM Press, pp. 99-106
  • Oct 10, 2004

Abstract

With digital still cameras, users can easily collect thousands of photos. We have created a photo management application with the goal of making photo organization and browsing simple and quick, even for very large collections. A particular concern is the management of photos depicting people. We present a semi-automatic approach designed to facilitate the task of labeling photos with people that opportunistically takes advantage of the strengths of current state-of-the-art technology in face detection and recognition. In particular, an accurate face detector is used to automatically extract faces from photos, while the less accurate face recognizer is used not to classify the detected faces, but to sort faces by their similarity to a chosen model. This sorting is used to present candidate faces within a user interface designed for quick and easy face labeling. We present results of a simulation of the usage model that demonstrate the improved ease of labeling achieved by our method.
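The sort-rather-than-classify idea above can be sketched as follows. The feature vectors, function name, and Euclidean distance are assumptions standing in for whatever embedding and similarity measure the actual face recognizer uses:

```python
import math

def sort_faces_for_labeling(faces, model):
    """Hypothetical sketch: rank detected faces by Euclidean distance
    between their feature vectors and a chosen person's model vector,
    so the most likely matches are presented first for labeling.
    `faces` maps face ids to feature vectors; `model` is the chosen
    person's feature vector."""
    def dist(vec):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec, model)))
    # ascending distance: closest (most similar) faces first
    return sorted(faces, key=lambda face_id: dist(faces[face_id]))
```

Because the recognizer only orders candidates instead of making hard accept/reject decisions, its errors cost the user a little extra scrolling rather than mislabeled photos.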
Publication Details
  • Proceedings of the Working Conference on Advanced Visual Interfaces, AVI 2004, pp. 290-297
  • May 25, 2004

Abstract

We introduced detail-on-demand video as a simple type of hypervideo that allows users to watch short video segments and to follow hyperlinks to see additional detail. Such video lets users quickly access desired information without having to view the entire contents linearly. A challenge for presenting this type of video is to provide users with the appropriate affordances to understand the hypervideo structure and to navigate it effectively. Another challenge is to give authors tools that allow them to create good detail-on-demand video. Guided by user feedback, we iterated designs for a detail-on-demand video player. We also conducted two user studies to gain insight into people's understanding of hypervideo and to improve the user interface. We found that the interface design was tightly coupled to understanding hypervideo structure and that different designs greatly affected what parts of the video people accessed. The studies also suggested new guidelines for hypervideo authoring.

MiniMedia Surfer: Browsing Video Segments on Small Displays

Publication Details
  • CHI 2004 short paper
  • Apr 27, 2004

Abstract

It is challenging to browse multimedia on mobile devices with small displays. We present MiniMedia Surfer, a prototype application for interactively searching a multimedia collection for video segments of interest. Transparent layers are used to support browsing subtasks: keyword query, exploration of results through keyframes, and playback of video. This layered interface smoothly blends the key tasks of the browsing process and deals with the small screen size. During exploration, the user can adjust the transparency levels of the layers using pen gestures. Details of the video segments are displayed in an expandable timeline that supports gestural interaction.
2003
Publication Details
  • Proc. ACM Multimedia 2003. pp. 364-373
  • Nov 1, 2003

Abstract

We present similarity-based methods to cluster digital photos by time and image content. The approach is general, unsupervised, and makes minimal assumptions regarding the structure or statistics of the photo collection. We present results for the algorithm based solely on temporal similarity, and jointly on temporal and content-based similarity. We also describe a supervised algorithm based on learning vector quantization. Finally, we include experimental results for the proposed algorithms and several competing approaches on two test collections.
Publication Details
  • Proc. ACM Multimedia 2003, pp. 546-554
  • Nov 1, 2003

Abstract

We present a system that allows remote and local participants to control devices in a meeting environment using mouse or pen based gestures "through" video windows. Unlike state-of-the-art device control interfaces that require interaction with text commands, buttons, or other artificial symbols, our approach allows users to interact with devices through live video of the environment. This naturally extends our video supported pan/tilt/zoom (PTZ) camera control system, by allowing gestures in video windows to control not only PTZ cameras, but also other devices visible in video images. For example, an authorized meeting participant can show a presentation on a screen by dragging the file on a personal laptop and dropping it on the video image of the presentation screen. This paper presents the system architecture, implementation tradeoffs, and various meeting control scenarios.
Publication Details
  • Proc. ACM Multimedia 2003. pp. 92-93
  • Nov 1, 2003

Abstract

To simplify the process of editing interactive video, we developed the concept of "detail-on-demand" video as a subset of general hypervideo. Detail-on-demand video keeps the authoring and viewing interfaces relatively simple while supporting a wide range of interactive video applications. Our editor, Hyper-Hitchcock, provides a direct manipulation environment in which authors can combine video clips and place hyperlinks between them. To summarize a video, Hyper-Hitchcock can also automatically generate a hypervideo composed of multiple video summary levels and navigational links between these summaries and the original video. Viewers may interactively select the amount of detail they see, access more detailed summaries, and navigate to the source video through the summary.
Publication Details
  • Proc. ACM Multimedia 2003. pp. 392-401
  • Nov 1, 2003

Abstract

In this paper, we describe how a detail-on-demand representation for interactive video is used in video summarization. Our approach automatically generates a hypervideo composed of multiple video summary levels and navigational links between these summaries and the original video. Viewers may interactively select the amount of detail they see, access more detailed summaries, and navigate to the source video through the summary. We created a representation for interactive video that supports a wide range of interactive video applications, and Hyper-Hitchcock, an editor and player for this type of interactive video. Hyper-Hitchcock employs methods to determine (1) the number and length of levels in the hypervideo summary, (2) the video clips for each level in the hypervideo, (3) the grouping of clips into composites, and (4) the links between elements in the summary. These decisions are based on an inferred quality of video segments and temporal relations among those segments.

Detail-on-Demand Hypervideo

Publication Details
  • Proc. ACM Multimedia 2003. pp. 600-601
  • Nov 1, 2003

Abstract

We demonstrate the use of detail-on-demand hypervideo in interactive training and video summarization. Detail-on-demand video allows viewers to watch short video segments and to follow hyperlinks to see additional detail. The player for detail-on-demand video displays keyframes indicating what links are available at each point in the video. The Hyper-Hitchcock authoring tool helps users create hypervideo by automatically dividing video into clips that can be combined in a direct manipulation interface. Clips can be grouped into composites, and hyperlinks can be placed between clips and composites. A summarization algorithm creates multi-level hypervideo summaries from linear video by automatically selecting clips and placing links between them.
Publication Details
  • SPIE Information Technologies and Communications
  • Sep 9, 2003

Abstract

Hypervideo is a form of interactive video that allows users to follow links to other video. A simple form of hypervideo, called "detail-on-demand video," provides at most one link from one segment of video to another, supporting a single-button interaction. Detail-on-demand video is well suited for interactive video summaries, because the user can request a more detailed summary while watching the video. Users interact with the video through a special hypervideo player that displays keyframes with labels indicating when a link is available. While detail-on-demand summaries can be authored manually, doing so is time-consuming. To address this issue, we developed an algorithm to automatically generate multi-level hypervideo summaries. The highest level of the summary consists of the most important clip from each take or scene in the video. At each subsequent level, more clips from each take or scene are added in order of their importance. We give one example in which a hypervideo summary is created for a linear training video. We also show how the algorithm can be modified to produce a hypervideo summary for home video.
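The level structure described above can be sketched directly. The function name and data shapes are assumptions; the sketch covers only clip selection per level, not the link placement or clip-importance scoring the full algorithm performs:

```python
def build_summary_levels(scenes, num_levels):
    """Sketch of the multi-level summary structure: each scene is a
    list of (importance, clip) pairs in temporal order; level k keeps
    the k most important clips of every scene, restored to temporal
    order, so each level adds detail to the one above it."""
    levels = []
    for k in range(1, num_levels + 1):
        level = []
        for scene in scenes:
            # rank clip positions by importance, descending
            ranked = sorted(range(len(scene)), key=lambda i: -scene[i][0])
            keep = sorted(ranked[:k])  # top-k clips, back in temporal order
            level.extend(scene[i][1] for i in keep)
        levels.append(level)
    return levels
```

Level 1 thus contains one clip per scene (the overview), and each deeper level is a longer, more detailed summary that a viewer reaches by following a link.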
Publication Details
  • Human-Computer Interaction INTERACT '03, IOS Press, pp. 33-40
  • Sep 1, 2003

Abstract

To simplify the process of editing interactive video, we developed the concept of "detail-on-demand" video as a subset of general hypervideo where a single button press reveals additional information about the current video sequence. Detail-on-demand video keeps the authoring and viewing interfaces relatively simple while supporting a wide range of interactive video applications. Our editor, Hyper-Hitchcock, builds on prior work on automatic analysis to find the best quality video clips. It introduces video composites as an abstraction for grouping and manipulating sets of video clips. Navigational links can be created between any two video clips or composites. Such links offer a variety of return behaviors, invoked when the linked video completes, that can be tailored to different materials. Initial impressions from a pilot study indicate that Hyper-Hitchcock is easy to learn, although the behavior of links is not immediately intuitive for all users.
Publication Details
  • Human-Computer Interaction INTERACT '03, IOS Press, pp. 196-203
  • Sep 1, 2003

Abstract

With digital still cameras, users can easily collect thousands of photos. Our goal is to make organizing and browsing photos simple and quick, while retaining scalability to large collections. To that end, we created a photo management application concentrating on areas that improve the overall experience without neglecting the mundane components of such an application. Our application automatically divides photos into meaningful events such as birthdays or trips. Several user interaction mechanisms enhance the user experience when organizing photos. Our application combines a light table for showing thumbnails of the entire photo collection with a tree view that supports navigating, sorting, and filtering photos by categories such as dates, events, people, and locations. A calendar view visualizes photos over time and allows for the quick assignment of dates to scanned photos. We fine-tuned our application by using it with large personal photo collections provided by several users.
Publication Details
  • Proceedings of Hypertext '03, pp. 124-125
  • Aug 26, 2003

Abstract

Existing hypertext systems have emphasized either the navigational or spatial expression of relationships between objects. We are exploring the combination of these modes of expression in Hyper-Hitchcock, a hypervideo editor. Hyper-Hitchcock supports a form of hypervideo called "detail-on-demand video" due to its applicability to situations where viewers need to take a link to view more details on the content currently being presented. Authors of detail-on-demand video select, group, and spatially arrange video clips into linear sequences in a two-dimensional workspace. Hyper-Hitchcock uses a simple spatial parser to determine the temporal order of selected video clips. Authors add navigational links between the elements in those sequences. This combination of navigational and spatial hypertext modes of expression separates the clip sequence from the navigational structure of the hypervideo. Such a combination can be useful in cases where multiple forms of inter-object relationships must be expressed on the same content.
Publication Details
  • IEEE International Conference on Multimedia and Expo, v. II, pp. 753-756
  • Jul 7, 2003

Abstract

We created an alternative approach to existing video summaries that gives viewers control over the summaries by selecting hyperlinks to other video with additional information. We structure such summaries as "detail-on-demand" video, a subset of general hypervideo in which at most one link to another video sequence is available at any given time. Our editor for such video, Hyper-Hitchcock, provides a workspace in which an author can select and arrange video clips, generate composites from clips and from other composites, and place links between composites. To simplify dealing with a large number of clips, Hyper-Hitchcock generates iconic representations for composites that can be used to manipulate the composite as a whole. In addition to providing an authoring environment, Hyper-Hitchcock can automatically generate multi-level hypervideo summaries for immediate use or as the starting point for author modification.