Publications

FXPAL publishes in top scientific conferences and journals.

2008
Publication Details
  • IADIS e-Learning 2008
  • Jul 22, 2008

Abstract

While researchers have been exploring automatic presentation capture since the 1990s, real world adoption has been limited. Our research focuses on simplifying presentation capture and retrieval to reduce adoption barriers. ProjectorBox is our attempt to create a smart appliance that seamlessly captures, indexes, and archives presentation media, with streamlined user interfaces for searching, skimming, and sharing content. In this paper we describe the design of ProjectorBox and compare its use across corporate and educational settings. While our evaluation confirms the usability and utility of our approach across settings, it also highlights differences in usage and user needs, suggesting enhancements for both markets. We describe new features we have implemented to address corporate needs for enhanced privacy and security, and new user interfaces for content discovery.
Publication Details
  • SIGIR 2008. (Singapore, Singapore, July 20 - 24, 2008). ACM, New York, NY, 315-322. Best Paper Award.
  • Jul 22, 2008

Abstract

We describe a new approach to information retrieval: algorithmic mediation for intentional, synchronous collaborative exploratory search. Using our system, two or more users with a common information need search together, simultaneously. The collaborative system provides tools, user interfaces and, most importantly, algorithmically-mediated retrieval to focus, enhance and augment the team's search and communication activities. Collaborative search outperformed post hoc merging of similarly instrumented single user runs. Algorithmic mediation improved both collaborative search (allowing a team of searchers to find relevant information more efficiently and effectively), and exploratory search (allowing the searchers to find relevant information that cannot be found while working individually).
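One simple way to picture algorithmic mediation is division of labor over a shared result list. The round-robin split below is an illustrative stand-in, not the paper's actual mediation algorithm; the function name and data are assumptions for the sketch.

```python
# Toy sketch: split a shared ranked result list across team members so
# that no two searchers review the same document. Real mediation (per
# the paper) also specializes results to each searcher's role.

def mediate(ranked_docs, num_searchers):
    """Assign each document in a ranked list to one searcher, round-robin."""
    queues = [[] for _ in range(num_searchers)]
    for rank, doc in enumerate(ranked_docs):
        queues[rank % num_searchers].append(doc)
    return queues

# Two searchers share one ranked list without duplicated effort.
queues = mediate(["d1", "d2", "d3", "d4", "d5"], 2)
```

Even this naive split avoids the duplicated effort that post hoc merging of independent single-user runs cannot.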
Publication Details
  • ACM Conf. on Image and Video Retrieval (CIVR) 2008
  • Jul 7, 2008

Abstract

We have developed an interactive video search system that allows the searcher to rapidly assess query results and easily pivot on those results to form new queries. The system is intended to maximize the use of the discriminative power of the human searcher. This is accomplished by providing a hierarchical segmentation, streamlined interface, and redundant visual cues throughout. The typical search scenario includes a single searcher with the ability to search with text and content-based queries. In this paper, we evaluate new variations on our basic search system. In particular we test the system using only visual content-based search capabilities, and using paired searchers in a realtime collaboration. We present analysis and conclusions from these experiments.

FXPAL Collaborative Exploratory Video Search System

Publication Details
  • CIVR 2008 VideOlympics (Demo)
  • Jul 7, 2008

Abstract

This paper describes FXPAL's collaborative, exploratory interactive video search application. We introduce a new approach to information retrieval: algorithmic mediation in support of intentional, synchronous collaborative exploratory search. Using our system, two or more users with a common information need search together, simultaneously. The collaborative system provides tools, user interfaces and, most importantly, algorithmically-mediated retrieval to focus, enhance and augment the team's search and communication activities.

Collaborative Information Seeking in Electronic Environments

Publication Details
  • Information Seeking Support Systems Workshop. An Invitational Workshop Sponsored by the National Science Foundation. Available online at http://www.ils.unc.edu/ISSS/
  • Jun 26, 2008

Abstract

Collaboration in information seeking, while common in practice, is just being recognized as an important research area. Several studies have documented various collaboration strategies that people have adopted (and adapted), and some initial systems have been built. This field is in its infancy, however. We need to understand which real-world tasks are best suited for collaborative work. We need to extend models of information seeking to accommodate explicit and implicit collaboration. We need to invent a suite of algorithms to mediate search activities. We need to devise evaluation metrics that take into account multiple people's contributions to search.
Publication Details
  • IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2008
  • Jun 24, 2008

Abstract

Current approaches to pose estimation and tracking can be classified into two categories: generative and discriminative. While generative approaches can accurately determine human pose from image observations, they are computationally intractable due to search in the high dimensional human pose space. On the other hand, discriminative approaches do not generalize well, but are computationally efficient. We present a hybrid model that combines the strengths of the two in an integrated learning and inference framework. We extend the Gaussian process latent variable model (GPLVM) to include an embedding from observation space (the space of image features) to the latent space. GPLVM is a generative model, but the inclusion of this mapping provides a discriminative component, making the model observation driven. Observation Driven GPLVM (OD-GPLVM) not only provides a faster inference approach, but also more accurate estimates (compared to GPLVM) in cases where dynamics are not sufficient for the initialization of search in the latent space. We also extend OD-GPLVM to learn and estimate poses from parameterized actions/gestures. Parameterized gestures are actions which exhibit large systematic variation in joint angle space for different instances due to differences in contextual variables. For example, the joint angles in a forehand tennis shot are a function of the height of the ball (Figure 2). We learn these systematic variations as a function of the contextual variables. We then present an approach to use information from the scene/object to provide context for human pose estimation for such parameterized actions.

Vital Sign Estimation from Passive Thermal Video

Publication Details
  • IEEE Computer Society Conference on Computer Vision and Pattern Recognition
  • Jun 24, 2008

Abstract

Conventional wired detection of vital signs limits the use of these important physiological parameters by many applications, such as airport health screening, elder care, and workplace preventive care. In this paper, we explore contact-free heart rate and respiratory rate detection through measuring infrared light modulation emitted near superficial blood vessels or a nasal area respectively. To deal with complications caused by subjects' movements, facial expressions, and partial occlusions of the skin, we propose a novel algorithm based on contour segmentation and tracking, clustering of informative pixels, and dominant frequency component estimation. The proposed method achieves robust subject regions-of-interest alignment and motion compensation in infrared video with low SNR. It relaxes some strong assumptions used in previous work and substantially improves on previously reported performance. Preliminary experiments on heart rate estimation for 20 subjects and respiratory rate estimation for 8 subjects exhibit promising results.
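The last stage the abstract names, dominant frequency component estimation, can be sketched as a band-limited FFT peak search over a 1-D intensity signal. The band limits, sampling rate, and function name below are assumptions for illustration, not values from the paper.

```python
# Sketch: estimate heart rate as the strongest spectral component of a
# per-frame intensity signal, restricted to a plausible band
# (0.7-3.0 Hz ~ 42-180 bpm). Real thermal video needs the ROI
# alignment and motion compensation the paper describes first.
import numpy as np

def dominant_frequency_bpm(signal, fps, lo_hz=0.7, hi_hz=3.0):
    """Return the dominant in-band frequency, in beats per minute."""
    signal = np.asarray(signal, dtype=float)
    signal -= signal.mean()                      # remove the DC component
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)   # gate to the valid band
    peak = freqs[band][np.argmax(spectrum[band])]
    return peak * 60.0

# Synthetic check: a 1.2 Hz sinusoid sampled at 30 fps for 30 s.
t = np.arange(0, 30, 1.0 / 30.0)
bpm = dominant_frequency_bpm(np.sin(2 * np.pi * 1.2 * t), fps=30)
```

The band gate is what keeps slow illumination drift or high-frequency sensor noise from being mistaken for a pulse.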

1st International Workshop on Collaborative Information Retrieval

Publication Details
  • JCDL 2008
  • Jun 20, 2008

Abstract

Explicit support for collaboration is becoming increasingly important for certain kinds of collection-building activities in digital libraries. In the last few years, several research groups have also pursued various issues related to collaboration during search [4][5][6]. We can represent collaboration in search on two dimensions - synchrony and intent. Asynchronous collaboration means that people are not working on the same problem simultaneously; implicit collaboration occurs when the system uses information from others' use of the system to inform new searches, but does not guarantee consistency of search goals. In this workshop, we are concerned with the top-left quadrant of Figure 1 that represents small groups of people working toward a common goal at the same time. These synchronous, explicit collaborations could occur amongst remotely situated users, each with their own computers, or amongst a co-located group sharing devices; these spatial configurations add yet another dimension to be considered when designing collaborative search systems.
Publication Details
  • 1st International Workshop on Collaborative Information Retrieval. JCDL 2008.
  • Jun 20, 2008

Abstract

People can help other people find information in networked information seeking environments. Recently, many such systems and algorithms have proliferated in industry and in academia. Unfortunately, it is difficult to compare the systems in meaningful ways because they often define collaboration in different ways. In this paper, we propose a model of possible kinds of collaboration, and illustrate it with examples from literature. The model contains four dimensions: intent, concurrency, depth and location. This model can be used to classify existing systems and to suggest possible opportunities for design in this space.
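The four-dimension model lends itself to a simple structured encoding. The class below is a hypothetical sketch; the example system and its classification are illustrative assumptions, not taken from the paper.

```python
# Sketch: represent each collaborative-search system as a point along
# the model's four dimensions so systems can be compared and gaps in
# the design space spotted. Dimension values here are assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class CollaborationModel:
    intent: str       # "explicit" or "implicit"
    concurrency: str  # "synchronous" or "asynchronous"
    depth: str        # e.g. "UI-level" or "algorithmic"
    location: str     # "co-located" or "distributed"

# Hypothetical classification of an algorithmically mediated co-search system:
cosearch = CollaborationModel("explicit", "synchronous", "algorithmic", "distributed")
```

Classifying existing systems this way makes unoccupied combinations (e.g. co-located + algorithmic) visible as design opportunities.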

Simple and Effective Defense Against Evil Twin Access Points

Publication Details
  • Proceedings ACM WiSec, pp. 220-235, 2008
  • Mar 31, 2008

Abstract

Wireless networking is becoming widespread in many public places such as cafes. Unsuspecting users may become victims of attacks based on "evil twin" access points. These rogue access points are operated by criminals in an attempt to launch man-in-the-middle attacks. We present a simple protection mechanism against binding to an evil twin. The mechanism leverages short authentication string protocols for the exchange of cryptographic keys. The short string verification is performed by encoding the short strings as a sequence of colors, rendered sequentially by the user's device and by the designated access point of the cafe. The access point must have a light capable of showing two colors and must be mounted prominently in a position where users can have confidence in its authenticity. We conducted a usability study with patrons in several cafes and participants found our protection mechanism very usable.
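The color-sequence idea can be sketched as deriving a short string from the negotiated key material and mapping its bits to the two colors the light can show. The hash choice, sequence length, and color names below are assumptions for illustration; the paper's actual short-authentication-string protocol differs.

```python
# Sketch: both the user's device and the access point derive the same
# color sequence from shared key material; the user simply checks that
# the flashes match. Mismatched sequences indicate an evil twin.
import hashlib

COLORS = ("red", "blue")  # the AP's light can show two colors

def short_string_colors(key_material: bytes, num_flashes: int = 8):
    """Map the leading bits of a hash of the key material to colors."""
    digest = hashlib.sha256(key_material).digest()
    bits = ((digest[i // 8] >> (7 - i % 8)) & 1 for i in range(num_flashes))
    return [COLORS[b] for b in bits]

seq = short_string_colors(b"example-session-key")
```

Because an attacker without the key material cannot predict the sequence, matching flashes give the user evidence of binding to the legitimate access point.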
Publication Details
  • TRECVid 2007
  • Mar 1, 2008

Abstract

In 2007 FXPAL submitted results for two tasks: rushes summarization and interactive search. The rushes summarization task has been described at the ACM Multimedia workshop. Interested readers are referred to that publication for details. We describe our interactive search experiments in this notebook paper.

Exiting the Cleanroom: On Ecological Validity and Ubiquitous Computing

Publication Details
  • Human-Computer Interaction Journal
  • Feb 15, 2008

Abstract

Over the past decade and a half, corporations and academies have invested considerable time and money in the realization of ubiquitous computing. Yet design approaches that yield ecologically valid understandings of ubiquitous computing systems, which can help designers make design decisions based on how systems perform in the context of actual experience, remain rare. The central question underlying this paper is: what barriers stand in the way of real-world, ecologically valid design for ubicomp? Using a literature survey and interviews with 28 developers, we illustrate how issues of sensing and scale cause ubicomp systems to resist iteration, prototype creation, and ecologically valid evaluation. In particular, we found that developers have difficulty creating prototypes that are both robust enough for realistic use and able to handle ambiguity and error, and that they struggle to gather useful data from evaluations either because critical events occur infrequently, because the level of use necessary to evaluate the system is difficult to maintain, or because the evaluation itself interferes with use of the system. We outline pitfalls for developers to avoid as well as practical solutions, and we draw on our results to outline research challenges for the future. Crucially, we do not argue for particular processes, sets of metrics, or intended outcomes but rather focus on prototyping tools and evaluation methods that support realistic use in realistic settings that can be selected according to the needs and goals of a particular developer or researcher.
2007
Publication Details
  • The 3rd International Conference on Collaborative Computing: Networking, Applications and Worksharing
  • Nov 12, 2007

Abstract

This paper summarizes our environment-image/video-supported collaboration technologies developed in the past several years. These technologies use environment images and videos as active interfaces and use visual cues in these images and videos to orient device controls, annotations and other information access. By using visual cues in various interfaces, we expect to make the control interface more intuitive than button-based control interfaces and command-based interfaces. These technologies can be used to facilitate high-quality audio/video capture with limited cameras and microphones. They can also facilitate multi-screen presentation authoring and playback, tele-interaction, environment manipulation with cell phones, and environment manipulation with digital pens.

Collaborative Exploratory Search

Publication Details
  • HCIR 2007, Boston, Massachusetts (HCIR = Human Computer Interaction and Information Retrieval)
  • Nov 2, 2007

Abstract

We propose to mitigate the deficiencies of correlated search with collaborative search, that is, search in which a small group of people shares a common information need and actively (and synchronously) collaborates to achieve it. Furthermore, we propose a system architecture that mediates search activity of multiple people by combining their inputs and by specializing results delivered to them to take advantage of their skills and knowledge.

DOTS: Support for Effective Video Surveillance

Publication Details
  • Fuji Xerox Technical Report No. 17, pp. 83-100
  • Nov 1, 2007

Abstract

DOTS (Dynamic Object Tracking System) is an indoor, real-time, multi-camera surveillance system, deployed in a real office setting. DOTS combines video analysis and user interface components to enable security personnel to effectively monitor views of interest and to perform tasks such as tracking a person. The video analysis component performs feature-level foreground segmentation with reliable results even under complex conditions. It incorporates an efficient greedy-search approach for tracking multiple people through occlusion and combines results from individual cameras into multi-camera trajectories. The user interface draws the users' attention to important events that are indexed for easy reference. Different views within the user interface provide spatial information for easier navigation. DOTS, with over twenty video cameras installed in hallways and other public spaces in our office building, has been in constant use for a year. Our experiences led to many changes that improved performance in all system components.
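One data-association step of the greedy-search tracking the abstract mentions can be sketched as repeatedly matching the closest remaining (track, detection) pair under a gating distance. The distance metric, threshold, and data below are illustrative assumptions, not DOTS internals.

```python
# Sketch: greedy nearest-first assignment of new detections to existing
# tracks. Pairs are visited in order of increasing distance; each track
# and detection is used at most once, and pairs beyond the gate are
# left unassigned (e.g. a person entering or leaving the scene).

def greedy_assign(tracks, detections, max_dist=50.0):
    """tracks/detections: dicts of id -> (x, y). Returns track_id -> det_id."""
    pairs = sorted(
        (abs(tx - dx) + abs(ty - dy), tid, did)
        for tid, (tx, ty) in tracks.items()
        for did, (dx, dy) in detections.items()
    )
    assigned, used_tracks, used_dets = {}, set(), set()
    for dist, tid, did in pairs:
        if dist > max_dist:
            break  # remaining pairs are even farther apart
        if tid not in used_tracks and did not in used_dets:
            assigned[tid] = did
            used_tracks.add(tid)
            used_dets.add(did)
    return assigned

match = greedy_assign({"t1": (0, 0), "t2": (100, 0)},
                      {"d1": (3, 4), "d2": (98, 1)})
```

Greedy assignment is not globally optimal, but it is fast enough for the per-frame budget of a real-time multi-camera system.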
Publication Details
  • UIST 2007 Poster & Demo
  • Oct 7, 2007

Abstract

We are exploring the use of collaborative games to generate meaningful textual tags for photos. We have designed PhotoPlay to take advantage of the social engagement typical of board games and provide a collocated ludic environment conducive to the creation of text tags. We evaluated PhotoPlay and found that it was fun and socially engaging for players. The milieu of the game also facilitated playing with personal photos, which resulted in more specific tags, such as named entities, than when playing with randomly selected online photos. Players also had a preference for playing with personal photos.
Publication Details
  • TRECVID Video Summarization Workshop at ACM Multimedia 2007
  • Sep 28, 2007

Abstract

This paper describes a system for selecting excerpts from unedited video and presenting the excerpts in a short summary video for efficiently understanding the video contents. Color and motion features are used to divide the video into segments where the color distribution and camera motion are similar. Segments with and without camera motion are clustered separately to identify redundant video. Audio features are used to identify clapboard appearances for exclusion. Representative segments from each cluster are selected for presentation. To increase the original material contained within the summary and reduce the time required to view the summary, selected segments are played back at a higher rate based on the amount of detected camera motion in the segment. Pitch-preserving audio processing is used to better capture the sense of the original audio. Metadata about each segment is overlaid on the summary to help the viewer understand the context of the summary segments in the original video.
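The motion-dependent speed-up rule can be pictured as a simple mapping from detected camera motion to playback rate. The linear mapping and rate bounds below are assumptions for illustration, not the paper's actual parameters.

```python
# Sketch: play low-motion segments faster (they carry less new visual
# information per second) and high-motion segments near real time, so
# the viewer can still follow pans and zooms.

def playback_rate(motion, min_rate=1.0, max_rate=4.0):
    """motion in [0, 1]: 0 = static shot, 1 = strong camera motion.
    Returns a playback speed multiplier; static shots speed up most."""
    motion = min(max(motion, 0.0), 1.0)          # clamp to [0, 1]
    return max_rate - (max_rate - min_rate) * motion

rate_static = playback_rate(0.0)  # fastest for a static shot
rate_pan = playback_rate(1.0)     # near real time for heavy motion
```

Pitch-preserving audio processing, as the abstract notes, keeps the sped-up audio intelligible at these higher rates.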
Publication Details
  • ICDSC 2007, pp. 132-139
  • Sep 25, 2007

Abstract

Our analysis and visualization tools use 3D building geometry to support surveillance tasks. These tools are part of DOTS, our multi-camera surveillance system, which has over 20 cameras spread throughout the public spaces of our building. The geometric input to DOTS is a floor plan and information such as cubicle wall heights. From this input we construct a 3D model and an enhanced 2D floor plan that are the bases for more specific visualization and analysis tools. Foreground objects of interest can be placed within these models and dynamically updated in real time across camera views. Alternatively, a virtual first-person view suggests what a tracked person can see as she moves about. Interactive visualization tools support complex camera-placement tasks. Extrinsic camera calibration is supported both by visualizations of parameter adjustment results and by methods for establishing correspondences between image features and the 3D model.

DOTS: Support for Effective Video Surveillance

Publication Details
  • ACM Multimedia 2007, pp. 423-432
  • Sep 24, 2007

Abstract

DOTS (Dynamic Object Tracking System) is an indoor, real-time, multi-camera surveillance system, deployed in a real office setting. DOTS combines video analysis and user interface components to enable security personnel to effectively monitor views of interest and to perform tasks such as tracking a person. The video analysis component performs feature-level foreground segmentation with reliable results even under complex conditions. It incorporates an efficient greedy-search approach for tracking multiple people through occlusion and combines results from individual cameras into multi-camera trajectories. The user interface draws the users' attention to important events that are indexed for easy reference. Different views within the user interface provide spatial information for easier navigation. DOTS, with over twenty video cameras installed in hallways and other public spaces in our office building, has been in constant use for a year. Our experiences led to many changes that improved performance in all system components.
Publication Details
  • IEEE Intl. Conf. on Semantic Computing
  • Sep 17, 2007

Abstract

We present methods for semantic annotation of multimedia data. The goal is to detect semantic attributes (also referred to as concepts) in clips of video via analysis of a single keyframe or set of frames. The proposed methods integrate high performance discriminative single concept detectors in a random field model for collective multiple concept detection. Furthermore, we describe a generic framework for semantic media classification capable of capturing arbitrary complex dependencies between the semantic concepts. Finally, we present initial experimental results comparing the proposed approach to existing methods.
Publication Details
  • Workshop at Ubicomp 2007
  • Sep 16, 2007

Abstract

The past two years at UbiComp, our workshops on design and usability in next generation conference rooms engendered lively conversations in the community of people working in smart environments. The community is clearly vital and growing. This year we would like to build on the energy from previous workshops while taking on a more interactive and exploratory format. The theme for this workshop is "embodied meeting support" and includes three tracks: mobile interaction, tangible interaction, and sensing in smart environments. We encourage participants to present work that focuses on one track or that attempts to bridge multiple tracks.
Publication Details
  • ACM Conf. on Image and Video Retrieval 2007
  • Jul 29, 2007

Abstract

This paper describes FXPAL's interactive video search application, "MediaMagic". FXPAL has participated in the TRECVID interactive search task since 2004. In our search application we employ a rich set of redundant visual cues to help the searcher quickly sift through the video collection. A central element of the interface and underlying search engine is a segmentation of the video into stories, which allows the user to quickly navigate and evaluate the relevance of moderately-sized, semantically-related chunks.
Publication Details
  • ICME 2007
  • Jul 2, 2007

Abstract

The recent emergence of multi-core processors enables a new trend in the usage of computers. Computer vision applications, which require heavy computation and lots of bandwidth, usually cannot run in real-time. Recent multi-core processors can potentially serve the needs of such workloads. In addition, more advanced algorithms can be developed utilizing the new computation paradigm. In this paper, we study the performance of an articulated body tracker on multi-core processors. The articulated body tracking workload encapsulates most of the important aspects of a computer vision workload. It takes multiple camera inputs of a scene with a single human object, extracts useful features, and performs statistical inference to find the body pose. We show the importance of properly parallelizing the workload in order to achieve great performance: a speedup of 26× on 32 cores. We conclude that: (1) data-domain parallelization is better than function-domain parallelization for computer vision applications; (2) data-domain parallelism by image regions and particles is very effective; (3) reducing serial code in edge detection brings significant performance improvements; (4) domain knowledge about low/mid/high level of vision computation is helpful in parallelizing the workload.
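Data-domain parallelism, conclusion (1) above, means splitting the input (image regions, particles) across workers rather than splitting pipeline stages across workers. The sketch below illustrates the idea with a trivial stand-in for per-region feature extraction; it is not the paper's tracker.

```python
# Sketch of data-domain parallelism: divide an image into row bands and
# process the bands concurrently. The per-region work here (a pixel
# sum) is a placeholder for real feature extraction.
from concurrent.futures import ThreadPoolExecutor

def process_region(rows):
    """Stand-in per-region work: sum the pixel values in the band."""
    return sum(sum(row) for row in rows)

def parallel_process(image, num_workers=4):
    """Split the image (a list of rows) into bands, one per worker."""
    band = max(1, len(image) // num_workers)
    bands = [image[i:i + band] for i in range(0, len(image), band)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return sum(pool.map(process_region, bands))

image = [[1] * 8 for _ in range(8)]  # 8x8 "image" of ones
total = parallel_process(image)
```

Each band is independent, so the split scales with worker count; function-domain splits, by contrast, are capped by the number of pipeline stages and their serial dependencies.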

Featured Wand for 3D Interaction

Publication Details
  • ICME 2007
  • Jul 2, 2007

Abstract

Our featured wand, automatically tracked by video cameras, provides an inexpensive and natural way for users to interact with devices such as large displays. The wand supports six degrees of freedom for manipulation of 3D applications like Google Earth. Our system uses a 'line scan' to estimate the wand pose, which simplifies processing. Several applications are demonstrated.
Publication Details
  • ICME 2007, pp. 1015-1018
  • Jul 2, 2007

Abstract

We describe a new interaction technique that allows users to control nonlinear video playback by directly manipulating objects seen in the video. This interaction technique is similar to video "scrubbing" where the user adjusts the playback time by moving the mouse along a slider. Our approach is superior to variable-scale scrubbing in that the user can concentrate on interesting objects and does not have to guess how long the objects will stay in view. Our method relies on a video tracking system that tracks objects in fixed cameras, maps them into 3D space, and handles hand-offs between cameras. In addition to dragging objects visible in video windows, users may also drag iconic object representations on a floor plan. In that case, the best video views are selected for the dragged objects.
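The core lookup behind object-based scrubbing can be sketched as inverting a tracked trajectory: dragging the object to a point seeks to the frame where the object was nearest that point. The trajectory data and function name below are illustrative assumptions.

```python
# Sketch: given an object's tracked trajectory as (time, x, y) samples,
# find the playback time whose recorded position is closest to the
# point the user dragged the object to.

def seek_time(trajectory, target):
    """trajectory: list of (t, x, y); target: (x, y). Return best t."""
    tx, ty = target
    return min(trajectory,
               key=lambda p: (p[1] - tx) ** 2 + (p[2] - ty) ** 2)[0]

# Object moves left to right; dragging it near x=9 seeks to its last sample.
track = [(0, 0.0, 0.0), (1, 5.0, 0.0), (2, 10.0, 0.0)]
t = seek_time(track, (9.0, 0.0))
```

Because the lookup is driven by where the object actually was, the user never has to guess how long the object stays in view, which is the advantage claimed over variable-scale scrubbing.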