Publications

By Andreas Girgensohn (Clear Search)

2003
Publication Details
  • Proc. IEEE Intl. Conf. on Image Processing
  • Sep 14, 2003

Abstract

Close
We present similarity-based methods to cluster digital photos by time and image content. This approach is general, unsupervised, and makes minimal assumptions regarding the structure or statistics of the photo collection. We describe versions of the algorithm using temporal similarity with and without content-based similarity, and compare the algorithms with existing techniques, measured against ground-truth clusters created by humans.
Publication Details
  • SPIE Information Technologies and Communications
  • Sep 9, 2003

Abstract

Close
Hypervideo is a form of interactive video that allows users to follow links to other video. A simple form of hypervideo, called "detail-on-demand video," provides at most one link from one segment of video to another, supporting a singlebutton interaction. Detail-on-demand video is well suited for interactive video summaries, because the user can request a more detailed summary while watching the video. Users interact with the video is through a special hypervideo player that displays keyframes with labels indicating when a link is available. While detail-on-demand summaries can be manually authored, it is a time-consuming task. To address this issue, we developed an algorithm to automatically generate multi-level hypervideo summaries. The highest level of the summary consists of the most important clip from each take or scene in the video. At each subsequent level, more clips from each take or scene are added in order of their importance. We give one example in which a hypervideo summary is created for a linear training video. We also show how the algorithm can be modified to produce a hypervideo summary for home video.

The Plasma Poster Network: Posting Multimedia Content in Public Places

Publication Details
  • Human-Computer Interaction INTERACT '03, IOS Press, pp. 599-606
  • Sep 1, 2003

Abstract

Close
Much effort has been expended in creating online information resources to foster social networks, create synergies between collocated and remote colleagues, and enhance social capital within organizations. Following the observation that physical bulletin boards serve an important community building and maintenance function, in this paper we describe a network of large screen, digital bulletin boards, the Plasma Poster Network. The function of this system is to bridge the gap between online community interactions and shared physical spaces. We describe our motivation, a fieldwork study of information sharing practices within our organization, and an internal deployment of Plasma Posters.

Weaving Between Online and Offline Community Participation

Publication Details
  • Human-Computer Interaction INTERACT '03, IOS Press, pp. 729-732
  • Sep 1, 2003

Abstract

Close
Much effort has been expended in creating online spaces for people to meet, network, share and organize. However, there is relatively little work, in comparison, that has addressed creating awareness of online community activities for those gathered together physically. We describe our efforts to advertise the online community spaces of CHIplace and CSCWplace using large screen, interactive bulletin boards that show online community information mixed with content generated at the conference itself. Our intention was to raise awareness of the online virtual community within the offline, face-to-face event. We describe the two deployments, at CHI 2002 and at CSCW 2002, and provide utilization data regarding people's participation within the physical and virtual locales.
Publication Details
  • Human-Computer Interaction INTERACT '03, IOS Press, pp. 33-40
  • Sep 1, 2003

Abstract

Close
To simplify the process of editing interactive video, we developed the concept of "detail-on-demand" video as a subset of general hypervideo where a single button press reveals additional information about the current video sequence. Detail-on-demand video keeps the authoring and viewing interfaces relatively simple while supporting a wide range of interactive video applications. Our editor, Hyper-Hitchcock, builds on prior work on automatic analysis to find the best quality video clips. It introduces video composites as an abstraction for grouping and manipulating sets of video clips. Navigational links can be created between any two video clips or composites. Such links offer a variety of return behaviors for when the linked video is completed that can be tailored to different materials. Initial impressions from a pilot study indicate that Hyper-Hitchcock is easy to learn although the behavior of links is not immediately intuitive for all users.
Publication Details
  • Human-Computer Interaction INTERACT '03, IOS Press, pp. 196-203
  • Sep 1, 2003

Abstract

Close
With digital still cameras, users can easily collect thousands of photos. Our goal is to make organizing and browsing photos simple and quick, while retaining scalability to large collections. To that end, we created a photo management application concentrating on areas that improve the overall experience without neglecting the mundane components of such an application. Our application automatically divides photos into meaningful events such as birthdays or trips. Several user interaction mechanisms enhance the user experience when organizing photos. Our application combines a light table for showing thumbnails of the entire photo collection with a tree view that supports navigating, sorting, and filtering photos by categories such as dates, events, people, and locations. A calendar view visualizes photos over time and allows for the quick assignment of dates to scanned photos. We fine-tuned our application by using it with large personal photo collections provided by several users.
Publication Details
  • Proceedings of Hypertext '03, pp. 124-125
  • Aug 26, 2003

Abstract

Close
Existing hypertext systems have emphasized either the navigational or spatial expression of relationships between objects. We are exploring the combination of these modes of expression in Hyper-Hitchcock, a hypervideo editor. Hyper-Hitchcock supports a form of hypervideo called "detail-on-demand video" due to its applicability to situations where viewers need to take a link to view more details on the content currently being presented. Authors of detail-on-demand video select, group, and spatially arrange video clips into linear sequences in a two-dimensional workspace. Hyper-Hitchcock uses a simple spatial parser to determine the temporal order of selected video clips. Authors add navigational links between the elements in those sequences. This combination of navigational and spatial hypertext modes of expression separates the clip sequence from the navigational structure of the hypervideo. Such a combination can be useful in cases where multiple forms of inter-object relationships must be expressed on the same content.
Publication Details
  • IEEE International Conference on Multimedia and Expo, v. I, pp. 221-224
  • Jul 7, 2003

Abstract

Close
A novel method is presented for inaudibly hiding information in an audio signal by subtly applying time-scale modification to segments of the signal. The sequence, duration, and degree of the time-scale modifications are the parameters which encode information in the altered signal. By comparing the altered signal with a reference copy, compressed and expanded regions can be identified and the hidden data recovered. This approach is novel and has several advantages over other methods: it is theoretically noiseless, it introduces no spectral distortion, and it is robust to all known methods of reproduction, compression, and transmission.
Publication Details
  • IEEE International Conference on Multimedia and Expo, v. II, pp. 77-80
  • Jul 7, 2003

Abstract

Close
We created an improved layout algorithm for automatically generating visual video summaries reminiscent of comic book pages. The summaries are comprised of images from the video that are sized according to their importance. The algorithm performs a global optimization with respect to a layout cost function that encompasses features such as the number of resized images and the amount of whitespace in the presentation. The algorithm creates summaries that: always fit exactly into the requested area, are varied by containing few rows with images of the same size, and have little whitespace at the end of the last row. The layout algorithm is fast enough to allow the interactive resizing of the summaries and the subsequent generation of a new layout.
Publication Details
  • IEEE International Conference on Multimedia and Expo, v. II, pp. 753-756
  • Jul 7, 2003

Abstract

Close
We created an alternative approach to existing video summaries that gives viewers control over the summaries by selecting hyperlinks to other video with additional information. We structure such summaries as "detail-on-demand" video, a subset of general hypervideo in which at most one link to another video sequence is available at any given time. Our editor for such video, Hyper-Hitchcock, provides a workspace in which an author can select and arrange video clips, generate composites from clips and from other composites, and place links between composites. To simplify dealing with a large number of clips, Hyper-Hitchcock generates iconic representations for composites that can be used to manipulate the composite as a whole. In addition to providing an authoring environment, Hyper-Hitchcock can automatically generate multi-level hypervideo summaries for immediate use or as the starting point for author modification.
2002
Publication Details
  • ACM Multimedia 2002
  • Dec 1, 2002

Abstract

Close
We present methods for automatic and semi-automatic creation of music videos, given an arbitrary audio soundtrack and source video. Significant audio changes are automatically detected; similarly, the source video is automatically segmented and analyzed for suitability based on camera motion and exposure. Video with excessive camera motion or poor contrast is penalized with a high unsuitability score, and is more likely to be discarded in the final edit. High quality video clips are then automatically selected and aligned in time with significant audio changes. Video clips are adjusted to match the audio segments by selecting the most suitable region of the desired length. Besides a fully automated solution, our system can also start with clips manually selected and ordered using a graphical interface. The video is then created by truncating the selected clips (preserving the high quality portions) to produce a video digest that is synchronized with the soundtrack music, thus enhancing the impact of both.
Publication Details
  • ACM 2002 Conference on Computer Supported Cooperative Work
  • Nov 16, 2002

Abstract

Close
Technology can play an important role in enabling people to interact with each other. The Web is one such technology with the affordances for sharing information and for connecting people to people. In this paper, we describe the design of two social interaction Web sites for two different social groups. We review several related efforts to provide principles for creating social interaction environments and describe the specific principles that guided our design. To examine the effectiveness of the two sites, we analyze the usage data. Finally, we discuss approaches for encouraging participation and lessons learned.
Publication Details
  • International Journal of Human-Computer Studies, 56, pp. 75-107
  • Feb 1, 2002

Abstract

Close
We describe our experiences with the design, implementation, deployment, and evaluation of a Portholes tool which provides group and collaboration awareness through the Web. The research objective was to explore how such a system would improve communication and facilitate a shared understanding among distributed development groups. During the deployment of our Portholes system, we conducted a naturalistic study by soliciting user feedback and evolving the system in response. Many of the initial reactions of potential users indicated that our system projected the wrong image so that we designed a new version that provided explicit cues about being in public and who is looking back to suggest a social rather than information interface. We implemented the new design as a Java applet and evaluated design choices with a preference study. Our experiences with different Portholes versions and user reactions to them provide insights for designing awareness tools beyond Portholes systems. Our approach is for the studies to guide and to provide feedback for the design and technical development of our system.
2001
Publication Details
  • IEEE Computer, 34(9), pp. 61-67
  • Sep 1, 2001

Abstract

Close

To meet the diverse needs of business, education, and personal video users, the authors developed three visual interfaces that help identify potentially useful or relevant video segments. In such interfaces, keyframes-still images automatically extracted from video footage-can distinguish videos, summarize them, and provide access points. Well-chosen keyframes enhance a listing's visual appeal and help users select videos. Keyframe selection can vary depending on the application's requirements: A visual summary of a video-captured meeting may require only a few highlight keyframes, a video editing system might need a keyframe for every clip, while a browsing interface requires an even distribution of keyframes over the video's full length. The authors conducted user studies for each of their three interfaces, gathering input for subsequent interface improvements. The studies revealed that finding a similarity measure for collecting video clips into groups that more closely match human perception poses a challenge. Another challenge is to further improve the video-segmentation algorithm used for selecting keyframes. A new version will provide users with more information and control without sacrificing the interface's ease of use.

Publication Details
  • In Proceedings of Human-Computer Interaction (INTERACT '01), IOS Press, Tokyo, Japan, pp. 464-471
  • Jul 9, 2001

Abstract

Close
Hitchcock is a system to simplify the process of editing video. Its key features are the use of automatic analysis to find the best quality video clips, an algorithm to cluster those clips into meaningful piles, and an intuitive user interface for combining the desired clips into a final video. We conducted a user study to determine how the automatic clip creation and pile navigation support users in the editing process. The study showed that users liked the ease-of-use afforded by automation, but occasionally had problems navigating and overriding the automated editing decisions. These findings demonstrate the need for a proper balance between automation and user control. Thus, we built a new version of Hitchcock that retains the automatic editing features, but provides additional controls for navigation and for allowing users to modify the system decisions.
2000
Publication Details
  • In Proceedings of UIST '00, ACM Press, pp. 81-89, 2000.
  • Nov 4, 2000

Abstract

Close
Hitchcock is a system that allows users to easily create custom videos from raw video shot with a standard video camera. In contrast to other video editing systems, Hitchcock uses automatic analysis to determine the suitability of portions of the raw video. Unsuitable video typically has fast or erratic camera motion. Hitchcock first analyzes video to identify the type and amount of camera motion: fast pan, slow zoom, etc. Based on this analysis, a numerical "unsuitability" score is computed for each frame of the video. Combined with standard editing rules, this score is used to identify clips for inclusion in the final video and to select their start and end points. To create a custom video, the user drags keyframes corresponding to the desired clips into a storyboard. Users can lengthen or shorten the clip without specifying the start and end frames explicitly. Clip lengths are balanced automatically using a spring-based algorithm.
Publication Details
  • In Multimedia Tools and Applications, 11(3), pp. 347-358, 2000.
  • Aug 1, 2000

Abstract

Close
In accessing large collections of digitized videos, it is often difficult to find both the appropriate video file and the portion of the video that is of interest. This paper describes a novel technique for determining keyframes that are different from each other and provide a good representation of the whole video. We use keyframes to distinguish videos from each other, to summarize videos, and to provide access points into them. The technique can determine any number of keyframes by clustering the frames in a video and by selecting a representative frame from each cluster. Temporal constraints are used to filter out some clusters and to determine the representative frame for a cluster. Desirable visual features can be emphasized in the set of keyframes. An application for browsing a collection of videos makes use of the keyframes to support skimming and to provide visual summaries.
Publication Details
  • In Proceedings of IEEE International Conference on Multimedia and Expo, vol. III, pp. 1329-1332, 2000.
  • Jul 30, 2000

Abstract

Close
We describe a genetic segmentation algorithm for video. This algorithm operates on segments of a string representation. It is similar to both classical genetic algorithms that operate on bits of a string and genetic grouping algorithms that operate on subsets of a set. For evaluating segmentations, we define similarity adjacency functions, which are extremely expensive to optimize with traditional methods. The evolutionary nature of genetic algorithms offers a further advantage by enabling incremental segmentation. Applications include video summarization and indexing for browsing, plus adapting to user access patterns.
Publication Details
  • In Proceedings of the Genetic and Evolutionary Computation Conference, Morgan Kaufmann Publishers, pp. 666-673, 2000.
  • Jul 8, 2000

Abstract

Close
We describe a genetic segmentation algorithm for image data streams and video. This algorithm operates on segments of a string representation. It is similar to both classical genetic algorithms that operate on bits of a string and genetic grouping algorithms that operate on subsets of a set. It employs a segment fair crossover operation. For evaluating segmentations, we define similarity adjacency functions, which are extremely expensive to optimize with traditional methods. The evolutionary nature of genetic algorithms offers a further advantage by enabling incremental segmentation. Applications include browsing and summarizing video and collections of visually rich documents, plus a way of adapting to user access patterns.
Publication Details
  • In Proceedings of Hypertext '00, ACM Press, pp. 244-245, 2000.
  • May 30, 2000

Abstract

Close
We describe a way to make a hypermedia meeting record from multimedia meeting documents by automatically generating links through image matching. In particular, we look at video recordings and scanned paper handouts of presentation slides with ink annotations. The algorithm that we employ is the Discrete Cosine Transform (DCT). Interactions with multipath links and paper interfaces are discussed.
Publication Details
  • In RIAO'2000 Conference Proceedings, Content-Based Multimedia Information Access, C.I.D., pp. 637-648, 2000.
  • Apr 12, 2000

Abstract

Close
We present and interactive system that allows a user to locate regions of video that are similar to a video query. Thus segments of video can be found by simply providing an example of the video of interest. The user selects a video segment for the query from either a static frame-based interface or a video player. A statistical model of the query is calculated on-the-fly, and is used to find similar regions of video. The similarity measure is based on a Gaussian model of reduced frame image transform coefficients. Similarity in a single video is displayed in the Metadata Media Player. The player can be used to navigate through the video by jumping between regions of similarity. Similarity can be rapidly calculated for multiple video files as well. These results are displayed in MBase, a Web-based video browser that allows similarity in multiple video files to be visualized simultaneously.
Publication Details
  • In CHI 2000 Conference Proceedings, ACM Press, pp. 185-192, 2000.
  • Mar 31, 2000

Abstract

Close
This paper presents a method for generating compact pictorial summarizations of video. We developed a novel approach for selecting still images from a video suitable for summarizing the video and for providing entry points into it. Images are laid out in a compact, visually pleasing display reminiscent of a comic book or Japanese manga. Users can explore the video by interacting with the presented summary. Links from each keyframe start video playback and/or present additional detail. Captions can be added to presentation frames to include commentary or descriptions such as the minutes of a recorded meeting. We conducted a study to compare variants of our summarization technique. The study participants judged the manga summary to be significantly better than the other two conditions with respect to their suitability for summaries and navigation, and their visual appeal.
1999