Publications

FXPAL publishes in top scientific conferences and journals.

2019

Abstract

We present a remote assistance system that enables a remotely located expert to provide guidance using hand gestures to a customer who performs a physical task in a different location. The system is built on top of a web-based real-time media communication framework, which allows the customer to use a commodity smartphone to send a live video feed to the expert, from which the expert can see the customer's workspace and show his or her hand gestures over the video in real time. The expert's hand gestures are captured with a hand tracking device and visualized with a rigged 3D hand model on the live video feed. The system can be accessed via a web browser, and it does not require any app software to be installed on the customer's device. Our system supports various types of devices including smartphones, tablets, desktop PCs, and smart glasses. To improve the collaboration experience, the system provides a novel gravity-aware hand visualization technique.
Publication Details
  • ACM ISS 2019
  • Nov 9, 2019

Abstract

In a telepresence scenario with remote users discussing a document, it can be difficult to follow which parts are being discussed. One way to address this is by showing the user's hand position on the document, which also enables expressive gestural communication. An important practical problem is how to capture and transmit the hand movements efficiently along with high-resolution document images. We propose a tabletop system with two channels that integrates document capture with a 4K video camera and hand tracking with a webcam, in which the document image and hand skeleton data are transmitted at different rates and handled by a lightweight Web browser client at remote sites. To enhance the rendering, we employ velocity-based smoothing and ephemeral motion traces. We tested our prototype over long distances from the USA to Japan and to Italy, and report on latency and jitter performance. Our system achieves relatively low latency over a long distance in comparison with a tele-immersive system that transmits mesh data over much shorter distances.
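
The velocity-based smoothing is not spelled out in the abstract, so here is a minimal sketch of one plausible variant: per-joint exponential smoothing whose blend weight grows with joint speed, damping jitter while keeping fast motion responsive. All names and thresholds are hypothetical, not from the paper.

```python
import numpy as np

def smooth_skeleton(prev, curr, dt, v_low=0.05, v_high=1.0):
    """Adaptive exponential smoothing for hand-skeleton joints.

    prev, curr: (n_joints, 2) joint positions in normalized coordinates.
    dt: seconds between frames. Slow joints are heavily smoothed to
    suppress jitter; fast joints follow the raw measurement closely.
    """
    speed = np.linalg.norm(curr - prev, axis=1) / dt
    alpha = np.clip((speed - v_low) / (v_high - v_low), 0.1, 1.0)
    return prev + alpha[:, None] * (curr - prev)

# Example: 21 hand joints at ~30 fps with simulated sensor jitter.
prev = np.random.rand(21, 2)
curr = prev + np.random.normal(scale=0.01, size=(21, 2))
smoothed = smooth_skeleton(prev, curr, dt=1 / 30)
```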
Publication Details
  • International Conference on the Internet of Things (IoT 2019)
  • Oct 22, 2019

Abstract

A motivating, core capability of most smart, Internet of Things enabled spaces (e.g., home, office, hospital, factory) is the ability to leverage context of use. Location is a key context element, particularly indoor location. Recent advances in radio ranging technologies, such as 802.11-2016 FTM, promise the availability of low-cost, near-ubiquitous time-of-flight-based ranging estimates. In this paper, we build on prior work to enhance the technology's ability to provide useful location estimates. We demonstrate meaningful improvements in coordinate-based estimation accuracy and substantial increases in room-level estimation accuracy. Furthermore, insights gained in our real-world deployment provide important implications for future Internet of Things context applications and their supporting technology deployments, such as workflow management, inventory control, and healthcare information tools.
Publication Details
  • ACM MM
  • Oct 21, 2019

Abstract

Despite work on smart spaces, nowadays a lot of knowledge work happens in the wild: at home, in coffee shops, on trains, buses, and planes, and of course in crowded open-office cubicles. Conducting web conferences in these settings creates privacy issues and can also distract participants, leading to a perceived lack of professionalism on the part of the remote peer(s). To solve this common problem, we implemented CamaLeon, a browser-based tool that uses real-time machine vision powered by deep learning to change the webcam stream sent by the remote peer: specifically, CamaLeon dynamically changes the "wild" background into one that resembles that of the office workers. In order to detect the background in wild settings, we designed and trained a fast UNet model on head and shoulder images. CamaLeon also uses a face detector to determine whether it should stream the person's face, depending on its location (or lack of presence). It uses face recognition to make sure it streams only a face that belongs to the user who connected to the meeting. The system was tested during a few real video conferencing calls at our company in which two workers were remote. Both parties felt a sense of enhanced co-presence, and the remote participants felt more professional with their backgrounds replaced.
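
The trained UNet and face models are not reproduced here; assuming a segmentation network that outputs a per-pixel foreground probability, the background-replacement compositing itself reduces to an alpha blend. A minimal sketch, in which `segment_person` is a hypothetical stand-in for the model:

```python
import numpy as np

def segment_person(frame):
    """Stand-in for the trained UNet: returns an (H, W) foreground
    probability map in [0, 1]. A fake centered blob is used for demo."""
    h, w, _ = frame.shape
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.hypot((yy - h / 2) / h, (xx - w / 2) / w)
    return np.clip(1.0 - 3.0 * dist, 0.0, 1.0)

def replace_background(frame, office_bg):
    """Alpha-blend the webcam frame over a replacement office background."""
    alpha = segment_person(frame)[..., None]                # (H, W, 1)
    out = alpha * frame.astype(float) + (1 - alpha) * office_bg.astype(float)
    return out.astype(np.uint8)

frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
office = np.full_like(frame, 200)       # plain light-gray stand-in "office"
composited = replace_background(frame, office)
```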
Publication Details
  • ACM MM
  • Oct 21, 2019

Abstract

Responding to requests for information from an application, a remote person, or an organization that involve documenting the presence and/or state of physical objects can lead to incomplete or inaccurate documentation. We propose a system that couples information requests with a live object recognition tool to semi-automatically catalog requested items and collect evidence of their current state.
Publication Details
  • ACM MM
  • Oct 20, 2019

Abstract

Multimedia research has now moved beyond laboratory experiments and is rapidly being deployed in real-life applications including advertisements, social interaction, search, security, automated driving, and healthcare. Hence, the developed algorithms now have a direct impact on the individuals using the abovementioned services and on society as a whole. While there is huge potential to benefit society with such technologies, there is also an urgent need to identify the checks and balances needed to ensure that their impact is ethical and positive. This panel will bring together an array of experts who have experience collecting large-scale datasets, building multimedia algorithms, and deploying them in practical applications, as well as a lawyer whose eyes have been on the fundamental rights at stake. They will lead a discussion on the ethics and lawfulness of dataset creation, licensing, privacy of individuals represented in the datasets, algorithmic transparency, algorithmic bias, explainability, and the implications of application deployment. Through an interactive process engaging the audience, the panel hopes to increase awareness of these concepts in the multimedia research community and to initiate a discussion on community guidelines, all with the aim of setting the future direction of conducting multimedia research in a lawful and ethical manner.
Publication Details
  • VDS'19
  • Oct 20, 2019

Abstract

Computational notebooks have become a major medium for data exploration and insight communication in data science. Although expressive, dynamic, and flexible, in practice they are loose collections of scripts, charts, and tables that rarely tell a story or clearly represent the analysis process. This leads to a number of usability issues, particularly in the comprehension and exploration of notebooks. In this work, we design, implement, and evaluate Albireo, a visualization approach to summarize the structure of notebooks, with the goal of supporting more effective exploration and communication by displaying the dependencies and relationships between the cells of a notebook using a dynamic graph structure. We evaluate the system via a case study and expert interviews, with our results indicating that such a visualization is useful for an analyst’s self-reflection during exploratory programming, and also effective for communication of narratives and collaboration between analysts.

Interactive Bicluster Aggregation in Bipartite Graphs

Publication Details
  • IEEE VIS 2019
  • Oct 20, 2019

Abstract

Exploring coordinated relationships is important for sensemaking of data in various fields, such as intelligence analysis. To support such investigations, visual analysis tools use biclustering to mine relationships in bipartite graphs and visualize the resulting biclusters with standard graph visualization techniques. Due to overlaps among biclusters, such visualizations can be cluttered (e.g., with many edge crossings) when there is a large number of biclusters. Prior work attempted to resolve this problem by automatically ordering nodes in a bipartite graph. However, visual clutter remains a serious problem, since the number of displayed biclusters is unchanged. We propose bicluster aggregation as an alternative approach, and have developed two methods of interactively merging biclusters. These interactive bicluster aggregations help organize similar biclusters and reduce the number of displayed biclusters. Initial expert feedback indicates the potential usefulness of these techniques in practice.
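
The two interactive merge methods are not specified in the abstract; as a rough illustration of the aggregation idea, the sketch below greedily merges biclusters whose row and column sets overlap strongly, using average Jaccard similarity as an assumed merge criterion.

```python
def jaccard(a, b):
    return len(a & b) / len(a | b)

def merge_similar_biclusters(biclusters, threshold=0.6):
    """biclusters: list of (row_set, col_set) pairs. Repeatedly merge
    the most similar pair until no pair reaches the threshold."""
    bcs = [(set(r), set(c)) for r, c in biclusters]
    while len(bcs) > 1:
        best, pair = threshold, None
        for i in range(len(bcs)):
            for j in range(i + 1, len(bcs)):
                s = 0.5 * (jaccard(bcs[i][0], bcs[j][0]) +
                           jaccard(bcs[i][1], bcs[j][1]))
                if s >= best:
                    best, pair = s, (i, j)
        if pair is None:
            break
        i, j = pair
        bcs[i] = (bcs[i][0] | bcs[j][0], bcs[i][1] | bcs[j][1])
        del bcs[j]
    return bcs

print(merge_similar_biclusters([({1, 2, 3}, {"a", "b"}),
                                ({2, 3}, {"a", "b", "c"}),
                                ({7, 8}, {"x"})]))
```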
Publication Details
  • IEEE InfoVis 2019
  • Oct 20, 2019

Abstract

Think-aloud protocols are widely used by user experience (UX) practitioners in usability testing to uncover issues in user interface design. It is often arduous to analyze large amounts of recorded think-aloud sessions and few UX practitioners have an opportunity to get a second perspective during their analysis due to time and resource constraints. Inspired by the recent research that shows subtle verbalization and speech patterns tend to occur when users encounter usability problems, we take the first step to design and evaluate an intelligent visual analytics tool that leverages such patterns to identify usability problem encounters and present them to UX practitioners to assist their analysis. We first conducted and recorded think-aloud sessions, and then extracted textual and acoustic features from the recordings and trained machine learning (ML) models to detect problem encounters. Next, we iteratively designed and developed a visual analytics tool, VisTA, which enables dynamic investigation of think-aloud sessions with a timeline visualization of ML predictions and input features. We conducted a between-subjects laboratory study to compare three conditions, i.e., VisTA, VisTASimple (no visualization of the ML’s input features), and Baseline (no ML information at all), with 30 UX professionals. The findings show that UX professionals identified more problem encounters when using VisTA than Baseline by leveraging the problem visualization as an overview, anticipations, and anchors as well as the feature visualization as a means to understand what ML considers and omits. Our findings also provide insights into how they treated ML, dealt with (dis)agreement with ML, and reviewed the videos (i.e., play, pause, and rewind).
Publication Details
  • IEEE VIS 2019
  • Oct 20, 2019

Abstract

The analysis of bipartite networks is critical in a variety of application domains, such as exploring entity co-occurrences in intelligence analysis and investigating gene expression in bio-informatics. One important task is missing link prediction, which infers the existence of unseen links based on currently observed ones. In this paper, we propose MissBiN that involves analysts in the loop for making sense of link prediction results. MissBiN combines a novel method for link prediction and an interactive visualization for examining and understanding the algorithm outputs. Further, we conducted quantitative experiments to assess the performance of the proposed link prediction algorithm, and a case study to evaluate the overall effectiveness of MissBiN.
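
MissBiN's link-prediction method is its own contribution and is not described in this abstract; for context, a classic baseline scores an unobserved pair by counting the three-hop paths that connect it in the bipartite graph. A sketch of that baseline (not MissBiN's algorithm):

```python
from collections import defaultdict

def predict_missing_links(edges):
    """edges: iterable of (u, v), u and v from disjoint node sets.
    Scores each unobserved (u, v) pair by the number of paths
    u - v' - u' - v, a simple common-neighbor style baseline."""
    nbr_u = defaultdict(set)            # u -> linked v's
    nbr_v = defaultdict(set)            # v -> linked u's
    for u, v in edges:
        nbr_u[u].add(v)
        nbr_v[v].add(u)
    scores = {}
    for u in nbr_u:
        for v in nbr_v:
            if v in nbr_u[u]:
                continue                # link already observed
            s = sum(v in nbr_u[u2]
                    for v2 in nbr_u[u]
                    for u2 in nbr_v[v2] if u2 != u)
            if s:
                scores[(u, v)] = s
    return sorted(scores.items(), key=lambda kv: -kv[1])

edges = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u2", "c")]
print(predict_missing_links(edges))     # ('u1', 'c') ranks highest
```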

Abstract

Localization in an indoor and/or Global Positioning System (GPS)-denied environment is paramount for various applications that require locating humans and/or robots in an unknown environment. Various localization systems using different ubiquitous sensors, such as cameras, radio frequency, and inertial measurement units, have been developed. Most of these systems cannot accommodate scenarios with substantial changes in the environment, such as a large number of people (unpredictable) or a sudden change in the floor plan (unstructured). In this paper, we propose a system, InFo, that can leverage real-time visual information captured by surveillance cameras and augment it with images captured by the smart device user to deliver accurate discretized location information. Through our experiments, we demonstrate that our deep learning based InFo system provides a 10% improvement compared to a system that does not utilize this real-time information.
Publication Details
  • British Machine Vision Conference (BMVC 2019)
  • Sep 1, 2019

Abstract

Automatic medical report generation from chest X-ray images is one possibility for assisting doctors to reduce their workload. However, the different patterns and data distribution of normal and abnormal cases can bias machine learning models. Previous attempts did not focus on isolating the generation of the abnormal and normal sentences in order to increase the variability of generated paragraphs. To address this, we propose to separate abnormal and normal sentence generation by using a dual word LSTM in a hierarchical LSTM model. In addition, we conduct an analysis on the distinctiveness of generated sentences compared to the BLEU score, which increases when less distinct reports are generated. Together with this analysis, we propose a way of selecting a model that generates more distinctive sentences. We hope our findings will help to encourage the development of new metrics to better verify methods of automatic medical report generation.
Publication Details
  • To appear in Natural Language Engineering
  • Aug 16, 2019

Abstract

Twitter and other social media platforms are often used for sharing interest in products. The identification of purchase decision stages, such as in the AIDA model (Awareness, Interest, Desire, Action), can enable more personalized e-commerce services and finer-grained targeting of ads than predicting purchase intent only. In this paper, we propose and analyze neural models for identifying the purchase stage of single tweets in a user's tweet sequence. In particular, we identify three challenges of purchase stage identification: imbalanced label distribution with a high number of negative instances, a limited amount of training data, and domain adaptation with no or only little target domain data. Our experiments reveal that the imbalanced label distribution is the main challenge for our models. We address it with ranking loss and perform detailed investigations of the performance of our models on the different output classes. In order to improve the generalization of the models and augment the limited amount of training data, we examine the use of sentiment analysis as a complementary, secondary task in a multitask framework. To apply our models to tweets from another product domain, we consider two scenarios: for the first scenario, without any labeled data in the target product domain, we show that learning domain-invariant representations with adversarial training is most promising, while for the second scenario, with a small number of labeled target examples, fine-tuning the source model weights performs best. Finally, we conduct several analyses, including extracting attention weights and representative phrases for the different purchase stages. The results suggest that the model is learning features indicative of purchase stages and that the confusion errors are sensible.
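
The exact ranking loss is not given in the abstract; one common pairwise formulation is margin ranking between the scores of positive and negative tweets, which sidesteps the dominance of negative instances. A minimal PyTorch sketch with hypothetical dimensions:

```python
import torch
import torch.nn as nn

# Positive (purchase-stage) tweets should outscore negatives by a margin.
scorer = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.MarginRankingLoss(margin=1.0)

pos = torch.randn(32, 128)       # embeddings of purchase-stage tweets
neg = torch.randn(32, 128)       # embeddings of negative tweets
s_pos = scorer(pos).squeeze(1)
s_neg = scorer(neg).squeeze(1)
target = torch.ones(32)          # +1 means "first argument ranks higher"
loss = loss_fn(s_pos, s_neg, target)
loss.backward()
```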
Publication Details
  • The 17th IEEE International Conference on Embedded and Ubiquitous Computing (IEEE EUC 2019)
  • Aug 2, 2019

Abstract

Human activity forecasting from videos in routine-based tasks is an open research problem that has numerous applications in robotics, visual monitoring, and skill assessment. Currently, many challenges exist in activity forecasting because human actions are not fully observable from continuous recording. Additionally, a large number of human activities involve fine-grained articulated human motions that are hard to capture using frame-level representations. To overcome these challenges, we propose a method that forecasts human actions by learning the dynamics of local motion patterns extracted from dense trajectories using long short-term memory (LSTM). Experiments on a public dataset validated the effectiveness of our proposed method in activity forecasting and demonstrated large improvements over a baseline two-stream end-to-end model. We also learned that human activity forecasting benefits from learning both short-range motion patterns and long-term dependencies between actions.
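
As a rough sketch of the forecasting setup, the toy PyTorch model below runs an LSTM over per-frame motion descriptors (standing in for encoded dense-trajectory features) and predicts the label of the upcoming action. All sizes are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ActionForecaster(nn.Module):
    def __init__(self, feat_dim=256, hidden=128, n_actions=10):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, x):               # x: (batch, time, feat_dim)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])    # logits for the next action

model = ActionForecaster()
logits = model(torch.randn(4, 30, 256))  # 4 clips, 30 frames of features
```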
Publication Details
  • 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)
  • Jul 28, 2019

Abstract

A common issue in training a deep learning, abstractive summarization model is lack of a large set of training summaries. This paper examines techniques for adapting from a labeled source domain to an unlabeled target domain in the context of an encoder-decoder model for text generation. In addition to adversarial domain adaptation (ADA), we introduce the use of artificial titles and sequential training to capture the grammatical style of the unlabeled target domain. Evaluation on adapting to/from news articles and Stack Exchange posts indicates that the use of these techniques can boost performance for both unsupervised adaptation as well as fine-tuning with limited target data.

Abstract

An open challenge in current telecommunication systems, including Skype and other existing research systems, is the lack of physical interaction, and consequently a restricted feeling of connection for users. For example, such telecommunication systems do not allow remote users to move pieces of a board game while playing with a local user. We propose that installing a teleoperated robot arm can address this problem by enabling remote physical interaction. We compare three methods of remote control to study the feeling of connection and how it relates to agency and autonomy under each control scheme.
Publication Details
  • ACM SIGMOD/PODS workshop on Human-In-the-Loop Data Analytics (HILDA)
  • Jun 30, 2019

Abstract

Manufacturing environments require changes in work procedures and settings as changes in product demand affect which products are produced. Resource re-organization and the time needed for workers to adapt to such frequent changes can be expensive. For example, for each change, managers in a factory may be required to manually create a list of inventory items to be picked up by workers. Uncertainty in predicting the appropriate pick-up time, due to differences in worker-determined routes, may make it difficult for managers to generate a fixed schedule for delivery to the assembly line. To address these problems, we propose OPaPi, a human-centric system that improves the efficiency of manufacturing by optimizing parts pick-up routes and schedules. OPaPi leverages frequent pattern mining and a traveling salesman problem solver to suggest rack placement for more efficient routes. The system further employs interactive visualization to incorporate an expert's domain knowledge and different manufacturing constraints for real-time adaptive decision making.
Publication Details
  • Designing Interactive Systems (DIS) 2019
  • Jun 23, 2019

Abstract

As our landscape of wearable technologies proliferates, we find more devices situated on our heads. However, many challenges hinder them from widespread adoption---from their awkward, bulky form factor (today's AR and VR goggles) to their socially stigmatized designs (Google Glass) and a lack of a well-developed head-based interaction design language. In this paper, we explore a socially acceptable, large, head-worn interactive wearable---a hat. We report results from a gesture elicitation study with 17 participants, extract a taxonomy of gestures, and define a set of design concerns for interactive hats. Through this lens, we detail the design and fabrication of three hat prototypes capable of sensing touch, head movements, and gestures, and including ambient displays of several types. Finally, we report an evaluation of our hat prototype and insights to inform the design of future hat technologies.
Publication Details
  • International Conference on Weblogs and Social Media (ICWSM) 2019
  • Jun 12, 2019

Abstract

Millions of images are shared through social media every day. Yet, we know little about how the activities and preferences of users depend on the content of these images. In this paper, we seek to understand viewers' engagement with photos. We design a quantitative study to expand previous research on in-app visual effects (also known as filters) through the examination of visual content identified through computer vision. This study is based on an analysis of 4.9M Flickr images and is organized around three important engagement factors: likes, comments, and favorites. We find that filtered photos are not equally engaging across different categories of content. Photos of food and people attract more engagement when filters are used, while photos of natural scenes and photos taken at night are more engaging when left unfiltered. In addition to contributing to the research around social media engagement and photography practices, our findings offer several design implications for mobile photo sharing platforms.
Publication Details
  • arXiv
  • Jun 5, 2019

Abstract

In multi-participant postings, as in online chat conversations, several conversations or topic threads may take place concurrently. This leads to difficulties for readers reviewing the postings in not only following discussions but also in quickly identifying their essence. A two-step process, disentanglement of interleaved posts followed by summarization of each thread, addresses the issue, but disentanglement errors are propagated to the summarization step, degrading the overall performance. To address this, we propose an end-to-end trainable encoder-decoder network for summarizing interleaved posts. The interleaved posts are encoded hierarchically, i.e., word-to-word (words in a post) followed by post-to-post (posts in a channel). The decoder also generates summaries hierarchically, thread-to-thread (generating thread representations) followed by word-to-word (generating summary words). Additionally, we propose a hierarchical attention mechanism for interleaved text. Overall, our end-to-end trainable hierarchical framework improves performance over a sequence-to-sequence framework by 8% on a synthetic dataset of interleaved texts.
Publication Details
  • ACM TVX 2019
  • Jun 5, 2019

Abstract

Advancements in 360° cameras have made related livestreams increasingly common. In the case of video conferencing, 360° cameras provide almost unrestricted visibility into a conference room for a remote viewer without the need for an articulating camera. However, local participants are left wondering whether someone is connected and where remote participants might be looking. To address this, we fabricated a prototype device that shows the gaze and presence of remote 360° viewers using a ring of LEDs that match the remote viewports. We discuss the long-term use of one of the prototypes in a lecture hall and present future directions for visualizing gaze presence in 360° video streams.
Publication Details
  • ACM TVX 2019
  • Jun 5, 2019

Abstract

Livestreaming and video calls have grown in popularity due to increased connectivity and advancements in mobile devices. Our interactions with these cameras are limited, as the cameras are either fixed or manually remote controlled. Here we present a Wizard-of-Oz elicitation study to inform the design of interactions with smart 360° cameras or robotic mobile desk cameras for use in video conferences and livestreaming situations. There was an overall preference for devices that minimize distraction, as well as for devices that demonstrate an understanding of video-meeting context. We find that the complexity of participants' interactions grows dynamically, which illustrates the need for deeper event semantics within the camera AI. Finally, we detail interaction techniques and design insights to inform the next generation of personal video cameras for streaming and collaboration.
Publication Details
  • Personal and Ubiquitous Computing
  • May 7, 2019

Abstract

Reliable location estimation has been a key enabler of many applications in the UbiComp space. Much progress has been made on the development of accurate indoor location systems, which form the foundation of many interesting applications, particularly in consumer scenarios. However, many location-based applications in enterprise settings also require addressing another facet of reliability: assurance. Without strong guarantees of a location estimate's legitimacy, stakeholders must explicitly balance the advantages offered against the risks of falsification. In this space, there are two key threats: replay attacks, where signal and sensor information is collected in one location and replayed in another to falsify a location estimate later in time; and wormhole attacks, where signal and sensor information is forwarded to a remote location by a colluding device to falsify location estimation in real time. In this work, we improve upon the state of the art in wormhole-resistant location estimation techniques. Specifically, we present the Location Anchor, which leverages a combination of technical solutions and social contracts to provide high-assurance proofs of device location that are resistant to wormhole attacks. Unlike existing work, the Location Anchor has minimal hardware costs, supports a rich tapestry of applications, and is compatible with commodity smartphone and tablet platforms. We show that the Location Anchor can extend existing replay-resistant location systems into wormhole-resistant location systems, even in the face of very aggressive attacker assumptions. We describe the protocols underlying the Location Anchor, as well as report on the efficacy of a prototype implementation.

Augmenting Knowledge Tracing by Considering Forgetting Behavior

Publication Details
  • The Web Conference 2019 (formerly WWW)
  • Apr 29, 2019

Abstract

We describe a corpus analysis method to extract terminology from a collection of technical specification books in the field of construction. Using statistical and word n-gram analyses, we extract the terminology of the domain and then perform pruning steps with linguistic patterns and Internet queries to improve the quality of the final terminology. In this paper we specifically focus on the improvements obtained by applying Internet queries and patterns. These improvements are assessed through a manual evaluation carried out by six experts in the field on technical specification books.
Publication Details
  • CHI 2019
  • Apr 27, 2019

Abstract

Work breaks, both physical and digital, play an important role in productivity and workplace wellbeing. Yet, the growing availability of digital distractions from online content can turn breaks into prolonged "cyberloafing". In this paper, we present UpTime, a system that aims to support workers' transitions from breaks back to work, moments susceptible to digital distractions. Combining a browser extension and chatbot, users interact with UpTime through proactive and reactive chat prompts. By sensing transitions from inactivity, UpTime helps workers avoid distractions by automatically blocking distracting websites temporarily, while still giving them control to take necessary digital breaks. We report findings from a 3-week comparative field study with 15 workers. Our results show that automatic, temporary blocking at transition points can significantly reduce digital distractions and stress without sacrificing workers' sense of control. Our findings, however, also emphasize that overloading users' existing communication channels for chatbot interaction should be done thoughtfully.
Publication Details
  • Internet of Things: Engineering Cyber Physical Human Systems
  • Mar 15, 2019

Abstract

Recent advances in the Internet of Things (IoT) have led to an explosion of physical objects being connected to the Internet. These objects sense, compute, interpret what is occurring within themselves and the world, and preferably interact with users. In this work, we present a visible-light-enabled finger tracking technique allowing users to perform freestyle multi-touch gestures on everyday objects' surfaces. By projecting encoded patterns onto an object's surface (e.g., paper, display, or table) through a projector, and localizing the user's fingers with light sensors, the proposed system offers users a richer interactive space than the device's existing interfaces. More importantly, results from our experiments indicate that this system can localize ten fingers simultaneously with an accuracy of 1.7 millimeters and a refresh rate of 84 Hz, with only 31 milliseconds of delay on WiFi or 23 milliseconds on serial communication, easily supporting multi-finger gesture interaction on everyday objects. We also develop two example applications to demonstrate possible scenarios. Finally, we conduct a preliminary exploration of 3D depth inference using the same setup and achieve 2.43 cm depth estimation accuracy.
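
The abstract does not specify the projected encoding; temporal Gray-code stripes are one standard choice for projector-based localization, so the decode side can be sketched as follows (bit order and threshold are assumptions):

```python
def gray_decode(g: int) -> int:
    """Convert a Gray-coded integer to plain binary."""
    mask = g >> 1
    while mask:
        g ^= mask
        mask >>= 1
    return g

def locate_sensor(brightness, threshold=0.5):
    """brightness: one light-sensor reading per projected bit-plane
    (MSB first). Returns the decoded stripe index under the finger."""
    bits = 0
    for sample in brightness:
        bits = (bits << 1) | (1 if sample > threshold else 0)
    return gray_decode(bits)

# 4 bit-planes -> 16 stripes; readings encode Gray 0101 -> stripe 6.
print(locate_sensor([0.1, 0.9, 0.0, 0.8]))   # -> 6
```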
Publication Details
  • IEEE 2nd International Conference on Multimedia Information Processing and Retrieval
  • Mar 14, 2019

Abstract

We present an approach to detect speech impairments from video of people with aphasia, a neurological condition that affects the ability to comprehend and produce speech. To counter inherent privacy issues, we propose a cross-media approach using only visual facial features to detect speech properties without listening to the audio content of speech. Our method uses facial landmark detections to measure facial motion over time. We show how to detect speech and pause instances based on temporal mouth shape analysis and identify repeating mouth patterns using a dynamic warping mechanism. We relate our developed features for pause frequency, mouth pattern repetitions, and pattern variety to actual symptoms of people with aphasia in the AphasiaBank dataset. Our evaluation shows that our developed features are able to reliably differentiate dysfluent speech production of people with aphasia from those without aphasia with an accuracy of 0.86. A combination of these handcrafted features and further statistical measures on talking and repetition improves classification performance to an accuracy of 0.88.
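
The dynamic-warping mechanism can be illustrated with a plain DTW distance over a 1-D mouth-opening signal derived from the landmarks. The window size and tolerance below are hypothetical, and this is a crude stand-in for the paper's repeating-pattern detector.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def find_repetitions(series, win=30, step=15, tol=2.0):
    """Flag window pairs whose warped distance is small."""
    wins = [series[s:s + win] for s in range(0, len(series) - win, step)]
    return [(i, j) for i in range(len(wins)) for j in range(i + 1, len(wins))
            if dtw_distance(wins[i], wins[j]) < tol]

t = np.linspace(0, 4 * np.pi, 120)
mouth = np.abs(np.sin(t))           # synthetic repeating mouth motion
print(find_repetitions(mouth))      # similar windows one period apart
```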
Publication Details
  • ACM Transactions on Interactive Intelligent System
  • Jan 31, 2019

Abstract

Activity recognition is a core component of many intelligent and context-aware systems. We present a solution for discreetly and unobtrusively recognizing common work activities above a work surface without using cameras. We demonstrate our approach, which utilizes an RF-radar sensor mounted under the work surface, in three domains: recognizing work activities at a convenience-store counter, recognizing common office deskwork activities, and estimating the position of customers in a showroom environment. Our examples illustrate potential benefits for both post-hoc business analytics and for real-time applications. Our solution was able to classify seven clerk activities with 94.9% accuracy using data collected in a lab environment and able to recognize six common deskwork activities collected in real offices with 95.3% accuracy. Using two sensors simultaneously, we demonstrate coarse position estimation around a large surface with 95.4% accuracy. We show that using multiple projections of the RF signal leads to improved recognition accuracy. Finally, we show how smartwatches worn by users can be used to attribute an activity, recognized with the RF sensor, to a particular user in multi-user scenarios. We believe our solution can mitigate some of users' privacy concerns associated with cameras and is useful for a wide range of intelligent systems.
2018

AI for Toggling the Linearity of Interactions in AR

Publication Details
  • IEEE AIVR 18
  • Dec 10, 2018

Abstract

Interaction in Augmented Reality or Mixed Reality environments is generally classified into two modalities: linear (relative to object) or non-linear (relative to camera). Switching between these modes can be arduous in cases where someone's interaction with the device is limited or restricted as is often the case in medical or industrial applications where one's hands might be sterile or soiled. To solve this, we present Sound-to-Experience where the modality can be effectively toggled by a noise or sound which is detected using a modern Artificial Intelligence deep-network classifier.

Abstract

The analysis of bipartite networks is critical in many application domains, such as studying gene expression in bio-informatics. One important task is missing link prediction, which infers the existence of new links based on currently observed ones. However, in practice, analysts need to utilize their domain knowledge based on the algorithm outputs in order to make sense of the results. We propose a novel visual analysis framework, MissBi, which allows for examining and understanding missing links in bipartite networks. Some initial feedback from a management school professor has demonstrated the effectiveness of the tool.
Publication Details
  • ISS 2018
  • Nov 25, 2018

Abstract

Projector-camera systems can turn any surface such as tabletops and walls into an interactive display. A basic problem is to recognize the gesture actions on the projected UI widgets. Previous approaches using finger template matching or occlusion patterns have issues with environmental lighting conditions, artifacts and noise in the video images of a projection, and inaccuracies of depth cameras. In this work, we propose a new recognizer that employs a deep neural net with an RGB-D camera; specifically, we use a CNN (Convolutional Neural Network) with optical flow computed from the color and depth channels. We evaluated our method on a new dataset of RGB-D videos of 12 users interacting with buttons projected on a tabletop surface.
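
The CNN itself is not detailed in the abstract, but the input construction, optical flow computed from both the color and depth channels, can be sketched with OpenCV. Farnebäck flow is one plausible choice of flow method; the frames below are synthetic placeholders.

```python
import cv2
import numpy as np

# Two consecutive RGB-D frames (synthetic; a real system would read
# them from the RGB-D camera watching the projected UI).
prev_rgb = np.random.randint(0, 255, (240, 320, 3), dtype=np.uint8)
next_rgb = np.random.randint(0, 255, (240, 320, 3), dtype=np.uint8)
prev_d = np.random.randint(0, 255, (240, 320), dtype=np.uint8)
next_d = np.random.randint(0, 255, (240, 320), dtype=np.uint8)

flow_args = dict(pyr_scale=0.5, levels=3, winsize=15, iterations=3,
                 poly_n=5, poly_sigma=1.2, flags=0)
color_flow = cv2.calcOpticalFlowFarneback(
    cv2.cvtColor(prev_rgb, cv2.COLOR_BGR2GRAY),
    cv2.cvtColor(next_rgb, cv2.COLOR_BGR2GRAY), None, **flow_args)
depth_flow = cv2.calcOpticalFlowFarneback(prev_d, next_d, None, **flow_args)

# Stack (dx, dy) from both streams into a 4-channel CNN input.
cnn_input = np.concatenate([color_flow, depth_flow], axis=2)  # (H, W, 4)
```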
Publication Details
  • CSCW2018
  • Nov 3, 2018

Abstract

Searching collaboratively for places of interest is a common activity that frequently occurs on individual mobile phones, or on large tourist-information displays in public places such as visitor centers or train stations. We created a public display system for collaborative travel planning, as well as a mobile app that can augment the display. We tested them against third-party mobile apps in a simulated travel-search task to understand how the unique features of mobile phones and large displays might be leveraged together to improve collaborative travel planning experience.
Publication Details
  • EMNLP 2018
  • Oct 31, 2018

Abstract

We leverage a popularity measure in social media as a distant label for extractive summarization of online conversations. In social media, users can vote, share, or bookmark a post they prefer. The number of these actions is regarded as a measure of popularity. However, popularity is not solely determined by the content of a post, e.g., a text or an image in a post, but is highly contaminated by its context, e.g., timing and authority. We propose a disjunctive model, which computes the contributions of content and context separately. For evaluation, we build a dataset where the informativeness of a comment is annotated. We evaluate the results with ranking metrics, and show that our model outperforms the baseline model, which directly uses popularity as a measure of informativeness.
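
A minimal sketch of the disjunctive idea: train on popularity with a score that is the sum of a content branch and a context branch, then rank comments by the content branch alone. The additive form and all dimensions are assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class DisjunctivePopularity(nn.Module):
    def __init__(self, content_dim=300, context_dim=12):
        super().__init__()
        self.content = nn.Sequential(nn.Linear(content_dim, 64),
                                     nn.ReLU(), nn.Linear(64, 1))
        self.context = nn.Sequential(nn.Linear(context_dim, 16),
                                     nn.ReLU(), nn.Linear(16, 1))

    def forward(self, x_content, x_context):
        # Trained against observed popularity signals.
        return self.content(x_content) + self.context(x_context)

model = DisjunctivePopularity()
popularity = model(torch.randn(8, 300), torch.randn(8, 12))
# Informativeness ranking uses the content branch only, so timing and
# authority effects do not leak into the scores.
informativeness = model.content(torch.randn(8, 300))
```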

InkPlanner: Supporting Prewriting via Intelligent Visual Diagramming

Publication Details
  • IEEE Transactions on Visualization and Computer Graphics (Proceedings of VAST 2018)
  • Oct 21, 2018

Abstract

Prewriting is the process of generating and organizing ideas before drafting a document. Although often overlooked by novice writers and writing tool developers, prewriting is a critical process that improves the quality of a final document. To better understand current prewriting practices, we first conducted interviews with writing learners and experts. Based on the learners' needs and experts' recommendations, we then designed and developed InkPlanner, a novel pen-and-touch visualization tool that allows writers to utilize visual diagramming for ideation during prewriting. InkPlanner further allows writers to sort their ideas into a logical and sequential narrative by using a novel widget, NarrativeLine. Using a NarrativeLine, InkPlanner can automatically generate a document outline to guide later drafting exercises. InkPlanner is powered by machine-generated semantic and structural suggestions that are curated from various texts. To qualitatively review the tool and understand how writers use InkPlanner for prewriting, two writing experts were interviewed and a user study was conducted with university students. The results demonstrated that InkPlanner encouraged writers to generate more diverse ideas and also enabled them to think more strategically about how to organize their ideas for later drafting.
Publication Details
  • The 8th International Conference on the Internet of Things (IoT 2018)
  • Oct 15, 2018

Abstract

With the tremendous progress in sensing and IoT infrastructure, it is foreseeable that IoT systems will soon be available for commercial markets, such as in people's homes. In this paper, we present a deployment study using sensors attached to household objects to capture the resourcefulness of three individuals. The concept of resourcefulness highlights the ability of humans to repurpose objects spontaneously for a different use case than was initially intended. It is a crucial element for human health and wellbeing, which is of great interest for various aspects of HCI and design research. Traditionally, resourcefulness is captured through ethnographic practice. Ethnography can only provide sparse and often short-duration observations of human experience, often relying on participants being aware of and remembering behaviours or thoughts they need to report on. Our hypothesis is that resourcefulness can also be captured through continuously monitoring objects being used in everyday life. We developed a system that can record object movement continuously and deployed it in the homes of three elderly people for over two weeks. We explored the use of probabilistic topic models to analyze the collected data and identify common patterns.
Publication Details
  • UbiComp 2018 (IMWUT)
  • Oct 1, 2018

Abstract

Despite reflection being identified as a key component of behavior change, most existing tools do not explicitly design for it, carrying an implicit assumption that providing access to self-tracking data is enough to trigger reflection. In this work we design a system for reflection around physical activity. Through a set of workshops, we generated a corpus of 275 reflective questions. We then combined these questions into a set of 25 reflective mini-dialogues, delivered through MMS. 33 active users of fitness trackers used our system in a 2-week field deployment. Results suggest that the mini-dialogues were successful in triggering reflection and that this reflection led to increases in motivation, empowerment, and adoption of new behaviors. Encouragingly, 16 participants elected to use the system for two additional weeks without compensation. We present implications for the design of technology-supported dialogue systems for reflection.
Publication Details
  • UbiComp 2018 (IMWUT)
  • Oct 1, 2018

Abstract

Continuous monitoring with unobtrusive wearable social sensors is becoming a popular method to assess individual affect states and team effectiveness in human research. A large number of applications have demonstrated the effectiveness of applying wearable sensing in corporate settings, for example in short periodic social events or on a university campus. However, little is known about how we can automatically detect individual affect and group cohesion for long-duration missions. Predicting negative affect states and low cohesiveness is vital for team missions. Knowing team members' negative states allows timely interventions to enhance their effectiveness. This work investigates whether sensing social interactions and individual behaviors with wearable sensors can provide insights into assessing individual affect states and group cohesion. We analyzed wearable sensor data from a team of six crew members who were deployed on a four-month simulation of a space exploration mission at a remote location. Our work proposes to recognize team members' affect states and group cohesion as a binary classification problem using novel behavior features that represent dyadic interaction and individual activities. Our method aggregates features from individual members into group levels to predict team cohesion. Our results show that the behavior features extracted from the wearable social sensors provide useful information in assessing personal affect and team cohesion. Group task cohesion can be predicted with a high performance of over 0.8 AUC. Our work demonstrates that we can extract social interactions from sensor data to predict group cohesion in longitudinal missions. We found that quantifying behavior patterns, including dyadic interactions and face-to-face communication, is important in assessing team process.
Publication Details
  • International Conference on Indoor Positioning and Indoor Navigation
  • Sep 24, 2018

Abstract

Accurate localization is a fundamental requirement for a variety of applications, ranging from industrial robot operations to location-powered applications on mobile devices. A key technical challenge in achieving this goal is providing a clean and reliable estimation of location from a variety of low-cost, uncalibrated sensors. Many current techniques rely on Particle Filter (PF) based algorithms, which have proven successful at effectively fusing various sensor inputs to create meaningful location predictions. In this paper we build upon this large corpus of work. Like prior work, our technique fuses Received Signal Strength Indicator (RSSI) measurements from Bluetooth Low Energy (BLE) beacons with map information. A key contribution of our work is a new sensor model for BLE beacons that does not require a mapping from RSSI to distance. We further contribute a novel method of utilizing map information during the initialization of the system and during the resampling phase when new particles are generated. Using our proposed sensor model and map prior information, overall localization performance improves by 1.20 m at the 75th percentile of the cumulative distribution compared with traditional localization techniques.
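
The new sensor model is the paper's contribution and is not reproduced here. One illustrative way to weight particles without any RSSI-to-distance mapping is rank agreement: a particle is plausible when beacons with stronger RSSI are also the ones closer to it.

```python
import numpy as np

def weight_particles(particles, beacons, rssi):
    """Rank-agreement weighting (an illustrative construction, not the
    paper's model). particles: (N, 2); beacons: (B, 2); rssi: (B,)."""
    d = np.linalg.norm(particles[:, None, :] - beacons[None, :, :], axis=2)
    louder = rssi[None, :, None] > rssi[None, None, :]     # (1, B, B)
    nearer = d[:, :, None] < d[:, None, :]                 # (N, B, B)
    agree = (louder == nearer).mean(axis=(1, 2))           # per particle
    return agree / agree.sum()

particles = np.random.uniform(0, 10, size=(500, 2))
beacons = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 10.0]])
rssi = np.array([-55.0, -70.0, -80.0])    # beacon 0 is heard loudest
w = weight_particles(particles, beacons, rssi)
print(particles[np.argmax(w)])            # best particle lies near beacon 0
```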

A Radio-Inertial Localization and Tracking System with BLE Beacons Prior Maps

Publication Details
  • 9th International Conference on Indoor Positioning and Indoor Navigation
  • Sep 24, 2018

Abstract

In this paper, we develop a system for the low-cost indoor localization and tracking problem using radio signal strength indicator, Inertial Measurement Unit (IMU), and magnetometer sensors. We develop a novel and simplified probabilistic IMU motion model as the proposal distribution of the sequential Monte-Carlo technique to track the robot trajectory. Our algorithm can globally localize and track a robot with an a priori unknown location, given an informative prior map of the Bluetooth Low Energy (BLE) beacons. We also formulate the problem as an optimization problem that serves as the back-end of the algorithm above (the front-end). Thus, by simultaneously solving for the robot trajectory and the map of BLE beacons, we recover a continuous and smooth trajectory of the robot, corrected locations of the BLE beacons, and the time-varying IMU bias. Hardware evaluations show that the proposed closed-loop system improves localization performance and, by feeding the optimized map back to the front-end, becomes robust to errors in the map of beacons.
Publication Details
  • Studies in Conversational UX Design
  • Sep 4, 2018

Abstract

In this chapter we discuss the use of external sources of data in designing conversational dialogues. We focus on applications in behavior change around physical activity involving dialogues that help users better understand their self-tracking data and motivate healthy behaviors. We start by introducing the areas of behavior change and personal informatics and discussing the importance of self-tracking data in these areas. We then introduce the role of reflective dialogue-based counseling systems in this domain, discuss specific value that self-tracking data can bring, and how it can be used in creating the dialogues. The core of the chapter focuses on six practical examples of design of dialogues involving self-tracking data that we either tested in our research or propose as future directions based on our experiences. We end the chapter by discussing how the design principles for involving external data in conversations can be applied to broader domains. Our goal for this chapter is to share our experiences, outline design principles, highlight several design opportunities in external data-driven computer-based conversations, and encourage the reader to explore creative ways of involving external sources of data in shaping dialogues-based interactions.
Publication Details
  • Document Engineering
  • Aug 28, 2018

Abstract

We introduce a system to automatically manage photocopies made from copyrighted printed materials. The system monitors photocopiers to detect the copying of pages from copyrighted publications. Such activity is tallied for billing purposes. Access rights to the materials can be checked to prevent printing. Digital images of the copied pages are checked against a database of copyrighted pages. To preserve the privacy of the copying of non-copyright materials, only digital fingerprints are submitted to the image matching service. A problem with such systems is the creation of the database of copyrighted pages. To facilitate this, our system maintains statistics on clusters of similar unknown page images along with copy sequences. Once such a cluster has grown to a sufficient size, a human inspector can determine whether those page sequences are copyrighted. The system has been tested with hundreds of thousands of pages from conference proceedings and with millions of randomly generated pages. Retrieval accuracy has been around 99%, even with copies of copies or double-page copies.
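
The system's fingerprint is not described in this abstract; a difference-hash over a downsampled page image is one simple stand-in, with Hamming-distance lookup against the database of copyrighted pages (all parameters hypothetical).

```python
import numpy as np

def page_fingerprint(gray, grid=8):
    """64-bit dHash: sample a (grid x grid+1) lattice of pixels and
    record whether brightness increases left-to-right."""
    ys = np.linspace(0, gray.shape[0] - 1, grid).astype(int)
    xs = np.linspace(0, gray.shape[1] - 1, grid + 1).astype(int)
    tiny = gray[np.ix_(ys, xs)].astype(float)
    bits = (tiny[:, 1:] > tiny[:, :-1]).flatten()
    return int("".join("1" if b else "0" for b in bits), 2)

def hamming(a, b):
    return bin(a ^ b).count("1")

def match_page(fp, database, max_dist=6):
    """IDs of copyrighted pages whose fingerprint is within max_dist."""
    return [pid for pid, ref in database.items()
            if hamming(fp, ref) <= max_dist]

page = np.random.randint(0, 256, (1100, 850))     # scanned page stand-in
db = {"proceedings-p17": page_fingerprint(page)}  # hypothetical database
print(match_page(page_fingerprint(page), db))     # -> ['proceedings-p17']
```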

FormYak: Converting forms to conversations

Publication Details
  • DocEng 2018
  • Aug 28, 2018

Abstract

Historically, people have interacted with companies and institutions through telephone-based dialogue systems and paper-based forms. Now, these interactions are rapidly moving to web- and phone-based chat systems. While converting traditional telephone dialogues to chat is relatively straightforward, converting forms to conversational interfaces can be challenging. In this work, we introduce methods and interfaces to enable the conversion of PDF and web-based documents that solicit user input into chat-based dialogues. Document data is first extracted to associate fields and their textual descriptions using meta-data and lightweight visual analysis. The field labels, their spatial layout, and associated text are further analyzed to group related fields into natural conversational units. These correspond to questions presented to users in chat interfaces to solicit information needed to complete the original documents and downstream processes they support. This user supplied data can be inserted into the source documents and/or in downstream databases. User studies of our tool show that it streamlines form-to-chat conversion and produces conversational dialogues of at least the same quality as a purely manual approach.
Publication Details
  • DocEng 2018
  • Aug 28, 2018

Abstract

SlideDiff is a system that automatically creates an animated rendering of textual and media differences between two versions of a slide. While previous work focuses either on textual or image data, SlideDiff integrates text and media changes, as well as their interactions, e.g., adding an image forces nearby text boxes to shrink. Provided with two versions of a slide (not the full history of edits), SlideDiff detects the textual and image differences, and then animates the changes by mimicking what a user would have done, such as moving the cursor, typing text, resizing image boxes, and adding images. This editing metaphor is well known to most users, helping them better understand what has changed, and fosters a sense of connection between remote workers, making them feel as if they had edited together. After detection of text and image differences, the animations are rendered in HTML and CSS, including mouse cursor motion, text and image box selection and resizing, and text deletion and insertion with its cursor. We discuss strategies for animating changes, in particular the importance of starting with large changes and finishing with smaller edits, and provide evidence of the utility of SlideDiff in a workplace setting.

The Effect of Edge Bundling and Seriation on Sensemaking of Biclusters in Bipartite Graphs

Publication Details
  • IEEE Transactions on Visualization and Computer Graphics
  • Jul 31, 2018

Abstract

Exploring coordinated relationships (e.g., shared relationships between two sets of entities) is an important analytics task in a variety of real-world applications, such as discovering similarly behaved genes in bioinformatics, detecting malware collusions in cyber security, and identifying product bundles in marketing analysis. Coordinated relationships can be formalized as biclusters. To support visual exploration of biclusters, visualizations based on bipartite graphs have been proposed, with edge bundling used to show biclusters. However, edge bundling suffers from edge crossings due to possible overlaps of biclusters, and there has been little in-depth understanding of its impact on users exploring biclusters in bipartite graphs. To address these issues, we propose a novel bicluster-based seriation technique that can reduce edge crossings in bipartite graph drawings, and we conducted a user experiment to study the effect of edge bundling and the proposed technique on visualizing biclusters in bipartite graphs. We found that both had an impact on reducing entity visits for users exploring biclusters, and that edge bundles helped them find more justified answers. Moreover, we identified four key trade-offs that inform the design of future bicluster visualizations. The study results suggest that edge bundling is critical for exploring biclusters in bipartite graphs, as it helps to reduce low-level perceptual problems and supports high-level inferences.
Publication Details
  • The 23rd ACM Symposium on Access Control Models & Technologies (SACMAT)
  • Jun 13, 2018

Abstract

Devices with embedded sensors are permeating the computing landscape, allowing the collection and analysis of rich data about individuals, smart spaces, and their interactions. This class of devices enables a useful array of home automation and connected workplace functionality to individuals within instrumented spaces. Unfortunately, the increasing pervasiveness of sensors can lead to perceptions of privacy loss by their occupants. Given that many instrumented spaces exist as platforms outside of a user's control—e.g., IoT sensors in the home that rely on cloud infrastructure or connected workplaces managed by one's employer—enforcing access controls via a trusted reference monitor may do little to assuage individuals' privacy concerns. This calls for novel enforcement mechanisms for controlling access to sensed data. In this paper, we investigate the interplay between sensor fidelity and individual comfort, with the goal of understanding the design space for effective, yet palatable, sensors for the workplace. In the context of a common space contextualization task, we survey and interview individuals about their comfort with three common sensing modalities: video, audio, and passive infrared. This allows us to explore the extent to which discomfort with sensor platforms is a function of detected states or sensed data. Our findings uncover interesting interplays between content, context, fidelity, history, and privacy. This, in turn, leads to design recommendations regarding how to increase comfort with sensing technologies by revisiting the mechanisms by which user preferences and policies are enforced in situations where the infrastructure itself is not trusted.
Publication Details
  • ACM Intl. Conf. on Multimedia Retrieval (ICMR)
  • Jun 11, 2018

Abstract

Massive Open Online Course (MOOC) platforms have scaled online education to unprecedented enrollments, but remain limited by their rigid, predetermined curricula. Increasingly, professionals consume this content to augment or update specific skills rather than complete degree or certification programs. To better address the needs of this emergent user population, we describe a visual recommender system called MOOCex. The system recommends lecture videos across multiple courses and content platforms to provide a choice of perspectives on topics. The recommendation engine considers both video content and sequential inter-topic relationships mined from course syllabi. Furthermore, it allows for interactive visual exploration of the semantic space of recommendations within a learner's current context.

Abstract

An enormous amount of conversation occurs online every day, including on chat platforms where multiple conversations may take place concurrently. Interleaved conversations lead to difficulties in not only following discussions but also retrieving relevant information from simultaneous messages. Conversation disentanglement aims to separate overlapping messages into detached conversations. In this paper, we propose to leverage representation learning for conversation disentanglement. A Siamese Hierarchical Convolutional Neural Network (SHCNN), which integrates local and more global representations of a message, is first presented to estimate the conversation-level similarity between closely posted messages. With the estimated similarity scores, our algorithm for Conversation Identification by SImilarity Ranking (CISIR) then derives conversations based on high-confidence message pairs and pairwise redundancy. Experiments were conducted with four publicly available datasets of conversations from Reddit and IRC channels. The experimental results show that our approach significantly outperforms comparative baselines in both pairwise similarity estimation and conversation disentanglement.
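
As a rough sketch of the CISIR step: link message pairs whose similarity is high-confidence and read conversations off the connected components. Random vectors stand in for SHCNN embeddings below; the threshold and union-find grouping are assumptions, not the paper's exact procedure.

```python
import numpy as np

def disentangle(embeddings, sim_threshold=0.8):
    """Group messages into conversations via high-confidence pairs."""
    n = len(embeddings)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = e @ e.T                        # cosine similarity matrix
    parent = list(range(n))

    def find(x):                         # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] >= sim_threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

print(disentangle(np.random.randn(10, 32)))   # message-index clusters
```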
Publication Details
  • DIS 2018
  • Jun 1, 2018

Abstract

Conversational agents stand to play an important role in supporting behavior change and well-being in many domains. With users able to interact with conversational agents through both text and voice, understanding how designing for these channels supports behavior change is important. To begin answering this question, we designed a conversational agent for the workplace that supports workers’ activity journaling and self-learning through reflection. Our agent, named Robota, combines chat-based communication as a Slack Bot and voice interaction through a personal device using a custom Amazon Alexa Skill. Through a 3-week controlled deployment, we examine how voice-based and chat-based interaction affect workers’ reflection and support self-learning. We demonstrate that, while many current technical limitations exist, adding dedicated mobile voice interaction separate from the already busy chat modality may further enable users to step back and reflect on their work. We conclude with discussion of the implications of our findings to design of workplace self-tracking systems specifically and to behavior-change systems in general.
Publication Details
  • International Conference on Robotics and Automation
  • May 21, 2018

Abstract

Convolutional Neural Networks (CNNs) have successfully been utilized for localization using a single monocular image [1]. Most work to date has either focused on reducing the dimensionality of data for better learning of parameters during training or on developing different variations of CNN models to improve pose estimation. Many of the best performing works solely consider the content in a single image, while the context from historical images is ignored. In this paper, we propose a combined CNN-LSTM which is capable of incorporating contextual information from historical images to better estimate the current pose. Experimental results achieved using a dataset collected in an indoor office space improved the overall system results to 0.8 m and 2.5° at the third quartile of the cumulative distribution, as compared with 1.5 m and 3.0° achieved by PoseNet [1]. Furthermore, we demonstrate how the temporal information exploited by the CNN-LSTM model assists in localizing the robot in situations where image content does not have sufficient features.
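
A toy PyTorch sketch of the CNN-LSTM pattern described above: per-frame CNN features feed an LSTM so the current estimate can draw on recent history. Layer sizes and the 7-D pose output (translation plus orientation quaternion) are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CNNLSTMPose(nn.Module):
    def __init__(self, feat=128, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 5, 2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, 2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat))
        self.lstm = nn.LSTM(feat, hidden, batch_first=True)
        self.pose = nn.Linear(hidden, 7)    # xyz + quaternion

    def forward(self, frames):              # (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        f = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(f)
        return self.pose(out[:, -1])        # pose for the latest frame

poses = CNNLSTMPose()(torch.randn(2, 5, 3, 64, 64))   # -> (2, 7)
```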
Publication Details
  • International Conference on Robotics and Automation
  • May 21, 2018

Abstract

In this paper, we propose a novel solution for optimizing the deployment of RF beacons for indoor localization. Our system optimizes both the number of beacons and their placement in a given environment. We introduce a novel cost function, called CovBSM, that simultaneously optimizes 3-coverage while maximizing beacon spreading. Using this cost function, we propose a framework that optimizes both the number of beacons and their placement in a given environment. The proposed solution accounts for the indoor infrastructure and its influence on RF signal propagation by embedding a realistic simulator into the optimization process.
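
CovBSM and the embedded RF simulator are not reproduced here; the sketch below is a simplified greedy stand-in whose score rewards 3-coverage of target points plus spread between chosen beacons, with plain Euclidean range standing in for simulated propagation.

```python
import numpy as np

def greedy_placement(candidates, targets, radius, n_beacons, k=3):
    """Greedily add the candidate site that most improves a combined
    k-coverage + spread score (illustrative, not the CovBSM cost)."""
    chosen = []

    def score(placed):
        P = np.array(placed)
        d = np.linalg.norm(targets[:, None, :] - P[None, :, :], axis=2)
        coverage = np.minimum((d <= radius).sum(axis=1), k).mean() / k
        spread = 0.0
        if len(placed) > 1:
            pd = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
            spread = pd[np.triu_indices(len(placed), 1)].min()
        return coverage + 0.05 * spread

    remaining = list(candidates)
    for _ in range(n_beacons):
        best = max(remaining, key=lambda c: score(chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

targets = np.random.uniform(0, 20, (200, 2))   # points needing coverage
cands = [tuple(p) for p in np.random.uniform(0, 20, (40, 2))]
print(greedy_placement(cands, targets, radius=8.0, n_beacons=5))
```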