The site’s research projects will focus on equipping students with skills and knowledge related to big data analytics in cyber-physical systems (CPS). Research projects will involve combinations of Application Areas and Analytics Dimensions shown in the following figure.
Example Projects for Systems and Architecture (L1)
Project 1 Title: Reliability Analysis of Cloud Storage Systems
Advisors: Drs. Mai Zheng, Yuho Jin, Satyajayant Misra
Description: The large amounts of data generated in CPS are stored and managed remotely in the cloud. Behind the scenes, a cloud storage system is built from individual machines, each running a local storage system that supports the cloud layer. Many recent studies have shown that even relatively mature local storage systems may behave erroneously under various conditions. Relatively new cloud storage systems, which add distributed management layers on top of these local systems, are even more likely to exhibit such errors. Because cloud storage systems are increasingly responsible for storing and protecting data, an in-depth analysis of their reliability is both important and urgent.
Research Objective: Understand and improve the reliability of cloud storage systems.
Student Learning: Students will learn the fundamental concepts of cloud infrastructure, including the classic Hadoop Distributed File System (HDFS) and the state-of-the-art cloud management platform OpenStack. In addition, students will gain hands-on experience in building and analyzing cloud systems.
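One elementary check used when analyzing storage reliability is to compare the copies of a data block held by different replicas and flag any copy that disagrees (HDFS, for example, stores checksums alongside block data to detect corruption). The sketch below illustrates that idea only; it is not the project's analysis method, and the replica IDs and the majority-vote rule are illustrative assumptions.

```python
import hashlib
from collections import Counter


def find_divergent_replicas(replicas):
    """Compare replicas of one logical block by SHA-256 digest.

    `replicas` maps a hypothetical replica/node ID to the bytes it returned.
    Returns (majority_digest, sorted IDs of replicas that disagree with it).
    If no digest has a strict majority, returns (None, all IDs), meaning
    the correct content cannot be decided by voting.
    """
    # Hash each copy so large blocks can be compared cheaply.
    digests = {node: hashlib.sha256(data).hexdigest() for node, data in replicas.items()}
    majority, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):
        # No strict majority: every replica is suspect.
        return None, sorted(digests)
    divergent = sorted(node for node, d in digests.items() if d != majority)
    return majority, divergent


if __name__ == "__main__":
    block = b"sensor readings 0001"
    replicas = {"node-a": block, "node-b": block, "node-c": b"sensor readings 0O01"}
    print(find_divergent_replicas(replicas)[1])  # ['node-c']
```

A real study would also need to inject faults (e.g., crashes or corrupted writes) to provoke such divergence; voting only detects it after the fact.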
Project 2 Title: Big Graph Processing In Memory
Advisors: Drs. Yuho Jin, Huiping Cao, Wen Xu
Description: Many CPS are envisioned as large networks of connected devices, so many CPS research problems will be modeled as large graph problems. Graph processing generates large numbers of irregular memory accesses, which make caching less effective and increase pressure on memory bandwidth. One promising architecture for tackling this challenge is 3D memory stacking, which tightly couples logic and memory and therefore enables computation in memory as well as in processors. Shared-memory graph algorithms will be used to evaluate the performance gains of processing-in-memory. The main tasks are to identify the critical code in graph processing benchmarks, convert it into in-memory processing commands, and model graph benchmark performance on a stacked memory architecture.
Research Objective: Identify the critical assembly instructions in graph algorithms and transform them into in-memory processing commands.
Student Learning: Students will learn performance profiling and modeling techniques in applications, systems, and architectures.
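To make the "irregular memory access" problem concrete, the sketch below runs breadth-first search (BFS), a common graph benchmark kernel, over a graph stored in compressed sparse row (CSR) form. Scanning a vertex's neighbor list is sequential, but the lookup `level[v]` jumps to a data-dependent location; accesses like this are the kind of critical code the project would profile as candidates for in-memory processing. The code is an illustrative assumption, not one of the project's benchmarks, and it does not model any processing-in-memory command set.

```python
from collections import deque


def bfs_csr(row_ptr, col_idx, source):
    """BFS over a CSR graph.

    Returns (levels, neighbor_reads): levels[v] is the hop distance from
    `source` (-1 if unreachable); neighbor_reads counts col_idx entries read.
    """
    n = len(row_ptr) - 1
    level = [-1] * n
    level[source] = 0
    queue = deque([source])
    neighbor_reads = 0
    while queue:
        u = queue.popleft()
        # Sequential scan of u's slice of col_idx: cache friendly.
        for i in range(row_ptr[u], row_ptr[u + 1]):
            v = col_idx[i]
            neighbor_reads += 1
            # Indirect, data-dependent access: v can be anywhere in `level`,
            # which is the irregular pattern that makes caching ineffective.
            if level[v] == -1:
                level[v] = level[u] + 1
                queue.append(v)
    return level, neighbor_reads


# Undirected square 0-1, 0-2, 1-3, 2-3 in CSR form.
ROW_PTR = [0, 2, 4, 6, 8]
COL_IDX = [1, 2, 0, 3, 0, 3, 1, 2]
```

On large graphs the `level` array far exceeds cache capacity, so each `level[v]` lookup tends to miss, which is why moving such lookups closer to memory is attractive.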
Example Projects for Models and Algorithms (L2)
Project 3 Title: Classification with Confidence in Disaster Response
Advisors: Drs. Laura E. Boucheron, Huiping Cao, Zachary O. Toups
Description: Effective data collection and analysis during disaster response (e.g., to fires or earthquakes) is a matter of life and death. In recent years, smart devices such as robots and Unmanned Aerial Vehicles (UAVs) have been increasingly used in disaster response. These devices collect data (e.g., images of surrounding areas, levels of hazardous chemicals), make local decisions, and transmit the collected data to servers, which in turn integrate information from multiple devices and generate knowledge to coordinate the various actors (people and devices). This research project will explore and design classification algorithms that identify critical objects (e.g., people) and areas in these large-scale data to help first responders act promptly. Bayesian classification methods have recently been very successful in dealing with large, noisy, and heterogeneous data. Support Vector Machines (SVMs) have been applied in a wide range of application domains, but their probabilistic counterpart, the Relevance Vector Machine (RVM), has seen relatively little use on real-world data. A major advantage of RVMs is that they produce a posterior probability distribution over class membership rather than a hard decision. Using this distribution to gauge the confidence of a classification decision could be valuable in many application areas, but it remains relatively unexplored. In this project, we will use RVMs to develop classification algorithms for disaster response data.
Research Objective: Study the interpretation of RVM posterior probability distributions as a measure of classifier confidence and compare with other common confidence measures.
Student Learning: Students will learn important theoretical foundations of classification, skills in implementing algorithms on real-world and large datasets, and critical thinking for analyzing and comparing results.
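The sketch below shows how a posterior class distribution, such as the one an RVM produces, can be turned into the kinds of confidence scores the research objective proposes to compare: the maximum posterior probability, the margin between the top two classes, and the normalized entropy. It does not implement an RVM; it assumes the posterior vector is already available from some probabilistic classifier. The review threshold of 0.8 is an arbitrary, hypothetical value.

```python
import math


def confidence_measures(posterior):
    """Summarize a posterior class distribution with three confidence scores."""
    if not posterior or abs(sum(posterior) - 1.0) > 1e-9:
        raise ValueError("posterior must be a non-empty probability vector")
    ranked = sorted(posterior, reverse=True)
    top = ranked[0]
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    # Shannon entropy normalized by log(K): 0 means certain, 1 means uniform.
    k = len(posterior)
    entropy = -sum(p * math.log(p) for p in posterior if p > 0)
    norm_entropy = entropy / math.log(k) if k > 1 else 0.0
    return {"max_prob": top, "margin": top - runner_up, "norm_entropy": norm_entropy}


def needs_review(posterior, min_max_prob=0.8):
    """Flag a decision for human review when the top posterior is below a
    (hypothetical) threshold, instead of acting on it automatically."""
    return confidence_measures(posterior)["max_prob"] < min_max_prob
```

A hard-decision classifier yields only the predicted label, so none of these scores would be available; that is the practical advantage of a posterior-producing model in a setting where low-confidence detections could be routed to a human.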
Project 4 Title: Feature Extraction from Power Disturbance Events in Smart Grids
Advisors: Drs. Huiping Cao, Laura E. Boucheron, Sukumar Brahma, Satish Ranade
Description: Disturbances (e.g., loss of generation or load, faults) in a power system can significantly affect system operation and stability. Several studies have addressed the identification and classification of disturbance events recorded by wide-area measurement systems in transmission networks. However, no study has yet comprehensively processed, interpreted, and classified all the disturbance data from multiple measurement devices over an extended time period. This project will address that gap by exploring techniques for classifying disturbance events from these measurements. Time series classification is a well-studied problem, and many mature classification algorithms are available, such as Decision Trees, k-Nearest-Neighbor (kNN), Support Vector Machines (SVMs), Neural Networks, and Bayesian classifiers. These classical algorithms have been implemented in several machine learning and data mining toolkits, such as MATLAB, R, Weka, and LIBSVM. However, classifying disturbance events, which poses its own challenges and is critical for power systems, has received little attention. This project will investigate and compare classical and novel techniques for extracting features from disturbance sequences in order to classify disturbances.
Research Objective: Determine the most effective feature extraction techniques for classifying disturbance types in real time.
Student Learning: Students will learn the fundamental theories of power systems and the techniques to extract features from high-dimensional data and to classify them.
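As a minimal illustration of the feature-extraction-then-classify pipeline, the sketch below maps each disturbance sequence to four hand-crafted statistics and labels a new sequence with 1-nearest-neighbor (kNN with k = 1), one of the classifiers named above. The chosen features, the class labels ("normal", "sag", "swell"), and the synthetic per-unit samples are all assumptions for illustration; the project itself compares a broader set of classical and novel techniques on real measurements.

```python
import math
import statistics


def extract_features(samples):
    """Map a disturbance sequence (e.g., per-unit voltage magnitudes)
    to a fixed-length feature vector of simple statistics."""
    diffs = [b - a for a, b in zip(samples, samples[1:])]
    return [
        statistics.fmean(samples),                   # average level
        statistics.pstdev(samples),                  # spread around the average
        max(samples) - min(samples),                 # peak-to-peak swing
        max((abs(d) for d in diffs), default=0.0),   # steepest sample-to-sample step
    ]


def classify_1nn(train, query):
    """Return the label of the training sequence whose feature vector is
    closest (Euclidean distance) to the query's feature vector."""
    q = extract_features(query)
    best_label, best_dist = None, math.inf
    for samples, label in train:
        dist = math.dist(q, extract_features(samples))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical, synthetic training sequences.
TRAIN = [
    ([1.0, 1.0, 1.0, 1.0], "normal"),
    ([1.0, 1.0, 0.5, 0.5], "sag"),
    ([1.0, 1.0, 1.4, 1.4], "swell"),
]
```

Because the classifier only sees feature vectors, swapping in a different feature extractor (e.g., wavelet or frequency-domain features) leaves `classify_1nn` unchanged, which is what makes feature extraction techniques easy to compare under a fixed classifier.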
Project 5 Title: Discovering Influential Entities in Smart Grid Networks
Advisors: Drs. Wen Xu, Satyajayant Misra, Huiping Cao, Sukumar Brahma, Satish Ranade
Description: Recently, there has been considerable effort to bring together the concepts of the smart grid and the Internet of Things (IoT). The smart grid will enable two-way information flow and direct customer involvement in the energy marketplace. Smart grid customers will form a virtual network containing rich customer and market-behavior information. Mining customer data in this network and integrating it with information from online social networks will help utilities make better energy decisions. The objective of this project is to formulate the problem of identifying a small group of influential entities (e.g., customers) in the customer networks of smart grids, develop practical social influence models, and apply efficient algorithms to those models. When designing these models, we will consider customer interests and engagement, both positive and negative customer influence, and the limited spread of influence, since, as recent studies have shown, a message cannot propagate through a network indefinitely.
Research Objective: Use customer data to develop models that accurately identify influential customers in smart grid networks and capture how social influence propagates among them.
Student Learning: Students will learn theories and models of how social influence propagates, as well as techniques for analyzing large amounts of network data.
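A common starting point for identifying influential entities is greedy seed selection under the independent cascade model: each newly influenced node gets one chance to influence each neighbor with some probability, and seeds are added one at a time to maximize the estimated expected spread. The sketch below adds a hop limit (`max_hops`) as one simple way to reflect the limited spread of influence. The activation probability, hop limit, trial count, and graph format (adjacency lists keyed by node ID) are assumptions, and the sketch does not model customer interests or negative influence, which the project's models are meant to capture.

```python
import random


def simulate_cascade(graph, seeds, prob, max_hops, rng):
    """One run of a hop-limited independent cascade; returns the number of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    for _ in range(max_hops):
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # Each newly active node gets one chance to activate each inactive neighbor.
                if v not in active and rng.random() < prob:
                    active.add(v)
                    next_frontier.append(v)
        if not next_frontier:
            break
        frontier = next_frontier
    return len(active)


def greedy_seeds(graph, k, prob=0.1, max_hops=2, trials=200, seed=0):
    """Greedily pick up to k seed nodes, each time adding the node that
    maximizes the Monte Carlo estimate of expected spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    chosen = []
    for _ in range(min(k, len(nodes))):
        best, best_spread = None, -1.0
        for cand in sorted(nodes - set(chosen)):
            spread = sum(
                simulate_cascade(graph, chosen + [cand], prob, max_hops, rng)
                for _ in range(trials)
            ) / trials
            if spread > best_spread:
                best, best_spread = cand, spread
        chosen.append(best)
    return chosen
```

Each greedy step re-simulates every candidate, so this brute-force version is only practical for small graphs; the "efficient algorithms" the project calls for would need to reduce that cost on real customer networks.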
Example Projects for Visualization (L3)
Project 6 Title: Geospatial Interfaces for Disaster Data Collection
Advisors: Drs. Zachary O. Toups, Huiping Cao, Laura E. Boucheron
Description: Disaster response planning and coordination require a group of people to collaboratively make sense of numerous incoming data feeds, many of which are supplied by deployed responders. Deployed responders often rely on paper to collect data [personal communication Jeff Saunders, Operations Chief, Texas Task Force 1], which impedes fast decision-making. As deployed responders shift to mobile information technology, decision makers need to quickly make sense of heterogeneous, fast-moving data feeds. This project will investigate interfaces that support disaster responders in collecting and organizing data and in making decisions. The project will develop and evaluate geospatial visualization interfaces to aid decision-making.
Research Objective: Determine the best interface designs to support understanding of large data sets. Determine the best means of presenting information to support decision-making under stress.
Student Learning: Students will learn the fundamental theories of interface design and evaluation. Students will work with advanced technology, such as multi-touch displays and wearable computers.
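One building block a geospatial interface might use to organize many incoming field reports is spatial binning: grouping geotagged reports into map grid cells so that a display can show counts per cell (e.g., as a heat map) instead of thousands of individual points. The sketch below assumes a hypothetical report format of `(lat, lon, report_type)` tuples and a fixed cell size in degrees; it is an illustration, not a design the project has committed to.

```python
import math
from collections import Counter


def bin_reports(reports, cell_deg=0.01):
    """Count geotagged reports per (grid cell, report type).

    Each report is a (lat, lon, report_type) tuple. Cells are
    cell_deg x cell_deg squares in latitude/longitude, identified by
    integer (row, col) indices obtained by flooring.
    """
    counts = Counter()
    for lat, lon, kind in reports:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[(cell, kind)] += 1
    return counts


# Hypothetical reports from deployed responders.
REPORTS = [
    (32.2805, -106.7475, "victim"),
    (32.2809, -106.7471, "victim"),
    (32.2951, -106.7475, "hazard"),
]
```

Degree-based cells shrink in east-west extent away from the equator, so an interface covering a large area might prefer a projected or equal-area grid; for a single incident site the simple version is usually adequate.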