The data archive (.zip) accompanying the instructions contains useful materials to help you submit your technical report.
In short: you will first contact the technical report maintainer for a technical report number. You then create the technical report according to the instructions above, plus a BibTeX entry (again, see the instructions) and an optional zip file containing any other data you wish to accompany the technical report. Submit these three items to the maintainer and we'll get your report posted.
GMU-CS-TR-2015-10
Obfuscation-Resilient, Efficient, and Accurate Detection and Family Identification of Android Malware
Joshua Garcia and Mahmoud Hammad and Bahman Pedrood and Ali Bagheri-Khaligh and Sam Malek
The number of Android malware apps is increasing rapidly. Simply detecting and removing malware apps is insufficient, since they can damage or alter other files, data, or settings; install additional applications; etc. To determine such behavior, a security engineer can benefit significantly from identifying the specific family to which an Android malware app belongs. Techniques for detecting Android malware and determining their families lack the ability to deal with obfuscations (i.e., transformations of applications to thwart detection). Moreover, some of the prior techniques are highly inefficient, making them inapplicable for real-time detection of threats. To address these limitations, we present a novel machine learning-based Android malware detection and family identification approach, RevealDroid, that provides selectable features. We assess RevealDroid to determine a selection of features that enable obfuscation resiliency, efficiency, and accuracy for detection and family identification. We assess RevealDroid's accuracy and obfuscation resilience on an updated dataset of malware from a diverse set of families, including malware obfuscated using various transformations, and compare RevealDroid against an existing Android malware-family identification approach and another Android malware detection approach.
GMU-CS-TR-2015-11
Discerning Machine Learning Degradation via Ensemble Classifier Mutual Agreement Analysis
Charles Smutz and Angelos Stavrou
Machine learning classifiers are a crucial component of modern malware and intrusion detection systems. However, past studies have shown that classifier-based detection systems are susceptible to evasion attacks in practice. Improving the evasion resistance of learning based systems is an open problem. In this paper, we analyze the effects of mimicry attacks on real-world classifiers. To counter such attacks, we introduce a novel approach that not only exposes attempts to evade the classifier, but also evaluates the ability of the system to operate reliably. We advance mutual agreement analysis for ensemble classifiers: during the detection operation, when a sufficient number of votes from individual classifiers disagree, the ensemble classifier prediction is shown to be unreliable. This method allows the detection of classifier evasion without additional external information. To evaluate our approach, we used thousands of both malicious and benign PDF documents. Applying our method to two recent evasion attack studies, we show that most evasion scenarios are detected. Furthermore, we show that our approach can be generalized to weaken the effectiveness of the Gradient Descent and Kernel Density Estimation attacks against Support Vector Machines (SVM) by creating an ensemble classifier using many base SVMs. We discovered that feature bagging is the most important property for enabling mutual agreement based evasion detection.
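As a rough illustration of the mutual-agreement idea only (not the classifiers, features, or thresholds used in the report), the Python sketch below flags an ensemble prediction as unreliable when too few base classifiers (e.g., SVMs trained on random feature subsets) agree with the majority vote; the agreement threshold is a placeholder.

```python
import numpy as np

def ensemble_predict_with_agreement(vote_matrix, agreement_threshold=0.75):
    """Majority-vote prediction plus a reliability flag based on mutual agreement.

    vote_matrix: (n_samples, n_classifiers) array of 0/1 votes from the base
    classifiers.  A prediction is flagged unreliable when the winning class
    receives less than `agreement_threshold` of the votes (placeholder value).
    """
    votes_for_positive = vote_matrix.mean(axis=1)          # fraction voting "malicious"
    prediction = (votes_for_positive >= 0.5).astype(int)   # simple majority vote
    # Agreement = share of base classifiers that side with the majority.
    agreement = np.where(prediction == 1, votes_for_positive, 1.0 - votes_for_positive)
    reliable = agreement >= agreement_threshold
    return prediction, agreement, reliable

# Toy usage: 5 documents scored by 10 base classifiers (random votes for illustration).
rng = np.random.default_rng(0)
votes = rng.integers(0, 2, size=(5, 10))
pred, agree, ok = ensemble_predict_with_agreement(votes)
print(pred, agree, ok)
```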
GMU-CS-TR-2015-12
Near-optimal Allocation of VMs from IaaS Providers by SaaS Providers
Arwa Aldhalaan and Daniel A. Menascé
Software as a Service (SaaS) allows companies and individuals to use software, hosted and managed by a SaaS provider, on a pay-per-use basis instead of paying the entire upfront, maintenance, and upgrade cost. SaaS providers can lease computing infrastructure from Infrastructure as a Service (IaaS) providers, also on a pay-per-use basis, to instantiate VMs that run their software services. SaaS customers can subscribe to and unsubscribe from a software service at any time. Thus, the SaaS cloud provider should dynamically determine the number of VMs needed to run software services as a function of demand, in a way that minimizes the SaaS cost of using VMs from an IaaS provider while guaranteeing an agreed-upon Quality of Service (QoS) to the SaaS customers. This report presents two heuristic techniques that can be used by SaaS providers to determine the type and quantity of VMs to be leased in order to best satisfy customer demands for software services. Our experiments showed that the number of states visited by the proposed method is very low (on the order of 10^-4 of the entire space) while the solution obtained is 2% more expensive in most cases, 13% more expensive in a few cases, and 31% more expensive in only one case, when compared with the optimal solution.
GMU-CS-TR-2015-13
A Taxonomy of Job Scheduling on Distributed Computing Systems
Raquel Lopes and Daniel Menascé
Hundreds of papers on job scheduling for distributed systems are published every year, and it is becoming increasingly difficult to classify them. Our analysis revealed that half of these papers are barely cited. This technical report presents a general taxonomy for scheduling problems and solutions in distributed systems. This taxonomy was used to classify, and make publicly available the classification of, 100 scheduling problems and their solutions. These 100 problems were further clustered into eight groups based on the features of the taxonomy. The proposed taxonomy will help researchers build on prior art, increase the visibility of new research, and minimize redundant effort.
GMU-CS-TR-2015-14
Decision Guidance Approach to Power Network Analysis: Developing an Optimization Technology Solution to Support Management and Operations of Electric Power Components
Roberto Levy and Alexander Brodsky and Juan Luo
This paper focuses on developing an approach and technology for actionable recommendations on the operation of components of an electric power network. The overall direction of this research is to model the major components of a Hybrid Renewable Energy System (HRES), including power generation, transmission/distribution, power storage, energy markets, and end customer demand (residential and commercial). First, we propose a conceptual diagram notation for power network topology, to allow the representation of an arbitrary complex power system. Second, we develop a formal mathematical model that describes the HRES optimization framework, consisting of the different network components, their respective costs, and associated constraints. Third, we implement the HRES optimization problem solution through a mixed-integer linear programming (MILP) model by leveraging IBM Optimization Programming Language (OPL) CPLEX Studio. Lastly, we demonstrate the model through an example of a simulated network, showing also the ability to support sensitivity / what-if analysis, to determine the behavior of the network under different configurations.
GMU-CS-TR-2014-1
Epochs: Trace-Driven Analytical Modeling of Job Execution Times
Daniel A. Menascé and Shouvik Bhardan
Queuing theory has extensively studied the problem of estimating job execution times in steady state conditions both in the case of single queues and queuing networks. This paper discusses the use of closed Queuing Network (QN) models during finite time intervals to estimate the execution time of jobs submitted to a computer system. More specifically, the paper presents the Epochs algorithm that allows job traces and randomly generated job inter-arrival times to be used as input to analytic models of computer systems. In other words, the paper combines methods used in discrete event simulation for the characterization of job arrivals with efficient analytic methods to model contention for resources in a computer system. The Epochs algorithm was validated against experimental results using jobs derived from a micro-benchmark and real jobs. The validation shows that the absolute relative error between measurements and execution time predictions obtained with the Epochs algorithm is below 10% in most cases and is at most 15%.
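The closed queuing network building block mentioned above can be solved with standard exact Mean Value Analysis (MVA); the hedged Python sketch below shows that textbook recursion for a single job class. It is not the Epochs algorithm itself, and the service demands in the example are made up.

```python
def exact_mva(service_demands, n_jobs, think_time=0.0):
    """Exact Mean Value Analysis for a single-class closed queuing network.

    service_demands: per-visit service demands D_k (seconds) at each queuing
    station; n_jobs: multiprogramming level N; think_time: delay (think) time Z.
    Returns system throughput X(N) and per-station residence times R_k(N).
    """
    K = len(service_demands)
    queue_len = [0.0] * K                                   # Q_k(0) = 0
    for n in range(1, n_jobs + 1):
        # Residence time at each station with n jobs in the system.
        residence = [service_demands[k] * (1.0 + queue_len[k]) for k in range(K)]
        throughput = n / (think_time + sum(residence))      # X(n)
        queue_len = [throughput * residence[k] for k in range(K)]  # Little's law
    return throughput, residence

# Made-up example: CPU and disk demands of 0.05 s and 0.08 s, 20 jobs, 1 s think time.
X, R = exact_mva([0.05, 0.08], n_jobs=20, think_time=1.0)
print(f"throughput={X:.3f} jobs/s, residence times={R}")
```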
GMU-CS-TR-2014-2
Trace Driven Analytic Modeling for Evaluating Schedulers for Clusters of Computers
Shouvik Bhardan and Daniel A. Menascé
Large enterprises use clusters of computers with varying computing power to process workloads that are heterogenous in terms of the type of jobs and the nature of their arrival processes. The scheduling of jobs from a workload has a significant impact on their execution times. This paper presents a trace-driven analytic model (TDAM) method that can be used to assess the impact of different schedulers on job execution times. The TDAM approach uses an implementation of the scheduler to generate job traces that are fed into analytic models of the computers in the cluster. These analytic models use closed queuing network methods to estimate congestion at the various nodes of the cluster. The paper demonstrates the usefulness of the TDAM method by showing how four different types of schedulers affect the execution times of jobs derived from well-known benchmarks. The paper also demonstrates how the method can be applied to computer clusters such as the ones used to run MapReduce jobs.
GMU-CS-TR-2014-3
Temporal Manufacturing Query Language (tMQL) for Domain Specific Composition, What-if Analysis, and Optimization of Manufacturing Processes With Inventories
Mohan Krishnamoorthy and Alexander Brodsky and Daniel A. Menascé
Smart manufacturing requires streamlining operations and optimizing processes at both a global and a local level. This paper considers manufacturing processes that involve physical or virtual inventories of products, parts, and materials that move from machine to machine. The inventory levels vary with time and are a function of the configuration settings of the machines involved in the process. These environments require analysis, e.g., answering what-if questions, and optimization to determine optimal operating settings for the entire process. The modeling complexities in performing these tasks are not always within the grasp of production engineers. To address this problem, the paper proposes the temporal Manufacturing Query Language (tMQL), which allows the composition of modular process models for what-if analysis and decision optimization queries. tMQL supports an extensible and reusable model knowledge base against which declarative queries can be posed. Additionally, the paper describes the steps to translate the components of a tMQL model into input data files used by a commercial optimization solver.
GMU-CS-TR-2014-4
Toward Reusable Models: System Development for Optimization Analytics Language (OAL)
Abdullah Alrazgan and Alexander Brodsky
Decision optimization has been broadly used in many areas including economics, finance, manufacturing, logistics, and engineering. However, optimization modeling used for decision optimization presents two major challenges. First, it requires expertise in operations research and mathematical programming, which most users, including software developers, engineers, business analysts, and end-users, typically do not have. Second, optimization models are usually highly customized, not modular, and not extensible. The Sustainable Process Analytics Formalism (SPAF) has recently been proposed at the National Institute of Standards and Technology (NIST) to address these challenges by transitioning from the redevelopment of optimization solutions from scratch toward maintaining an extensible library of analytical models against which declarative optimization queries can be posed. However, the development of practical algorithms and a system for SPAF presents a number of technical challenges, which have not been addressed. In this paper we (1) propose an object-oriented extension of SPAF, named the Optimization Analytics Language (OAL); (2) present OAL system development based on methods to reduce OAL models and queries into a formal mathematical programming formulation; (3) showcase OAL through a simple case study in the domain of manufacturing; and (4) conduct a preliminary experimental study to assess the overhead introduced by OAL.
GMU-CS-TR-2014-5
Building a Hierarchical Finite-State Automaton: A Primer
Sean Luke
This primer introduces concepts in hierarchical finite-state automata (HFA) and their use, gives a rough formalism, and then builds an entire HFA library, with all the trimmings, in Lua. The primer is intended to educate the reader on how HFA can be used, how they are programmed, and how HFA libraries are built.
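The report itself builds its HFA library in Lua; purely as a hedged illustration of the core idea (a state whose behavior is itself an automaton), here is a minimal Python sketch with invented state names that is not the report's API.

```python
class State:
    """A flat (leaf) state; it handles no symbols itself and defers to its parent."""
    def __init__(self, name):
        self.name = name
    def step(self, symbol):
        return None          # leaf states delegate transitions to the enclosing automaton
    def reset(self):
        pass

class Automaton(State):
    """A hierarchical state: behaves like a state but contains a sub-automaton."""
    def __init__(self, name, start, transitions):
        super().__init__(name)
        self.start = start                  # initial sub-state
        self.transitions = transitions      # {(sub-state name, symbol): next sub-state}
        self.current = start
    def reset(self):
        self.current = self.start
        self.current.reset()
    def step(self, symbol):
        # First let the current sub-state (possibly itself an Automaton) try the symbol.
        if self.current.step(symbol) is not None:
            return self                     # handled internally; stay in this super-state
        nxt = self.transitions.get((self.current.name, symbol))
        if nxt is not None:
            self.current = nxt
            self.current.reset()
            return self
        return None                         # not handled; bubble up to the parent

# Usage: a "patrol" super-state containing "go-left"/"go-right", nested in a top machine.
left, right, idle = State("go-left"), State("go-right"), State("idle")
patrol = Automaton("patrol", left, {("go-left", "bump"): right, ("go-right", "bump"): left})
top = Automaton("top", idle, {("idle", "start"): patrol, ("patrol", "stop"): idle})
for sym in ["start", "bump", "bump", "stop"]:
    top.step(sym)
    print(sym, "->", top.current.name)
```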
GMU-CS-TR-2014-6
Decision Guidance Analytics Language (DGAL): Toward Reusable Knowledge Base Centric Modeling
Alexander Brodsky and Juan Luo
Decision guidance systems are a class of decision support systems that are geared toward producing actionable recommendations, typically based on formal analytical models and techniques. This paper proposes the Decision Guidance Analytics Language (DGAL) for easy iterative development of decision guidance systems. DGAL allows the creation of modular, reusable and composable models that are stored in the analytical knowledge base independently of the tasks and tools that use them. Based on these unified models, DGAL supports declarative queries of (1) data manipulation, (2) what-if prediction analysis, (3) decision optimization based on mathematical and constraint programming, and (4) machine learning, all through formal reduction to specialized models and tools, and in the presence of uncertainty.
GMU-CS-TR-2014-7
Document Content Layout Based Exploit Protections
Charles Smutz and Angelos Stavrou
Malware-laden documents are a common exploit vector, especially in targeted attacks. Most current approaches seek to detect the malicious attributes of documents, whether through signature matching, dynamic analysis, or machine learning. We take a different approach: we perform transformations on documents that render exploits inoperable while leaving the visual interpretation of the document intact. Our exploit mitigation techniques are similar in effect to address space layout randomization, but we implement them through permutations of the document layout.
Using document content layout randomization, we randomize the data block layout of Microsoft OLE files in a manner similar to the inverse of a file system defragmentation tool. This relocates malicious payloads both in the original document file and in the memory of the reader program. We demonstrate that our approach indeed subdues in-the-wild exploits, by manually validating both Office 2003 and Office 2007 malicious documents, while the transformed documents continue to render benign content properly. The document transformation can be performed offline and requires only a single document scan, while the user-perceived delay when opening the transformed document is negligible. We also show that it is possible to thwart malicious heap sprays by injecting benign content into documents. This approach, however, comes at the expense of computational and memory resources.
GMU-CS-TR-2014-8
Convex Multi-task Relationship Learning using Hinge Loss
Anveshi Charuvaka and Huzefa Rangwala
Multi-task learning improves generalization performance by learning several related tasks jointly. Several methods have been proposed for multi-task learning in recent years. Many methods make strong assumptions about symmetric task relationships, while some are able to utilize externally provided task relationships. However, in many real-world tasks the degree of relatedness among tasks is not known a priori. Methods that can extract task relationships and exploit them while simultaneously learning models with good generalization performance can address this limitation. In the current work, we have extended a recently proposed method for learning task relationships, which uses the smooth squared loss for regression, to classification problems using the non-smooth hinge loss, due to the demonstrated effectiveness of the SVM classifier in single-task classification settings. We have also developed an efficient optimization procedure using bundle methods for the proposed multi-task learning formulation. We have validated our method on one simulated and two real-world datasets and have compared its performance to competitive baseline single-task and multi-task methods.
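For readers unfamiliar with the setup being extended, the objective below sketches a convex multi-task relationship learning formulation with the squared loss replaced by the hinge loss; the exact constraints and regularization constants used in the report may differ.

```latex
\min_{W,\,b,\,\Omega}\;
  \sum_{t=1}^{T}\sum_{i=1}^{n_t}
    \max\bigl(0,\; 1 - y_{ti}\,(\mathbf{w}_t^{\top}\mathbf{x}_{ti} + b_t)\bigr)
  \;+\; \frac{\lambda_1}{2}\,\mathrm{tr}\!\left(W W^{\top}\right)
  \;+\; \frac{\lambda_2}{2}\,\mathrm{tr}\!\left(W \Omega^{-1} W^{\top}\right)
\qquad \text{s.t.}\quad \Omega \succeq 0,\;\; \mathrm{tr}(\Omega) = 1
```

Here W = [w_1, ..., w_T] stacks the per-task weight vectors and Ω is the learned task-relationship (covariance) matrix; the bundle method mentioned in the abstract handles the non-smooth hinge-loss terms.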
GMU-CS-TR-2013-1
Parallel Model Listing and #3SAT Through Ordering Variables
Michael Connor and Fei Li
We consider listing and counting all the solutions M to a given Boolean 3SAT formula with n variables. This problem is known to be NP-hard (Garey and Johnson, 1979). A straightforward enumeration scheme to find M takes worst-case running time O(2^n). In this paper, we minimize the total running time and memory requirement by ordering the parallel assignments of values to the literals. Our objective is to establish theoretical bounds on an exponential running time O(a^b) such that a < 2 and b < n for certain Boolean formulas. We also conduct experiments to support our theoretical analysis.
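As a baseline for the kind of search being bounded (not the paper's algorithm), the sketch below enumerates all models of a 3-CNF formula under a chosen variable ordering, pruning a branch as soon as some clause is fully assigned and false; the formula and ordering are made up.

```python
def enumerate_models(clauses, order):
    """Enumerate all satisfying assignments of a CNF formula.

    clauses: list of clauses, each a list of signed integers (e.g. [1, -2, 3]
    means x1 OR NOT x2 OR x3).  order: the sequence in which variables are
    assigned; a good ordering lets falsified clauses prune branches early, so
    far fewer than 2^n partial assignments need to be explored.
    """
    models = []

    def consistent(assignment):
        for clause in clauses:
            decided = [lit for lit in clause if abs(lit) in assignment]
            satisfied = any(assignment[abs(lit)] == (lit > 0) for lit in decided)
            if len(decided) == len(clause) and not satisfied:
                return False                      # clause fully assigned and false: prune
        return True

    def recurse(i, assignment):
        if not consistent(assignment):
            return
        if i == len(order):
            models.append(dict(assignment))
            return
        var = order[i]
        for value in (False, True):
            assignment[var] = value
            recurse(i + 1, assignment)
            del assignment[var]

    recurse(0, {})
    return models

# Made-up formula: (x1 OR x2 OR x3) AND (NOT x1 OR NOT x2 OR x3), counted with order x3, x1, x2.
formula = [[1, 2, 3], [-1, -2, 3]]
print(len(enumerate_models(formula, order=[3, 1, 2])))   # number of models (#3SAT) -> 6
```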
GMU-CS-TR-2013-2
Lenience towards Teammates Helps in Cooperative Multiagent Learning
Liviu Panait and Keith Sullivan and Sean Luke
Concurrent learning is a form of cooperative multiagent learning in which each agent has an independent learning process and little or no control over its teammates' actions. In such learning algorithms, an agent's perception of the joint search space depends on the reward received by both agents, which in turn depends on the actions currently chosen by the other agents. The agents will tend to converge towards certain areas of the space because of their learning processes. As a result, an agent's perception of the search space may benefit if computed over multiple rewards at early stages of learning, but additional rewards have little impact towards the end. We thus suggest that agents should be lenient with their teammates: ignore many of the low rewards initially, and fewer rewards as learning progresses. We demonstrate the benefit of lenience in a cooperative co-evolution algorithm and in a new reinforcement learning algorithm.
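The following is a schematic sketch of the leniency idea only, assuming a simple value-update setting rather than the co-evolutionary and reinforcement learning algorithms evaluated in the report: low rewards are usually forgiven early on, and less so as a temperature decays.

```python
import random

def lenient_update(q, action, reward, temperature, learning_rate=0.1):
    """Leniently update the value estimate of one action.

    Early on (high temperature) the agent usually ignores rewards below its
    current estimate, attributing them to teammates still exploring; as the
    temperature decays it updates on more and more of the low rewards.
    """
    if reward < q[action] and random.random() < temperature:
        return                                    # forgive the low reward for now
    q[action] += learning_rate * (reward - q[action])

# Toy usage: one agent's estimates over 1000 plays of a 2-action coordination game.
q = [0.0, 0.0]
temperature = 1.0
for step in range(1000):
    action = random.randrange(2)
    teammate = random.randrange(2)
    reward = 1.0 if action == teammate else 0.0   # stand-in joint reward
    lenient_update(q, action, reward, temperature)
    temperature *= 0.995                          # become less lenient over time
print(q)
```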
GMU-CS-TR-2013-3
EPIC: Efficient Path-Independent Capabilities in Mobile Ad Hoc Networks --- Design, Performance, and Security
Eric Swankoski and Sanjeev Setia
In this paper, we investigate the issues that arise in the use of network capabilities to facilitate DoS prevention and mitigation in mobile ad hoc networks. Most proposals for capability-based protocols in wired networks depend upon routers on the path between the sender and receiver maintaining state that enables them to verify a capability. Frequent route changes make it necessary for any capability-based protocol for MANETs to re-establish this state as efficiently as possible. We propose EPIC, a method based on reverse-disclosure hash chains to support the establishment of path-independent capabilities and show how they can be efficiently maintained in a high mobility environment. Simulation results show that EPIC provides a 19.3% increase in performance and a 37.8% increase in efficiency over route-dependent capabilities.
GMU-CS-TR-2013-4
Conservative Collision Prediction Among Polygons with Unknown Motion
Yanyan Lu, Zhonghua Xi and Jyh-Ming Lien
Collision detection is a fundamental geometric tool for sampling-based motion planners. In contrast, collision prediction for scenarios in which the obstacles' motion is unknown is still in its infancy. This paper proposes a new approach to predicting collisions by assuming that obstacles are adversarial. Our new tool advances collision prediction beyond translational and disc robots; arbitrary polygons with rotation can be used to better represent obstacles and provide a tighter bound on the predicted collision time. Compared to an online motion planner that replans periodically at a fixed time interval, our experimental results provide strong evidence that the new method significantly reduces the number of replans while maintaining a higher success rate of finding a valid path.
GMU-CS-TR-2013-5
A Modular Approach to Model-Based Testing of Concurrent Programs
Richard Carver and Yu Lei
This paper presents a modular approach to testing concurrent programs that are modeled using labeled transition systems. Correctness is defined in terms of an implementation relation that is expected to hold between a model of the system and its implementation. The novelty of our approach is that the correctness of a concurrent software system is determined by testing the individual threads separately, without testing the system as a whole. We define a modular implementation relation on individual threads and show that modular relations can be tested separately in order to verify a (non-modular) implementation relation for the complete system. Empirical results indicate that our approach can significantly reduce the number of test sequences that are generated and executed during model-based testing.
GMU-CS-TR-2013-6
Semantic Segmentation of Urban Environments into Object and Background Categories
Cesar Cadena, Jana Kosecka
Advancements in robotic navigation, object search, and exploration rest to a large extent on robust, efficient, and more advanced semantic understanding of the surrounding environment. Since the choice of the most relevant semantic information depends on the task, it is desirable to develop approaches which can be adapted to different tasks at hand and which separate the aspects related to the surroundings from object entities. In the proposed work we present an efficient approach for detecting generic objects in urban environments from videos acquired by a moving vehicle by means of semantic segmentation. Compared to traditional approaches for semantic labeling, we strive to detect a variety of objects while avoiding the need for the large amounts of training data required for recognizing individual object categories and the visual variability within and across categories. In the proposed approach we exploit features providing evidence about widely available non-object categories (such as sky, road, and buildings) and use informative features that are indicative of the presence of object boundaries to gather evidence about objects. We formulate the object/non-object semantic segmentation problem in the Conditional Random Field framework, where the structure of the graph is induced by a minimum spanning tree computed over the 3D reconstruction, yielding an efficient algorithm for exact inference. We carry out extensive experiments on videos of urban environments acquired by a moving vehicle and compare our approach to existing alternatives.
GMU-CS-TR-2013-7
A Test Automation Language for Behavioral Models
Nan Li and Jeff Offutt
Model-based testers design tests in terms of models, such as paths in graphs. Abstract tests cannot be run directly because they use names and events that exist in the model, but not in the implementation. Testers usually solve this mapping problem by hand. Model elements often appear in many abstract tests, so testers write the same redundant code many times. This is time-consuming, labor-intensive, and error-prone. This paper presents a language to automate the creation of mappings from abstract tests to concrete tests. Three issues are addressed: (1) creating mappings and generating test values, (2) transforming graphs and using coverage criteria to generate test paths, and (3) solving constraints and generating concrete tests. The paper also presents results from an empirical comparison of testers using the mapping language against manual mapping on 17 open source and example programs. We found that the automated test generation method took a fraction of the time the manual method took, and that the manual tests contained numerous errors in which concrete tests did not match their abstract tests.
GMU-CS-TR-2013-8
Plagiarism
Sean Luke
Students are often confused about what constitutes plagiarism in academia and the seriousness with which the offense is treated. The objective of this essay is to explain the issue, provide examples of what plagiarism is, and hopefully serve as a useful guide.
GMU-CS-TR-2013-9
Future MASON Directions: Community Recommendations (Report of the 2013 MASON NSF Workshop)
Nicolas Payette and Marius Bujorianu and Glen Ropella and Ken Cline and Jeffrey Schank and Matt Miller and Sara Jonsson and Laszlo Gulyas and Richard Legendi and Olaf Bochmann and Luís de Sousa and Vlasios Voudouris and Daniil Kiose and Przemyslaw Szufel and Steve Saul and John McManus and Vittorio Scarano and Gennaro Cordasco and Chris Hollander and Paul Wiegand and Vera Kazakova and Brian Hrolenok and J. Daniel Rogers and Michael Schader and Sean Luke and Kenneth De Jong and Mark Coletti and Paul Schopf and Claudio Cioffi-Revilla and Keith Sullivan and Khaled Talukder and Ahmed Elmolla and Ermo Wei
MASON is an open source multiagent simulation library geared towards simulating very large numbers of relatively lightweight interacting agents. On June 15 and 16, 2013, approximately two dozen invitees convened at George Mason University to discuss future directions for MASON and needs of the MASON community. This meeting formed the 2013 MASON NSF Workshop. During the Workshop, participants identified nine areas of interest to the community where they felt MASON should be improved or extended. This document details the reports of those working groups.
GMU-CS-TR-2012-1
Dynamic RGB-D Mapping
Michael Paton and Jana Kosecka
Localization and mapping has been an area of great importance and interest to the robotics and computer vision community. It has traditionally been accomplished with range sensors such as lasers and sonars. Recent improvements in processing power coupled with advancements in image matching and motion estimation have allowed the development of vision-based localization techniques. Despite much progress, there are disadvantages to both range sensing and vision techniques, making localization and mapping that is inexpensive and robust hard to attain. With the advent of RGB-D cameras, which provide synchronized range and video data, localization and mapping can now exploit both range data and RGB features. This thesis exploits the strengths of vision and range-sensing localization and mapping strategies and proposes novel algorithms using RGB-D cameras. We show how to combine existing strategies and present a thorough evaluation of the resulting algorithms against a dataset of RGB-D benchmarks. Lastly, we demonstrate the proposed algorithm on a challenging indoor dataset and demonstrate improvements where either pure range sensing or vision techniques perform poorly.
GMU-CS-TR-2012-2
Mining the Execution History of a Software System to Infer the Best Time for its Adaptation
Kyle R. Canavera, Naeem Esfahani, and Sam Malek
An important challenge in dynamic adaptation of a software system is to prevent inconsistencies (failures) and disruptions in its operations during and after change. Several prior techniques have solved this problem with various tradeoffs. All of them, however, assume the availability of detailed component dependency models. This paper presents a complementary technique that solves this problem in settings where such models are either not available, difficult to build, or outdated due to the evolution of the software. Our approach first mines the execution history of a software system to infer a "stochastic component dependency model", representing the probabilistic sequence of interactions among the system's components. We then demonstrate how this model could be used at runtime to infer the "best time" for adaptation of the system's components. We have thoroughly evaluated this research on a multi-user real world software system and under varying conditions.
GMU-CS-TR-2012-3
Multiobjective Optimization of Co-Clustering Ensembles
Francesco Gullo and AKM Khaled Ahsan Talukder and Sean Luke and Carlotta Domeniconi and Andrea Tagarelli
Co-clustering is a machine learning task where the goal is to simultaneously develop clusters of the data and of their respective features. We address the use of co-clustering ensembles to establish a consensus co-clustering over the data. As is obvious from its name, co-clustering is naturally multiobjective. Previous work tackled the problem using both rudimentary multiobjective optimization and expectation maximization, then later a gradient ascent approach which outperformed both of them. In this paper we develop a new preference-based multiobjective optimization algorithm to compete with the gradient ascent approach. Unlike this gradient ascent algorithm, our approach once again tackles the co-clustering problem with multiple heuristics, but also applies the gradient ascent algorithm's joint heuristic as a preference selection procedure. As a result, we are able to significantly outperform the gradient ascent algorithm on feature clustering and on problems with smaller datasets.
GMU-CS-TR-2012-4
Adaptive Search for Interesting Things
William Squires and Sean Luke
The results of a parameter sweep over a multidimensional parameter space are often used to gain an understanding of the space beyond simply identifying optima. But sweeps are costly, and so it is highly desirable to adaptively sample the space in such a way as to concentrate precious samples on the more "interesting" areas of the space. In this paper we analyze and expand on a previous work which defined such areas as those in which the slope of some quality function was high. We develop a performance metric in terms of generalizability of the resulting sampled model, examine the existing method in terms of scalability and accuracy, and then propose and examine two population-based approaches which address some of its shortcomings.
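A minimal one-dimensional restatement of the slope-based idea the paper builds on (not the population-based methods it proposes): estimate the slope of the quality function on a coarse sweep, then draw extra samples in proportion to it. The test function and parameter values below are made up.

```python
import numpy as np

def adaptive_sample(f, low, high, n_coarse=21, n_extra=200, rng=None):
    """Concentrate 1-D samples where the slope of f appears largest.

    A coarse uniform sweep estimates |df/dx| by finite differences; extra
    samples are then drawn from the intervals with probability proportional
    to that slope estimate, so flat regions receive few additional points.
    """
    rng = rng or np.random.default_rng()
    xs = np.linspace(low, high, n_coarse)
    ys = np.array([f(x) for x in xs])
    slope = np.abs(np.diff(ys))                           # per-interval slope magnitude
    if slope.sum() > 0:
        weights = slope / slope.sum()
    else:
        weights = np.full(len(slope), 1.0 / len(slope))   # flat function: sample uniformly
    intervals = rng.choice(len(slope), size=n_extra, p=weights)
    extra = xs[intervals] + rng.random(n_extra) * (xs[1] - xs[0])
    return np.concatenate([xs, extra])

# Usage with a made-up function that is flat except for a steep transition near x = 0.7.
samples = adaptive_sample(lambda x: np.tanh(50 * (x - 0.7)), 0.0, 1.0)
print(np.histogram(samples, bins=10, range=(0.0, 1.0))[0])   # counts pile up near 0.7
```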
GMU-CS-TR-2012-5
Malicious PDF Detection Using Metadata and Structural Features
Charles Smutz and Angelos Stavrou
Owing to their versatile functionality and size, PDF documents have become a popular avenue for exploitation, ranging from broadcast phishing attacks to targeted attacks. In this paper, we present a method for efficiently detecting malicious documents based on feature extraction of document metadata and structure. We demonstrate that by carefully extracting a wide range of feature sets we can offer a robust malware classifier. Indeed, using multiple datasets containing an aggregate of 5,000 malicious documents and 100,000 benign ones, our classification rates are well above 99% while maintaining low false negatives (0.01% or less) for different classification parameters and scenarios. Surprisingly, we also discovered that by artificially reducing the influence of the top features we can achieve robust detection even in an adversarial setting where the attacker is aware of the top features and our normality model. Therefore, due to the large number of extracted features, a defense posture that employs detectors trained on random sets of features is robust even against mimicry attacks with intimate knowledge of the training set.
GMU-CS-TR-2012-6
Elucidating Activity-related Physico-chemical Features in Antimicrobial Peptides
Daniel Veltri and Amarda Shehu
The rise of drug-resistant bacteria has brought attention to antimicrobial peptides (AMPs) as targets for novel antibacterial drug research. Many machine learning methods aim to improve recognition of AMPs. Sequence-derived features are often employed in the context of supervised learning through Support Vector Machines (SVMs). This can be useful for expediently screening databases for AMP-like peptides. However, AMPs are characterized by great sequence diversity. Moreover, biological studies focusing on AMP modification and de novo design stand to benefit from computational methods capable of exposing underlying features important for activity at the level of individual amino-acid positions. We take the first steps in this direction by considering an extensive list of amino-acid physico-chemical features. We gradually narrow this list down to features relevant in the context of SVM classification. We focus on a specific AMP class, cathelicidins, due to the abundance of documented sequences, to improve their recognition over carefully designed decoy sequences. Analysis of the features important for the classification reveals interesting physico-chemical properties to preserve when modifying or designing novel AMPs in the wet laboratory.
GMU-CS-TR-2011-0
How to Make a Technical Report (Rev. 3)
Sean Luke
The following document describes the preferred formatting of technical reports and submission guidelines. This document itself is formatted according to the preferred formatting rules.
GMU-CS-TR-2011-1
GPU-Euler: Sequence Assembly using GPGPU
Syed Faraz Mahmood and Huzefa Rangwala
Advances in sequencing technologies have revolutionized the field of genomics by providing cost effective and high throughput solutions. In this paper, we develop a parallel sequence assembler implemented on general purpose graphic processor units (GPUs). Our work was largely motivated by a growing need in the genomic community for sequence assemblers and increasing use of GPUs for general purpose computing applications. We investigated the implementation challenges, and possible solutions for a data parallel approach for sequence assembly. We implemented an Eulerian-based sequence assembler (GPU-Euler) on the nVidia GPUs using the CUDA programming interface. GPU-Euler was benchmarked on three bacterial genomes using input reads representing the new generation of sequencing approaches. Our empirical evaluation showed that GPU-Euler produced lower run times, and showed comparable performance in terms of contig length statistics to other serial assemblers. We were able to demonstrate the promise of using GPUs for genome assembly, a computationally intensive task.
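GPU-Euler parallelizes graph construction and traversal on the GPU; the serial Python sketch below only illustrates the underlying de Bruijn graph construction that Eulerian assemblers start from, using made-up toy reads.

```python
from collections import defaultdict

def build_de_bruijn_graph(reads, k):
    """Build the de Bruijn graph used by Eulerian-based assemblers.

    Each read is cut into overlapping k-mers; an edge connects the (k-1)-mer
    prefix of a k-mer to its (k-1)-mer suffix.  Assembly then amounts to
    finding an Eulerian path through this graph.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph

# Toy usage with made-up 7-bp "reads" and k = 4.
reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
for prefix, suffixes in build_de_bruijn_graph(reads, k=4).items():
    print(prefix, "->", suffixes)
```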
GMU-CS-TR-2011-2
Large Scale Empirical Analysis of Cooperative Coevolution
Sean Luke and Keith Sullivan and Faisal Abidi
We present a study of cooperative coevolution applied to moderately complex optimization problems in large-population environments. The study asks three questions. First: what collaboration methods perform best, and when? Second: how many subpopulations are desirable? Third: is it worthwhile to do more than one trial per fitness evaluation? We discovered that parallel methods tended to work better than sequential ones, that “shuffling” (a collaboration method) predominated in performance in more complex problems, that more subpopulations generally did better, and that more trials performed marginally better.
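Below is a hedged sketch of the "shuffling" collaboration method referred to above (not the experimental framework used in the study): each trial shuffles every subpopulation and evaluates same-index individuals as a team, crediting each member with the best joint reward it has seen.

```python
import random

def evaluate_with_shuffling(subpopulations, joint_fitness, trials=1):
    """Evaluate every individual by pairing it with shuffled collaborators.

    Each subpopulation evolves one component of the solution.  For each trial,
    every subpopulation is shuffled and individuals at the same index form a
    team; an individual's fitness is the best joint reward it has seen, so one
    evaluation per team covers all subpopulations at once.
    """
    size = len(subpopulations[0])
    fitness = [[float("-inf")] * size for _ in subpopulations]
    for _ in range(trials):
        orders = [random.sample(range(size), size) for _ in subpopulations]
        for slot in range(size):
            team = [pop[order[slot]] for pop, order in zip(subpopulations, orders)]
            reward = joint_fitness(team)
            for s, order in enumerate(orders):
                fitness[s][order[slot]] = max(fitness[s][order[slot]], reward)
    return fitness

# Toy usage: two subpopulations of scalars; the joint fitness rewards components summing to 10.
pops = [[random.uniform(0, 10) for _ in range(5)] for _ in range(2)]
scores = evaluate_with_shuffling(pops, lambda team: -abs(sum(team) - 10.0), trials=3)
print(scores)
```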
GMU-CS-TR-2011-3
Guided Exploration of the Architectural Solution Space in the Face of Uncertainty
Naeem Esfahani and Sam Malek
A system's early architectural decisions impact its properties (e.g., scalability, dependability) as well as stakeholder concerns (e.g., cost, time to delivery). Choices made early on are both difficult and costly to change, and thus it is paramount that the engineer gets them 'right'. This leads to a paradox, as in early design, the engineer is often forced to make these decisions under uncertainty, i.e., not knowing the precise impact of those decisions on the various concerns. How could the engineer make the 'right' choices in such circumstances? This is precisely the question we have tackled in this paper. We present GuideArch, a framework aimed at guiding the exploration of the architectural solution space under uncertainty. It provides techniques and tools that help the engineer to make informed decisions. The approach has been thoroughly evaluated in a project aimed at reengineering a mobile software system.
GMU-CS-TR-2011-4
Fast and Robust Generation of City-Scale Seamless 3D Urban Models
Yanyan Lu and Evan Behar and Stephen Donnelly and Jyh-Ming Lien and Fernando Camelli and David Wong
Since the introduction of the concept of “Digital Earth”, almost every major international city has been re-constructed in the virtual world. A large volume of geometric models describing urban objects has become freely available in the public domain via software like ArcGlobe and Google Earth. Although mostly created for visualization, these urban models can benefit many applications beyond visualization including city scale evacuation planning and earth phenomenon simulations. However, these models are mostly loosely structured and implicitly defined and require tedious manual preparation that usually takes weeks if not months before they can be used. Designing algorithms that can robustly and efficiently handle unstructured urban models at the city scale becomes a main technical challenge. In this paper, we present a framework that generates seamless 3D architectural models from 2D ground plans with elevation and height information. These overlapping ground plans are commonly used in the current GIS software such as ESRI ArcGIS and urban model synthesis methods to depict various components of buildings. Due to measurement and manual errors, these ground plans usually contain small, sharp, and various (nearly) degenerate artifacts. In this paper, we show both theoretically and empirically that our framework is efficient and numerically stable. Based on our review of the related work, we believe this is the first work that attempts to automatically create 3D architectural meshes for simulation at the city level. With the goal of providing greater benefit beyond visualization from this large volume of urban models, our initial results are encouraging.
GMU-CS-TR-2011-5
Hierarchical Multi-Robot Learning from Demonstration
Keith Sullivan and Sean Luke
Developing robot behaviors is a tedious task requiring multiple coding, trial, and debugging cycles. This makes attractive the notion of learning from demonstration, whereby a robot learns behaviors in real time from the examples of a demonstrator. Learning from demonstration can be problematic, however, because of the number of trials necessary to gather sufficient samples to learn correctly. The problem is compounded in a multi-robot setting due to the potentially much larger design space arising from the number of and interactions between the robots. In this paper, we propose a learning from demonstration system capable of rapidly training multiple robots to perform a collaborative task. Our supervised learning method applies user domain knowledge to decompose complex behaviors into a hierarchy of simpler behaviors, which are easier to train and learn, and require many fewer samples to do so. The system further reduces the state space by only considering environmental features and actions pertinent to each decomposed simple behavior. Decomposition occurs not only within individual robot behaviors but also at the hierarchical group behavior level. Experiments using Pioneer robots in a patrol scenario illustrate our system.
GMU-CS-TR-2011-6
Netgator: Malware Detection Through Program Interactive Proofs
Brian Schulte, Rhandi Martin, Haris Andrianakis, Angelos Stavrou
Exfiltration of data using internet-borne attacks has become a credible threat for organizations and enterprises. History has shown that crafted, targeted attacks and zero-day malware are capable of penetrating even the most sophisticated defenses. To make matters worse, intrusion detection systems that perform analysis of network traffic are dependent on the timely information provided by blacklisting, signature schemes, or anomaly patterns. This is especially true for heavily used communication protocols, where blocking decisions affect the everyday operations of the organization. Even when the intrusion is detected in a timely manner, valuable data might have already been stolen. In this paper, we propose a new approach to distinguish legitimate browser software from malware that generates identical network traffic signatures. Our system consists of two parts: a signature-based passive detection module, and a module that issues Program Interactive Challenges (PICs) to client applications. Initially, we perform passive detection utilizing content inspection techniques that analyze network header element order. Of course this is not enough: we demonstrate that passive detection can be easily subverted. To amend that, we introduce an interactive detection module, in the form of an inline network proxy, that challenges the detected browser, forcing it to exercise known, specific internal functionality. For browsers, we can leverage HTML, Flash, Javascript, and other common browser components to form challenges that malware will not be able to respond to due to lack of functionality. In contrast to other interactive challenges, our challenges are transparent to the user, allowing for a seamless browsing experience. We demonstrate the effectiveness of active challenges on thousands of real-world malware samples. Our results show that, depending on the deployment scenario, PICs incur no or minimal latency (0.35 seconds on average) per inspected object.
GMU-CS-TR-2011-7
SecureSwitch: BIOS-Assisted Isolation and Switch between Trusted and Untrusted Commodity OSes
Kun Sun and Jiang Wang and Fengwei Zhang and Angelos Stavrou
Protecting commodity desktop systems that run commercial operating systems (OS) without adversely impacting performance or usability remains an open problem. To make matters worse, the overall system security depends on desktop applications with complex code bases that perform multiple and inter-dependent tasks, often dictated by Internet-borne code. Recent research has indicated the need for context-dependent trustworthy environments where the user can segregate different activities in an effort to lower risk and safeguard private information. In this paper, we introduce a new BIOS-assisted mechanism for the secure generation and management of trusted execution environments, tailored to separate security-sensitive activities from untrusted ones. A key characteristic of our system is usability: the capability to quickly and securely switch between operating environments on a single physical machine without the need for any specialized hardware or extensive code modifications. Our goal was to eliminate any mutable, non-BIOS code sharing while reusing existing processor capabilities. We demonstrate that even if the untrusted OS becomes compromised, it cannot perform an exfiltration or inference attack on data or applications in the trusted OS. To avoid sophisticated attacks that fake a trusted environment, we provide a visible indication to the user about the current environment. Moreover, to alternate between environments, we require the user to physically press a button, an action that cannot be reproduced by software. Using our prototype, we measured the switching process to be approximately six seconds. This short switching time empowers the user to frequently and seamlessly switch between trusted and untrusted environments.
GMU-CS-TR-2011-8
An Analysis of System Management Mode (SMM)-based Integrity Checking Systems and Evasion Attacks
Jiang Wang and Kun Sun and Angelos Stavrou
System Management Mode (SMM) is an x86 processor feature designed to assist debugging for hardware manufacturers. Recent research has shown that SMM can also be used to protect the run-time integrity of software by invoking SMM to periodically check current system state and compare it with known pristine or trusted software states. Researchers and practitioners have claimed that any unauthorized state modification can be detected with an SMM-based system integrity-checking mechanism. In this paper, we demonstrate that all hardware-based, periodic integrity mechanisms can be defeated by a new class of attacks, which we refer to as “evasion attacks.” Such attacks use a compromised software stack to remove any attack traces before the integrity checks begin and to continue the execution of the malicious code after the integrity checks are completed. We detail two categories of evasion attacks: directly-intercepting System Management Interrupt (SMI) and indirectly-deriving SMI invocations. Finally, we measure the performance impact of our proof-of-concept prototypes for all of the attacks and present countermeasures for these attacks.
GMU-CS-TR-2011-9
Finding Critical Changes in Dynamic Configuration Spaces
Yanyan Lu and Jyh-Ming Lien
Given a motion planning problem in a dynamic but fully known environment, we propose the first roadmap-based method, called the critical roadmap, that has the ability to identify and exploit the critical topological changes of the free configuration space. Compared to existing methods that either ignore temporal coherence or only repair their roadmaps at fixed times, our method provides not only a more complete representation of the free (configuration-time) space but also a significant efficiency improvement. Our experimental results show that the critical roadmap method has a higher chance of finding solutions, and it is at least one order of magnitude faster than some well-known planners.
GMU-CS-TR-2010-1
Security Policy Cognizant Module Composition
Paul Seymer, Angelos Stavrou, Duminda Wijesekera, and Sushil Jajodia
Component-based software development and deployment is based on developing individual software modules that are composed on an as-needed basis. Such modules expose the computations they provide and their dependencies for providing these computations, resulting in the well-known requires-provides specification for modules. This paper provides a framework to combine modules that specify their requires-provides interfaces in a policy-dependent way. Our framework specifies policies as combinations of Constraint Logic Programming (CLP) based rules, and our policies can cover multiple aspects of compositions, such as security and quality of service. We apply our framework to specify Quality of Protection (QoP) and Quality of Service (QoS) policies. An example shows the applicability of our policy language to a teleconferencing application with multiple security and resource usage policies.
GMU-CS-TR-2010-2
Multiple Kernel Learning for Fold Recognition
Huzefa Rangwala
Fold recognition is a key problem in computational biology that involves classifying proteins sharing structural similarities into classes commonly known as folds. Recently, researchers have developed several efficient kernel-based discriminatory methods for fold classification using sequence information. These methods train one-versus-rest binary classifiers using well-optimized kernels from different data sources and techniques.
Integrating this vast amount of data in the form of kernel matrices is an interesting and challenging problem. The positive semidefinite property of the various kernel matrices makes it attractive to cast the task of learning an optimal weighting of several kernel matrices as a semidefinite programming optimization problem. We experiment with a previously introduced quadratically constrained quadratic optimization problem for kernel integration using 1-norm and 2-norm support vector machines. We integrate state-of-the-art profile-based direct kernels to learn an optimal kernel matrix. Our experimental results show a small but significant improvement in classification accuracy across the different fold classes. Our analysis illustrates the strength of two dominating kernels across different fold classes, which suggests the redundant nature of the kernel matrices being combined.
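Schematically, the kernel integration described above amounts to learning a convex combination of the base kernels (the precise optimization solved in the report is the quadratically constrained quadratic program cited there):

```latex
k_{\mu}(x, x') \;=\; \sum_{j=1}^{m} \mu_j\, k_j(x, x'),
\qquad \mu_j \ge 0, \qquad \sum_{j=1}^{m} \mu_j = 1
```

Learning the weights μ_j jointly with the SVM margin objective is what gives rise to the semidefinite or quadratically constrained formulations mentioned above.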
GMU-CS-TR-2010-3
Following a Large Unpredictable Group of Targets Among Obstacles
Christopher Vo and Jyh-Ming Lien
Camera control is essential in both virtual and real-world environments. Our work focuses on an instance of camera control called target following, and offers an algorithm, based on the ideas of monotonic tracking regions and ghost targets, for following a large coherent group of targets with unknown trajectories, among known obstacles. In multiple-target following, the camera's primary objective is to follow and maximize visibility of multiple moving targets. For example, in video games, a third-person view camera may be controlled to follow a group of characters through complicated virtual environments. In robotics, a camera attached to robotic manipulators could also be controlled to observe live performers in a concert, monitor assembly of a mechanical system, or maintain task visibility during teleoperated surgical procedures. To the best of our knowledge, this work is the first attempting to address this particular instance of camera control.
GMU-CS-TR-2010-4
Scalable and Robust Shepherding via Deformable Shapes
Joseph F. Harrison, Christopher Vo and Jyh-Ming Lien
In this paper, we present a new motion planning strategy for shepherding in environments containing obstacles. This instance of the group motion control problem is applicable to a wide variety of real life scenarios, such as animal herding, civil crowd control, and oil-spill cleanup. However, the problem is challenging in terms of scalability and robustness because it is dynamic, highly underactuated, and involves multi-agent coordination. Our previous work showed that high-level probabilistic motion planning algorithms combined with simple shepherding behaviors can be beneficial in situations where low-level behaviors alone are insufficient. However, inconsistent results suggested a need for a method that performs well across a wider range of environments. In this paper, we present a new method, called Deform, in which shepherds view the flock as an abstracted deformable shape. We show that our method is more robust than our previous approach and that it scales more effectively to larger teams of shepherds and larger flocks. We also show Deform to be surprisingly robust despite increasing randomness in the motion of the flock.
GMU-CS-TR-2010-5
HyperCheck: A Hardware-Assisted Integrity Monitor
Jiang Wang, Angelos Stavrou, and Anup Ghosh
Over the past few years, virtualization has been employed in environments ranging from densely populated cloud computing clusters to home desktop computers. Security researchers embraced virtual machine monitors (VMMs) as a new mechanism to guarantee deep isolation of untrusted software components. Unfortunately, their widespread adoption promoted VMMs as a prime target for attackers. In this paper, we present HyperCheck, a hardware-assisted tampering detection framework designed to protect the integrity of VMMs and, for some classes of attacks, the underlying operating system (OS). HyperCheck leverages the CPU System Management Mode (SMM), present in x86 systems, to securely generate and transmit the full state of the protected machine to an external server. Using HyperCheck, we were able to ferret out rootkits that targeted the integrity of both the Xen hypervisor and traditional OSes. Moreover, HyperCheck is robust against attacks that aim to disable or block its operation. Our experimental results show that HyperCheck can produce and communicate a scan of the state of the protected software in less than 40ms.
GMU-CS-TR-2010-6
Extra Buffer Resources Improving Competitiveness in Minimizing Energy Consumption
Zhi Zhang and Fei Li
In this paper, we consider energy management algorithms for scheduling jobs in power-scarce scenarios such as embedded computer systems and sensor networks. We focus on investigating the impact of buffer resources in minimizing the total energy cost in an online setting. The online algorithms do not make any assumptions about job arrivals; their worst-case performance is measured in terms of the competitive ratio, comparing them with optimal clairvoyant algorithms. We prove that with appropriate extra buffer space, an online algorithm can beat a weak optimal offline algorithm in terms of the total energy required. Our result helps to quantitatively estimate the optimal on-chip buffer resources to allocate in real-time systems with power constraints. We also present a lower bound on the competitive ratio achievable by any deterministic online algorithm.
GMU-CS-TR-2010-7
An Information Theoretic Approach for the Analysis of RNA and DNA Binding Sites
Sheng Li and Huzefa Rangwala
Proteins perform several critical biological processes by interacting with other macromolecules (DNA, RNA) and small molecules. Several computational approaches have been developed to determine protein interaction sites using sequence and structure features. Instead of building another ad hoc prediction algorithm, the purpose of this study is to understand the contribution of a residue in an RNA-binding event and compare it with the DNA-binding process. We evaluate several sequence- and structure-based features using mutual information theory. We show that solvent accessibility and profile-based features can be used for developing good protein-RNA binding site determination algorithms. We also recommend features that could discriminate between RNA and DNA binding sites. This work can be extended to understand protein-protein and protein-ligand interactions as well.
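A minimal sketch of the mutual-information scoring used to rank features, assuming a feature already discretized into bins; the feature values and labels in the example are invented.

```python
import math
from collections import Counter

def mutual_information(feature_values, labels):
    """Mutual information I(F; L) between a discretized feature and binding labels.

    Estimated from empirical joint and marginal frequencies:
    I(F;L) = sum over (f,l) of p(f,l) * log2( p(f,l) / (p(f) * p(l)) ).
    """
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    p_f = Counter(feature_values)
    p_l = Counter(labels)
    mi = 0.0
    for (f, l), count in joint.items():
        p_fl = count / n
        mi += p_fl * math.log2(p_fl / ((p_f[f] / n) * (p_l[l] / n)))
    return mi

# Toy usage: an invented binned solvent-accessibility feature vs. binding labels.
feature = ["buried", "exposed", "exposed", "buried", "exposed", "buried"]
label   = [0,        1,         1,         0,        1,         0       ]
print(mutual_information(feature, label))   # 1.0 bit: the feature fully determines the label
```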
GMU-CS-TR-2010-8
Energy Management for Time-Critical Harvesting Wireless Sensor Networks
Bo Zhang, Robert Simon and Hakan Aydin
As Cyber-Physical Systems (CPSs) evolve, they will be increasingly relied upon to support time-critical monitoring and control activities. Further, many CPSs that utilize Wireless Sensor Networking (WSN) technologies will require the use of energy harvesting methods to extend their lifetimes. For this application class, there are currently few effective models that allow the simulation and analysis of new algorithms or system performance. To address this problem, we define a general-purpose WSN model to support a time-critical CPS system. We then present a set of Harvesting Aware Speed Selection (HASS) algorithms. Our technique maximizes the minimum energy reserve over all the nodes in the network, thus ensuring highly resilient performance under emergency or fault-driven situations. We present an optimal centralized solution, along with an efficient, fully distributed solution. We propose a CPS-specific experimental methodology, enabling us to evaluate our approach. Our experiments show that our algorithms yield significantly higher energy reserves than baseline methods.
GMU-CS-TR-2010-9
Evaluation of Short Read Metagenomic Assembly
Anveshi Charuvaka and Huzefa Rangwala
Advances in sequencing technologies have equipped researchers with the ability to sequence the collective genome of an entire microbial community, a practice commonly referred to as metagenomics. These microbes are omnipresent within the human body and in environments across the world. As such, characterizing and understanding their roles is crucial for improving human health and the environment. The problem of using short reads obtained from current next-generation sequencing technologies to assemble the genomes within a community sample is challenging for several reasons. In this study, we assess the performance of a state-of-the-art Eulerian-based graph assembler on a series of simulated datasets with varying complexity. We evaluate the feasibility of metagenomic assembly with reads restricted to 36 base pairs, as obtained from the Solexa/Illumina platform. We developed a pipeline to evaluate the quality of assembly based on contig length statistics and accuracy. We studied the effect of overlap parameters used for the metagenomic assembly and developed a clustering solution to pool the contigs obtained from different runs of the assembly algorithm, which allowed us to obtain longer contigs. We also computed an entropy/impurity metric to assess how mixed the assembled contigs were; ideally, a contig should be assembled from reads obtained from the same organism. We also compared the metagenomic assemblies to the best possible solution that could be obtained by assembling individual source genomes. Our results show that accuracy was better than expected for metagenomic samples with a few dominant organisms and was especially poor for samples containing many closely related strains.
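One of the standard contig length statistics referred to above is N50; a short sketch with made-up contig lengths:

```python
def n50(contig_lengths):
    """N50 contig statistic: the length L such that contigs of length >= L
    together cover at least half of the total assembled bases."""
    total = sum(contig_lengths)
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered * 2 >= total:
            return length

# Toy usage: 2,600 bp total; half is covered once the 600 bp contig is included.
print(n50([1000, 600, 500, 300, 200]))   # -> 600
```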
GMU-CS-TR-2010-10
Taming Uncertainty in Self-Adaptation through Possibilistic Analysis
Naeem Esfahani and Ehsan Kouroshfar and Sam Malek
Self-adaptation endows a software system with the ability to satisfy certain objectives by automatically modifying its behavior. While many promising approaches for the construction of self-adaptive software systems have been developed, the majority of them ignore the uncertainty underlying the adaptation decisions. This has been one of the key inhibitors to widespread adoption of self-adaptation techniques in risk-averse real-world settings. In this paper, we describe an approach, called POssIbilistic SElf-aDaptation (POISED), for tackling the challenge posed by uncertainty in making adaptation decisions. POISED builds on possibility theory to assess the positive and negative consequences of uncertainty. It makes adaptation decisions that result in the best range of potential behavior. We demonstrate POISED's application to the problem of improving a software system's quality attributes via runtime reconfiguration of its customizable software components. We have extensively evaluated POISED using a prototype of a robotic software system.
GMU-CS-TR-2010-11
A New Method for Mapping the Configuration Space Obstacles of Polygons
Evan Behar and Jyh-Ming Lien
Configuration space (C-space) plays an important role not only in motion planning but also in geometric modeling, shape and kinematic reasoning, and is fundamental to several basic geometric operations, such as continuous collision detection and generalized penetration depth estimation, that also find applications in motion planning, animation, and simulation. In this paper, we develop a new method for constructing the boundary of the C-space obstacles (C-obst) of polygons. This method is simpler to implement and often more efficient than the existing techniques. These advantages are provided by a new algorithm that allows us to extract the Minkowski sum from the reduced convolution of the input polygons. We also develop a method for estimating the generalized penetration depth by computing the distance between the query point and the C-obst surface.
GMU-CS-TR-2010-12
Dynamic Minkowski Sum of Convex Shapes
Evan Behar and Jyh-Ming Lien
Computing the Minkowski sums of rotating objects has always been done naively by re-computing every Minkowski sum from scratch. The correspondences between the Minkowski sums are typically completely ignored. We propose a method, called DYMSUM, that can efficiently update the Minkowski sums of rotating convex polyhedra. We show that DYMSUM is significantly more efficient than the traditional approach, in particular when the input polyhedra are large and when the rotation between frames is small. From our experimental results, we show that the computation time of the proposed method grows slowly with respect to the size of the input compared to the naive approach.
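As a point of reference, the following minimal Python sketch implements only the naive baseline mentioned above, re-computing a convex Minkowski sum from scratch as the convex hull of all pairwise vertex sums (shown in 2D with hypothetical example polygons); it does not reflect DYMSUM's incremental update.

import numpy as np
from scipy.spatial import ConvexHull

def naive_minkowski_sum(P, Q):
    # P, Q: (n, 2) and (m, 2) arrays of convex-polygon vertices
    sums = (P[:, None, :] + Q[None, :, :]).reshape(-1, 2)  # all n*m vertex sums
    hull = ConvexHull(sums)
    return sums[hull.vertices]  # hull vertices in counter-clockwise order

P = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 1.5]])              # a triangle
Q = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])  # a unit square
print(naive_minkowski_sum(P, Q))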
GMU-CS-TR-2010-13
Using Ontological Information to Enhance Responder Availability in Emergency Response
Paul Ngo and Duminda Wijesekera
Ensuring effective communications during emergencies is an important issue for any functional government. One way to address this issue is to ensure the availability of the key personnel capable of making the appropriate decisions and taking timely actions with sufficient resources. Many XML-based languages such as the Emergency Data Exchange Language (EDXL) and associated Common Alert Protocol (CAP) have been designed to provide a basis for such communications. To ensure that messages are delivered in a timely manner, we propose some role and task based ontological enhancements for these languages. We show by example how the ontological enhancements can be used to improve the availability of emergency personnel when the need arises.
GMU-CS-TR-2010-14
Fixed Length Representations of Sequence Data using a Variant of the Hidden Markov Model
Sam Blasiak and Huzefa Rangwala
Sequence classification is central to many practical problems within machine learning. Classification algorithms often center around the notion of a distance metric between examples. Unlike sequences, the Euclidean distance metric between vectors often has an intuitive meaning which transfers naturally to a meaning in the classification domain. Distance metrics between arbitrary pairs of sequences, however, can be harder to define because sequences can vary in length, and the information contained in the order of sequence elements is lost when standard distance metrics are applied. We present a scheme that employs a Hidden Markov Model variant to produce a set of fixed-length vectors from a set of sequences. We then define three inference algorithms, a Baum-Welch variant, a Gibbs Sampling algorithm, and a variational algorithm, to infer model parameters. Finally, we show experimentally that the fixed length representation produced by these inference methods is useful for classifying proteins by structural taxonomy.
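For intuition only, the sketch below shows one simple way an HMM can map variable-length sequences to fixed-length vectors: run the standard scaled forward-backward recursions and use the average posterior state occupancy as the representation. The model parameters and sequences are hypothetical, and this is not the report's HMM variant or any of its three inference algorithms.

import numpy as np

def state_occupancy_vector(obs, pi, A, B):
    # Average posterior state occupancy of a discrete observation sequence.
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K)); beta = np.zeros((T, K)); c = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):                      # scaled forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):             # scaled backward pass
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)  # posterior over states at each step
    return gamma.mean(axis=0)                  # one K-dimensional vector per sequence

pi = np.array([0.6, 0.4])                      # hypothetical 2-state, 3-symbol model
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(state_occupancy_vector([0, 1, 2, 2], pi, A, B))
print(state_occupancy_vector([2, 2, 0], pi, A, B))   # different lengths, same-size vectors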
GMU-CS-TR-2010-15
Detection of Communities and Bridges in Weighted Networks
Tanwistha Saha and Carlotta Domeniconi and Huzefa Rangwala
Traditional graph-based clustering methods group vertices into discrete non-intersecting clusters under the assumption that each vertex can belong to only a single cluster. On the other hand, recent research on graph-based clustering methods, applied to real-world networks (e.g., protein-protein interaction networks and social networks), shows overlapping patterns among the underlying clusters. For example, in social networks, an individual is expected to belong to multiple clusters (or communities), rather than strictly confining himself/herself to just one. As such, overlapping clusters enable better models of real-life phenomena. Soft clustering (e.g., fuzzy c-means) has been used with success for non-graph data, when the objects are allowed to belong to multiple clusters with a certain degree of membership. In this paper, we propose a fuzzy clustering based approach for community detection in a weighted graphical representation of social and biological networks, for which the ground truth associated with the nodes is available. We compare our results with a baseline method for both multi-labeled and single-labeled datasets.
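To make the soft-membership idea concrete, here is a minimal fuzzy c-means sketch on plain feature vectors with hypothetical data; the report itself works on weighted graph representations of networks, so this shows only the generic membership and center updates that such approaches build on.

import numpy as np

def fuzzy_cmeans(X, k, m=2.0, iters=100, seed=0):
    # Returns an (n, k) matrix of soft memberships; each row sums to one.
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(k), size=len(X))            # random initial soft assignment
    for _ in range(iters):
        W = U ** m
        C = (W.T @ X) / W.sum(axis=0)[:, None]            # membership-weighted centers
        D = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) + 1e-12
        U = 1.0 / (D ** (2.0 / (m - 1.0)))                # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
    return U

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])  # two hypothetical groups
print(fuzzy_cmeans(X, k=2).round(2)[:3])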
GMU-CS-TR-2010-16
GeoMason: Geospatial Support for MASON
Keith Sullivan and Mark Coletti and Sean Luke
MASON is a free, open-source Java-based discrete event multi-agent simulation toolkit that has been used to model network intrusions, unmanned aerial vehicles, nomadic migrations, and farmer/herder conflicts, among others. Many multi-agent models use georeferenced data, which represent such things as road networks, rivers, vegetation coverage, population, and topology. However, MASON does not directly support georeferenced data. Therefore, practitioners using MASON must hand-craft such support, which may be difficult and error-prone. In this paper we describe newly added geospatial functionality in MASON that addresses this problem. We discuss the design of this functionality, called GeoMASON, and its use and limitations. Finally, we give examples of how to import and manipulate georeferenced data.
GMU-CS-TR-2010-17
Which Should We Try First? Ranking Information Resources through Query Classification
Joshua Church and Amihai Motro
Users seeking information in distributed environments of large numbers of disparate information resources are often burdened with the task of repeating their queries for each and every resource. Invariably, some of the searched resources are more productive (yield more useful documents) than others, and it would undoubtedly be useful to try these resources first. If the environment is federated and a single search tool is used to process the query against all the disparate resources, then a similar issue arises: which information resources should be searched first, to guarantee that useful answers are streamed to users in a timely fashion? In this paper we propose a solution that incorporates techniques from text classification, machine learning, and information retrieval. Given a set of pre-classified information resources and a keyword query, our system suggests a relevance ordering of the resources. The approach has been implemented in prototype form, and initial experimentation has given promising results.
GMU-CS-TR-2010-18
Predicting Network Response Times Using Social Information
Chen Liang and Sharath Hiremagalore and Angelos Stavrou and Huzefa Rangwala
Social networks and discussion boards have become a significant outlet where people communicate and express their opinion freely. Although the social networks themselves are usually well-provisioned, the participating users frequently point to external links to substantiate their discussions. Unfortunately, the sudden heavy traffic load imposed on the external, linked web sites causes them to become unresponsive, leading to what people call the “Flash Crowds” effect. Flash Crowds present a real challenge, as it is not possible to predict their intensity and occurrence time. Moreover, although increasingly capable, most present-day web hosting servers and caching systems are designed to handle a nominal load of requests before they become unresponsive. This can happen either due to the limited bandwidth or processing power allocated to the hosting site. In this paper, we quantify the prevalence of flash crowd events for a popular social discussion board (Digg). Using PlanetLab, we measured the response times of 1289 unique popular websites. We were able to verify that 89% of the popular URLs suffered variations in their response times. In an effort to identify flash crowds ahead of time, we evaluate and compare traffic forecasting mechanisms. We show that predicting network traffic using network measurements has very limited success and cannot be used for large-scale prediction. However, by analyzing the content and structure of the social discussions, we were able to classify 86% of the popular web sites within 5 minutes of their submission and 95% of the sites when more (5 hours) of social content became available. Our work indicates that we can effectively leverage social activity to forecast network events that would otherwise be infeasible to anticipate.
GMU-CS-TR-2010-19
TAC-ELM: Metagenomic Taxonomic Classification with Extreme Learning Machines
Zeehasham Rasheed and Huzefa Rangwala
Next-generation technologies have allowed researchers to determine the collective genomes of all organisms within specific environments or communities. Varying species abundance, lengths, and complexities within different communities, coupled with the discovery of new species, make the problem of taxonomic assignment to short DNA sequence reads extremely challenging. We have developed a new sequence composition-based taxonomic classifier, TAC-ELM, for metagenomic analysis. TAC-ELM uses the framework of extreme learning machines to quickly and accurately learn the weights for a neural network model, with input features consisting of GC content and oligonucleotides. TAC-ELM is evaluated on two standard metagenomic benchmarks with sequence read lengths reflecting the traditional and current technologies. Our empirical results indicate the strength of the developed approach, which outperforms state-of-the-art taxonomic classifiers in terms of accuracy, training time, and implementation complexity. We also perform experiments that evaluate the pervasive case within metagenome analysis, where a species may not have been previously sequenced or discovered and will not exist in the reference genome databases.
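For readers unfamiliar with extreme learning machines, the sketch below shows the basic training step that makes them fast: hidden-layer weights are random and fixed, and only the output weights are solved in closed form. The feature matrix and labels are hypothetical placeholders, not TAC-ELM's GC-content and oligonucleotide features.

import numpy as np

def train_elm(X, Y, hidden=50, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], hidden))   # random input-to-hidden weights, never trained
    b = rng.normal(size=hidden)
    H = np.tanh(X @ W + b)                      # hidden-layer activations
    beta = np.linalg.pinv(H) @ Y                # least-squares output weights
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

rng = np.random.default_rng(1)
X = rng.random((100, 5))                        # hypothetical sequence features
Y = np.eye(3)[rng.integers(0, 3, 100)]          # hypothetical one-hot taxonomic labels
W, b, beta = train_elm(X, Y)
print(predict_elm(X[:3], W, b, beta).argmax(axis=1))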
GMU-CS-TR-2010-20
Ruminate: A Scalable Architecture for Deep Network Analysis
Charles Smutz and Angelos Stavrou
Traditionally, Network Intrusion Detection Systems (NIDS) inspect packet header and payload data for malicious content. While each system is different, most NIDS perform limited analysis on network streams and network protocols. Unfortunately, current NIDS are typically susceptible to evasion through network protocol encoding, such as base64 encoding of SMTP/MIME or gzip compression of HTTP. In addition, malicious desktop application payloads (e.g., PDF documents, Flash multimedia files) are beyond the inspection capabilities of popular NIDS. To address these limitations, we introduce Ruminate, a scalable object-centric traffic inspection and analysis architecture. Ruminate provides a distributed platform for deep analysis of network payload content. This includes full decoding of network protocols and recursive extraction of client application objects transferred over the network. While traditional NIDS utilize static packet load balancing to provide scalability, Ruminate employs dynamic load distribution of reassembled network streams and embedded objects, outsourcing the heavy processing to other processors or connected hosts. Therefore, high latency or computationally expensive analysis can be performed on commodity servers. Furthermore, our approach empowers system administrators to provision resources and preferentially treat traffic not only depending on the packet header but also on the data objects it carries. To achieve this, each object inspection algorithm is implemented as a separate component or service offered through a highly scalable producer-consumer architecture. We demonstrate using real-world traffic that our load balancing is far superior to existing techniques. This is because its granularity depends on the reconstructed objects rather than packet or simple stream analysis. Unlike existing systems, Ruminate can prevent NIDS evasion that leverages encoding or compression of malicious objects in network protocols, desktop application file formats, or encapsulation within other objects.
GMU-CS-TR-2009-1
Scheduling Weighted Packets with Deadlines Over a Fading Channel
Fei Li and Zhi Zhang
We consider scheduling weighted packets with time constraints over a fading channel. Packets arrive at the transmitter in an online manner. Each packet has a value and a hard deadline by which it should be sent. The fade state of the channel determines the throughput obtained per unit of time and the channel's quality may change over time. In this paper, we design both offline and online algorithms to maximize weighted throughput, which is defined as the total value of the packets sent by their respective deadlines. We first present polynomial-time exact offline algorithms for this problem. Then, we present online algorithms and their competitive analysis as well as the lower bounds of competitive ratios. Our work is the first one addressing weighted throughput for this important problem in the areas of information theory and communications.
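As background for the problem setup, the following toy sketch implements a plain greedy rule, ignoring the fading channel entirely: in each unit time slot, transmit the most valuable pending packet whose deadline has not passed. The instance is hypothetical, and the report's offline and online algorithms additionally account for the channel state.

def greedy_schedule(packets, horizon):
    # packets: list of (arrival, deadline, value); one packet may be sent per time slot
    pending, total, sent = [], 0.0, []
    for t in range(horizon):
        pending += [p for p in packets if p[0] == t]   # new arrivals at time t
        pending = [p for p in pending if p[1] >= t]    # drop packets whose deadline has passed
        if pending:
            best = max(pending, key=lambda p: p[2])    # highest-value pending packet
            pending.remove(best)
            total += best[2]
            sent.append((t, best))
    return total, sent

pkts = [(0, 1, 3.0), (0, 2, 5.0), (1, 1, 4.0), (2, 3, 1.0)]   # (arrival, deadline, value)
print(greedy_schedule(pkts, horizon=4))    # weighted throughput 10.0 in this toy instance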
GMU-CS-TR-2009-2
Geometry-based Computation of Symmetric Homo-oligomeric Protein Complexes
Christopher Miles and Brian Olson and Amarda Shehu
The need to engineer novel therapeutics and functional materials is driving the in-silico design of molecular complexes. This paper proposes a method to compute symmetric homo-oligomeric protein complexes when the structure of the replicated protein monomer is known and rigid. The relationship between the structure of a protein and its biological function brings the in-silico determination of protein structures associated with functional states to the forefront of computational biology. While protein complexes, arising from associations of protein monomers, are pervasive in every genome, determination of their structures remains challenging. Given the difficulty in computing structures of a protein monomer, computing arrangements of monomers in a complex is mainly limited to dimers. A growth in the number of protein complexes studied in wet labs is allowing classification of their structures. A recent database shows that most naturally-occurring protein complexes are symmetric homo-oligomers. The method presented here exploits this database to propose structures of symmetric homo-oligomers that can accommodate spatial replications of a given protein monomer. The method searches the database for documented structures of symmetric homo-oligomers where the replicated monomer has a geometrically-similar structure to that of the input protein monomer. The proposed method is a first step towards the in-silico design of novel protein complexes that upon further refinement and characterization can serve as molecular machines or fundamental units in therapeutics or functional materials.
GMU-CS-TR-2009-3
Towards Session-aware RBAC Administration and Enforcement with XACML
Min Xu and Duminda Wijesekera and Xinwen Zhang and Deshan Cooray
An administrative role-based access control (ARBAC) model specifies administrative policies over a role-based access control (RBAC) system, where an administrative permission may change an RBAC policy by updating permissions assigned to roles, or assigning/revoking users to/from roles. Consequently, enforcing ARBAC policies over an active access controller while some users are using protected resources would result in conflicts: a policy may be in effect in the RBAC system while being updated by an ARBAC operation. Towards solving this concurrency problem, we propose a session-aware administrative model for RBAC. We show how the concurrency problem can be resolved by enhancing the eXtensible Access Control Markup Language (XACML) reference implementation. In order to do so, we develop an XACML-ARBAC profile to specify ARBAC policies, and enforce these polices by building an ARBAC enforcement module and a session administrative module. The former synchronizes with the evaluation of access control requests. The latter revokes conflicting ongoing user sessions immediately prior to enforcing administrative operations. Experiments show reasonable performance characteristics of our initial enhancement to Sun's reference implementation.
GMU-CS-TR-2009-4
A Control Architecture for Autonomous Mobile Robotics
Randall Steck
The field of mobile robotics is on the forefront of robotics research around the world. Control architectures for complex autonomous mobile robots have largely settled on hybrid architectures for their suitability at dealing with the opposing forces of planning and reactivity. We present a general, heterogeneous 3-Tier hybrid architecture for control of an autonomous mobile robot and discuss an implementation in the domain of campus navigation. The architecture features a useful organization structure for high-level skills and offers flexible construction options for low-level behavior hierarchies.
GMU-CS-TR-2009-5
Behavior-Based Motion Planning for Group Control
Christopher Vo and Joseph F. Harrison and Jyh-Ming Lien
Despite the large body of work in both motion planning and multi-agent simulation, little work has focused on the problem of planning motion for groups of robots using external "controller" agents. We call this problem the group control problem. This problem is complex because it is highly underactuated, dynamic, and requires multi-agent cooperation. In this paper, we present a variety of new motion planning algorithms based on EST, RRT, and PRM methods for shepherds to guide flocks of robots through obstacle-filled environments. We show using simulation on several environments that under certain circumstances, motion planning can find paths that are too complicated for naive "simulation only" approaches. However, inconsistent results indicate that this problem is still in need of additional study.
GMU-CS-TR-2009-6
Towards a Universal Text Classifier: Transfer Learning using Encyclopedic Knowledge
Pu Wang and Carlotta Domeniconi and Jian Hu
Document classification is a key task for many text mining applications. However, traditional text classification requires labeled data to construct reliable and accurate classifiers. Unfortunately, labeled data are seldom available, and often too expensive to obtain. In this work, we propose a universal text classifier, which does not require any labeled training document. Our approach simulates the capability of people to classify documents based on background knowledge. As such, we build a classifier that can effectively group documents based on their content, under the guidance of a few words, which we call discriminant words, describing the classes of interest. Background knowledge is modeled using encyclopedic knowledge, namely Wikipedia. Wikipedia's articles related to the specific problem domain at hand are selected, and used during the learning process for predicting labels of test documents. The universal text classifier can also be used to perform document retrieval, in which the pool of test documents may or may not be relevant to the topics of interest for the user. In our experiments with real data we test the feasibility of our approach for both the classification and retrieval tasks. The results demonstrate the advantage of incorporating background knowledge through Wikipedia, and the effectiveness of modeling such knowledge via probabilistic topic modeling. The accuracy achieved by the universal text classifier is comparable to that of a supervised learning technique for transfer learning.
GMU-CS-TR-2009-7
Digging Digg: Comment Mining, Popularity Prediction, and Social Network Analysis
Salman Jamali and Huzefa Rangwala
Using comment information available from Digg we define a co-participation network between users. We focus on the analysis of this implicit network, and study the behavioral characteristics of users. Using an entropy measure, we infer that users at Digg are not highly focused and participate across a wide range of topics. We also use the comment data and social network derived features to predict the popularity of online content linked at Digg using a classification and regression framework. We show promising results for predicting the popularity scores even after limiting our feature extraction to the first few hours of comment activity that follows a Digg submission.
GMU-CS-TR-2009-8
Towards Generalized Trust in Smart Spaces
Dalal A. Al-Arayed and João P. Sousa
Existing work in trust management does not satisfactorily provide a combination of functionality and generality. A model that sufficiently addresses the aspects of trust management and is general enough to be applicable in multiple problem scenarios is needed. This paper presents a base trust model that can be used in multiple problem scenarios (content management, service provision, and routing). In addition, smart spaces introduce new issues that are not addressed sufficiently by the available trust management models. The base trust model is designed to be simple to facilitate its enrichment to accommodate the additional requirements of Smart Spaces in the future.
GMU-CS-TR-2008-0
How to Make a Technical Report (Rev. 2)
Sean Luke
This document has been superseded by GMU CS Department Technical Report GMU-CS-TR-2011-0.
The following document describes the preferred formatting of technical reports and submission guidelines. This document itself is formatted according to the preferred formatting rules.
GMU-CS-TR-2008-1
Algorithms for Leximin-Optimal Fair Policies in Repeated Games
Gabriel Balan and Sean Luke and Dana Richards
Solutions to non-cooperative multiagent systems often require achieving a joint policy which is as fair to all parties as possible. There are a variety of methods for determining the fairest such joint policy. One approach, min fairness, finds the policy which maximizes the minimum average reward given to any agent. We focus on an extension, leximin fairness, which breaks ties among candidate policies by choosing the one which maximizes the second-to-minimum average reward, then the third-to-minimum average reward, and so on. This method has a number of advantages over others in the literature, but has so far been little-used because of the computational cost in employing it to find the fairest policy. In this paper we propose a linear programming based algorithm for computing leximin fairness in repeated games which has a polynomial time complexity given certain reasonable assumptions.
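To illustrate the linear-programming flavor of the approach, here is a sketch of only the first, min-fairness stage: choose a distribution over joint actions that maximizes the minimum per-agent expected reward. Full leximin fairness would then iterate, fixing tight constraints and optimizing the next-worst-off agent; the reward matrix below is hypothetical.

import numpy as np
from scipy.optimize import linprog

def maxmin_policy(R):
    # R: (agents x joint actions) matrix of average rewards.
    n_agents, n_actions = R.shape
    c = np.zeros(n_actions + 1); c[-1] = -1.0             # variables [x, t]; maximize t
    A_ub = np.hstack([-R, np.ones((n_agents, 1))])        # t - R_i @ x <= 0 for every agent i
    b_ub = np.zeros(n_agents)
    A_eq = np.zeros((1, n_actions + 1)); A_eq[0, :n_actions] = 1.0   # x is a distribution
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * n_actions + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n_actions], res.x[-1]

R = np.array([[1.0, 0.0, 0.6],
              [0.0, 1.0, 0.6]])                            # hypothetical 2-agent game
x, worst = maxmin_policy(R)
print(x.round(3), round(worst, 3))                         # the worst-off agent gets 0.6 here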
GMU-CS-TR-2008-2
Long-term Fairness with Bounded Worst-Case Losses
Gabriel Balan and Dana Richards and Sean Luke
How does one repeatedly choose actions so as to be fairest to the multiple beneficiaries of those actions? We examine approaches to discovering sequences of actions for which the worst-off beneficiaries are treated maximally well, then secondarily the second-worst-off, and so on. We formulate the problem for the situation where the sequence of action choices continues forever; this problem may be reduced to a set of linear programs. We then extend the problem to situations where the game ends at some unknown finite time in the future. We demonstrate that computing an optimal solution is NP-hard, and present two good approximation algorithms.
GMU-CS-TR-2008-3
Life After Self-Healing: Assessing Post-Repair Program Behavior
Michael E. Locasto and Angelos Stavrou and Gabriela F. Cretu
One promising technique for defending software systems against vulnerabilities involves the use of self-healing. Such efforts, however, carry a great deal of risk because they largely bypass the cycle of human-driven patching and testing used to vet both vendor and internally developed patches. In particular, it is difficult to predict if a repair will keep the behavior of the system consistent with "normal" behavior. Assuring that post-repair behavior does not deviate from normal behavior is a major challenge to which no satisfactory solutions exist. We investigate the feasibility of automatically measuring behavioral deviations in software that has undergone a self-healing repair. We provide a first examination of the problem of assessing a repair's impact on execution behavior, and we define a model for representing post-repair behavior using machine-level intraprocedural control flow. In essence, we advocate performing anomaly detection on an application's execution after it has been attacked and subsequently repaired. Our goal, however, is not to detect an attack, but rather to provide a tool for assisting system administrators in performing vulnerability triage. Our system can help them discover the relative impact of a repair so that they can begin to track down and analyze the cause of post-repair execution anomalies.
GMU-CS-TR-2008-4
A Simple Proof of the 3-Competitiveness of the Greedy Algorithm for Scheduling Weighted Packets
Fei Li
We provide a simplified proof of the 3-competitiveness of the greedy algorithm for scheduling weighted packets in the multi-FIFO buffer model. Azar and Richter provided a proof using the zero-one principle (Azar and Richter, STOC 2004). We use a different approach and we hope our approach can lead to an improved FIFO buffering algorithm.
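For concreteness, the greedy rule analyzed here can be sketched as follows (online arrivals, buffer capacities, and the competitive analysis itself are omitted): at each transmission step, send the packet at the head of the FIFO buffer whose head packet has the largest weight.

from collections import deque

def greedy_multi_fifo(buffers):
    # buffers: lists of packet weights, each kept in FIFO order
    queues = [deque(b) for b in buffers]
    order = []
    while any(queues):
        q = max((q for q in queues if q), key=lambda q: q[0])   # queue with the heaviest head packet
        order.append(q.popleft())
    return order

print(greedy_multi_fifo([[5, 1], [3, 9], [4]]))   # transmits weights in the order [5, 4, 3, 9, 1]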
GMU-CS-TR-2007-1
Modeling Diffusion in a Discrete Environment
Adrian Grajdeanu
The present report details a model for implementing diffusion in a discrete space. Formulated in 2004 in support of artificial development experiments, the model is mathematically justified starting from the diffusion equation. It is physically plausible and handles differently shaped and sized diffusion neighborhoods well, in the sense that it achieves isotropic diffusion in spite of the bias introduced by the discretizing grid. It provides one formulation that encapsulates the diffusion neighborhood details, rendering it applicable to linear, planar, spatial, and even n-dimensional constructs.
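For orientation, the following sketch shows the textbook explicit diffusion update on a 2D grid with a 4-neighbor Laplacian and periodic boundaries; it is not the report's neighborhood-general formulation that corrects for grid-induced anisotropy, and the coefficient and grid size are arbitrary.

import numpy as np

def diffuse(u, D=0.2, steps=100):
    # Explicit 5-point-stencil diffusion; stable for D <= 0.25, mass-conserving with periodic boundaries.
    for _ in range(steps):
        lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
               np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u)
        u = u + D * lap
    return u

u0 = np.zeros((32, 32)); u0[16, 16] = 1.0   # a point source
u = diffuse(u0)
print(round(float(u.sum()), 6), round(float(u.max()), 6))   # total mass stays 1.0 while the peak spreads out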
ISE-TR-07-01
Graphical Composition of State-Dependent Use Case Behavioral Models
Jon Whittle and Ana Moreira and João Araújo and Rasheed Rabbi
Maintaining a clear separation of concerns throughout the software lifecycle has long been a goal of the software community. Concerns that are separated, however, must be composed at some point. This paper presents a technique for keeping state-dependent use cases separate throughout the software modeling process and a method for composing state-dependent use cases. The composition method is based on the graph transformations formalism. This provides a rich yet user-friendly way of composing state-dependent use cases that is built on solid foundations. To evaluate our approach, it has been applied to seven student design solutions. Each solution was originally developed using a use case-driven methodology and was reengineered to evaluate whether our technique could have been applied. The findings are that it is possible to maintain the separation of state-dependent use cases during software design and, furthermore, that expressive model composition methods are necessary to do this in practice.
ISE-TR-07-02
Evolving VirtuE
Alessandro D'Atri and Amihai Motro
One of the most attractive aspects of virtual databases is their agility: the inherent ability to adapt and evolve in response to changing market conditions. Evolving VirtuE is a formal framework within which such agility can be realized. Through the concepts of enterprise time, activity logging, and log mining, the recent behavior and performance of an enterprise may be studied, and corresponding evolutionary steps can be induced. These steps may be intended to benefit the operation of individual enterprise members, as well as the enterprise as a whole. In addition, we also examine enterprise creation, a period of rapid evolution that concludes when the enterprise reaches stability and begins transacting its business activities.
ISE-TR-07-03
XACML Policies for Exclusive Resource Usage
Vijayant Dhankhar and Saket Kaushik and Duminda Wijesekera
The eXtensible Access Control Markup Language (XACML) is the standard access control policy specification language of the World Wide Web. XACML does not provide exclusive access to global resources; we add this capability by enhancing the policy execution framework with locking.
ISE-TR-07-04
Grid-VirtuE: A Layered Architecture for Grid Virtual Enterprises
Alfredo Cuzzocrea and Alessandro D'Atri and Andrea Gualtieri and Amihai Motro and Domenico Saccà
A grid virtual enterprise is a community of independent enterprises concerned with a particular sector of the economy. Its members (nodes) are small or medium size enterprises (SME) engaged in bilateral transactions. An important principle of a grid virtual enterprise is the lack of any global "guiding force" with each member of the community making its own independent decisions. In this paper we describe Grid-VirtuE, a three-layer architecture for grid virtual enterprises. The top layer of the architecture, representing its ultimate purpose, is an environment in which grid virtual enterprises can be modeled and implemented. This layer is supported by middleware infrastructure for grids, providing a host of grid services, such as node-to-node communication, bilateral transactions, and data collection. The bottom layer is essentially a distributed data warehouse for storing, sharing and analyzing the large amounts of data generated by the grid. Among other functionalities, the warehouse handles the dissemination of data among the members of the grid; it confronts issues of data magnitude with an aging mechanism that aggregates old data at a lower level of detail; and it incorporates privacy-preserving features that retain the confidentiality of individual members. Warehouse information is also used for data and process mining, aimed at analyzing the behavior of the enterprise, and subsequently inducing evolutionary changes that will improve its performance.
ISE-TR-07-05
Fuzzy Clustering of Patterns Represented by Pairwise Dissimilarities
Maurizio Filippone
Clustering is the problem of grouping objects on the basis of a similarity measure between them. This paper considers the approaches belonging to the K-means family, in particular those based on fuzzy memberships. When patterns are represented by means of non-metric pairwise dissimilarities, these methods cannot be directly applied, since they are not guaranteed to converge. Symmetrization and shift operations have been proposed to transform the dissimilarities between patterns from non-metric to metric. It has been shown that they modify the K-means objective function by a constant that does not influence the optimization procedure. Some fuzzy clustering algorithms have been extended in order to handle patterns described by means of pairwise dissimilarities. The literature, however, lacks an explicit analysis of what happens to K-means-style fuzzy clustering algorithms when the dissimilarities are transformed to become metric. This paper shows how the objective functions of four clustering algorithms based on fuzzy memberships change due to dissimilarity transformations. The experimental analysis conducted on a synthetic and a real data set shows the effect of the dissimilarity transformations for four clustering algorithms based on fuzzy memberships.
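The sketch below shows one common form of the symmetrization and shift operations referred to above, following the constant-shift-embedding idea of adding twice the magnitude of the most negative eigenvalue of the centered similarity matrix to all off-diagonal dissimilarities. The dissimilarity matrix is hypothetical, and the paper's contribution is the analysis of how such transformations change the fuzzy objective functions, not the transformation itself.

import numpy as np

def symmetrize_and_shift(D):
    D = 0.5 * (D + D.T)                                   # symmetrization
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    S = -0.5 * J @ D @ J                                  # centered similarity matrix
    shift = max(0.0, -np.linalg.eigvalsh(S).min())        # magnitude of the most negative eigenvalue
    return D + 2.0 * shift * (1.0 - np.eye(n))            # shift off-diagonal dissimilarities only

D = np.array([[0.0, 1.0, 4.0],
              [2.0, 0.0, 1.0],
              [3.0, 2.0, 0.0]])                           # hypothetical non-metric dissimilarities
print(symmetrize_and_shift(D).round(3))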
ISE-TR-07-06
Weighted Cluster Ensembles: Methods and Analysis
Carlotta Domeniconi and Muna Al-Razagan
Cluster ensembles offer a solution to challenges inherent to clustering arising from its ill-posed nature. Cluster ensembles can provide robust and stable solutions by leveraging the consensus across multiple clustering results, while averaging out emergent spurious structures that arise due to the various biases to which each participating algorithm is tuned. In this paper, we address the problem of combining multiple weighted clusters which belong to different subspaces of the input space. We leverage the diversity of the input clusterings in order to generate a consensus partition that is superior to the participating ones. Since we are dealing with weighted clusters, our consensus functions make use of the weight vectors associated with the clusters. We demonstrate the effectiveness of our techniques by running experiments with several real datasets, including high dimensional text data. Furthermore, we investigate in depth the issue of diversity and accuracy for our ensemble methods. Our analysis and experimental results show that the proposed techniques are capable of producing a partition that is as good as or better than the best individual clustering.
ISE-TR-07-07
Query Consolidation: Interpreting a Set of Independent Queries Using a Multidatabase Architecture in the Reverse Direction
Aybar C. Acar and Amihai Motro
We introduce the problem of query consolidation, which seeks to interpret a set of disparate queries submitted to independent databases with a single "global" query. This problem has multiple applications, from improving database design to protecting information from a seemingly innocuous set of apparently unrelated queries. The problem exhibits attractive duality with the much-researched problem of query decomposition, which has been addressed intensively in the context of multidatabase environments: How to decompose a query submitted to a virtual database into a set of local queries that are evaluated in individual databases. We set the new problem in the architecture of a canonical multidatabase system, using it in the reverse direction. The process incorporates two steps where multiplicity of solutions must be considered: At one point the system must infer the most likely set of equi-joins for a set of relations; at another point it must discover the most likely selection constraints that would be applied to a relation. In each case we develop a procedure that ranks solutions according to their perceived likelihood. The final result is therefore a ranked list of suggested consolidations.
ISE-TR-06-01
Detecting Outliers Using Transduction and Statistical Significance Testing
Daniel Barbará and Carlotta Domeniconi and James P. Rogers
Finding points that are outliers with respect to a set of other points is an important task in data mining. Outlier detection can uncover important anomalies in fields like intrusion detection and fraud analysis. In data streaming, the presence of a large number of outliers indicates that the underlying process that is generating the data is undergoing significant changes and the models that attempt to characterize it need to be updated. Although there has been a significant amount of work in outlier detection, most of the algorithms in the literature resort to a particular definition of what an outlier is (e.g., density-based), and use thresholds to detect them. In this paper we present a novel technique to detect outliers that does not impose any particular definition for them. The test we propose aims to diagnose whether a given point is an outlier with respect to an existing clustering model (i.e., a set of points partitioned in groups). However, the test can also be successfully utilized to recognize outliers when the clustering information is not available. This test is based on Transductive Confidence Machines, which have been previously proposed as a mechanism to provide individual confidence measures on classification decisions. The test uses hypothesis testing to prove or disprove whether a point is fit to be in each of the clusters of the model. We demonstrate, experimentally, that the test is highly robust, and produces very few misdiagnosed points, even when no clustering information is available. We also show that the test can be successfully applied to identify outliers present inside a data set for which no other information is available, thereby providing the user with a clean data set to identify future outliers. Our experiments also show that even if the data set used to identify further outliers is contaminated with some outliers, the test can perform successfully.
ISE-TR-06-02
Specifying Precise Use Cases
Jon Whittle
Despite attempts to formalize the semantics of use cases, they remain an informal notation. The informality of use cases is both a blessing and a curse. Whilst it admits an easy learning curve and enables communication between software stakeholders, it is also a barrier to the application of automated methods for test case generation, validation or simulation. This paper presents a precise way of specifying use cases based on a three-level modeling paradigm strongly influenced by UML. The formal syntax and semantics of use case charts are given, along with an example that illustrates how they can be used in practice.
ISE-TR-06-03
On the Testing Maturity of Software Producing Organizations: Detailed Data
Mats Grindal and Jeff Offutt and Jonas Mellin
This paper presents data from a study of the current state of practice of software testing. Test managers from twelve different software organizations were interviewed. The interviews focused on the amount of resources spent on testing, how the testing is conducted, and the knowledge of the personnel in the test organizations. The data indicate that the overall test maturity is low. Test managers are aware of this but have trouble improving. One problem is that the organizations are commercially successful, suggesting that products must already be “good enough.” Also, the current lack of structured testing in practice makes it difficult to quantify the current level of maturity and thereby articulate the potential gain from increasing testing maturity to upper management and developers.
ISE-TR-06-04
Locally Adaptive Metrics for Clustering High Dimensional Data
Carlotta Domeniconi and Dimitrios Gunopulos and Sheng Ma and Dimitris Papadopoulos and Bojun Yan
Clustering suffers from the curse of dimensionality, and similarity functions that use all input features with equal relevance may not be effective. We introduce an algorithm that discovers clusters in subspaces spanned by different combinations of dimensions via local weightings of features. This approach avoids the risk of loss of information encountered in global dimensionality reduction techniques, and does not assume any data distribution model. Our method associates with each cluster a weight vector, whose values capture the relevance of features within the corresponding cluster. We experimentally demonstrate the gain in performance our method achieves with respect to competitive methods, using both synthetic and real datasets. In particular, our results show the feasibility of the proposed technique to perform simultaneous clustering of genes and conditions in gene expression data, and clustering of very high-dimensional data such as text data.
ISE-TR-06-05
Policy Transformations for Preventing Leakage of Sensitive Information in Email Systems
Saket Kaushik and William Winsborough and Duminda Wijesekera and Paul Ammann
In this paper we identify an undesirable side-effect of combining different email-control mechanisms for protection from unwanted messages, namely, leakage of recipients' private information to message senders. This is because some email-control mechanisms like bonds, graph-Turing tests, etc., inherently leak information, and without discontinuing their use, leakage channels cannot be closed. We formalize the capabilities of an attacker and show how she can launch guessing attacks on a recipient's mail acceptance policy that utilizes a leaky mechanism in its defense against unwanted mail. As opposed to the classical Dolev-Yao attacker and its extensions, the attacker in our model guesses the contents of a recipient's private information. The use of leaky mechanisms allows the sender to verify her guess. We assume a constraint logic programming based policy language for specification and evaluation of mail acceptance criteria and present two different program transformations that can prevent guessing attacks while allowing recipients to utilize any email-control mechanism in their policies.
ISE-TR-06-06
Categorization of Unlabeled Documents driven by Word Weighting
Ning Kang and Carlotta Domeniconi and Daniel Barbará
In text mining we often have to handle large document collections. The labeling of such large corpora of documents is too expensive and impractical. Thus, there is a need to develop (unsupervised) clustering techniques for text data, where the distributions of words can vary significantly from one category to another. The vector space model of documents easily leads to 30,000 or more dimensions. In such high dimensionality, the effectiveness of any distance function that equally uses all input features is severely compromised. Furthermore, it is expected that different words may have different degrees of relevance for a given category of documents, and a single word may have a different importance across different categories. In this paper we first propose a global unsupervised feature selection approach for text, based on frequent itemset mining. As a result, each document is represented as a set of words that co-occur frequently in the given corpus of documents. We then introduce a locally adaptive clustering algorithm, designed to estimate (local) word relevance and, simultaneously, to group the documents. We present experimental results to demonstrate the feasibility of our approach. Furthermore, the analysis of the weights credited to terms provides evidence that the identified keywords can guide the process of label assignment to clusters. We take into consideration both spam email filtering and general classification datasets. Our analysis of the distribution of weights in the two cases provides insights on how the spam problem differs from the general classification case.
ISE-TR-06-07
An Algebra for Composing Ontologies
Saket Kaushik and Csilla Farkas and Duminda Wijesekera and Paul Ammann
Ontologies are used as a means of expressing agreement on a vocabulary shared by a community in a coherent and consistent manner. As happens on the Internet, ontologies are created by community members in a decentralized manner, requiring that they be merged before being used by the community. We develop an algebra to do so in the Resource Description Framework (RDF). To provide formal semantics for the proposed algebraic operators, we type a fragment of the RDF syntax.
ISE-TR-06-08
BPEL Orchestration of Secure WebMail
Saket Kaushik and Duminda Wijesekera and Paul Ammann
Web Services offer an excellent opportunity to redesign and replace old and insecure applications with more flexible and robust ones. WSEmail is one such application that replaces conventional message delivery systems with a family of Web Services that achieve the same goal. In this paper we analyze the existing WSEmail specification against the standard set of use cases (and misuse cases) supported (resp. prevented) by SMTP implementations --- the current default message delivery infrastructure --- and augment it with several missing pieces. In addition, we show how the WSEmail family of Web Services, specified in WSDL, can be orchestrated using BPEL. Finally, we provide a synchronization analysis of our WSEmail orchestration and show its correctness.
ISE-TR-06-09
Kernel Optimization using Pairwise Constraints for Semi-Supervised Clustering
Bojun Yan and Carlotta Domeniconi
A critical problem related to kernel-based methods is the selection of an optimal kernel for the problem at hand. The kernel function in use must conform with the learning target in order to obtain meaningful results. While solutions to estimate optimal kernel functions and their parameters have been proposed in a supervised setting, the problem presents open challenges when no labeled data are provided, and all we have available is a set of pairwise must-link and cannot-link constraints. In this paper we address the problem of optimizing the kernel function using pairwise constraints for semi-supervised clustering. To this end we derive a new optimization criterion to automatically estimate the optimal parameters of composite Gaussian kernels, directly from the data and the given constraints. We combine the optimal kernel function computed by our technique with a recently introduced semi-supervised kernel-based algorithm to demonstrate experimentally the effectiveness of our approach. The results show that our method enables the practical utilization of powerful kernel-based semi-supervised clustering approaches by providing a mechanism to automatically set the involved critical parameters.
ISE-TR-06-10
Trust-based Secure Positive Train Control (PTC) Interoperation
Mark Hartong and Rajini Goel and Duminda Wijesekera
Positive Train Control (PTC) is a wireless control system ensuring railroad safety by enforcing train separation, speed limits, roadway worker protection, and other safety functions. Due to shared track rights in North America, company A's trains must be safely operated by company B's crew on company C's tracks, requiring different PTC systems to securely interoperate with each other. To provide a security framework that ensures this, we propose using a trust management system with certificates and over-the-air re-keying (OTAR). Back-of-the-envelope calculations show that our solution meets the timing needs of PTC.
GMU-CS-TR-2005-1
Reachability Testing of Concurrent Programs
Jeff Lei and Richard Carver
One approach to testing concurrent programs, called reachability testing, generates synchronization sequences automatically, and on-the-fly, without constructing any static models. In this paper, we present a general execution model for concurrent programs that allows reachability testing to be applied to several commonly used synchronization constructs. We also present a new method for performing reachability testing. This new method guarantees that every partially-ordered synchronization sequence will be exercised exactly once without having to save any sequences that have already been exercised. We describe a prototype reachability testing tool called RichTest and report some empirical results, including a comparison between RichTest and a partial order reduction based tool called VeriSoft. RichTest performed significantly better for the programs in our study.
GMU-CS-TR-2005-2
Probabilistic Location Recognition using Reduced Feature Set
Fayin Li and Jana Kosecka
The localization capability is central to basic navigation tasks and motivates the development of various visual navigation systems. These systems can be used both as navigational aids for the visually impaired and in the context of autonomous mobile systems. In this paper we describe a two-stage approach for localization in indoor environments. In the first stage, the environment is partitioned into several locations, each characterized by a set of scale-invariant keypoints and their associated descriptors. In the second stage, the keypoints of the query view are integrated probabilistically, yielding an estimate of the most likely location. The emphasis of our approach in the environment model acquisition stage is on the selection of discriminative features best suited for characterizing individual locations. The high recognition rate is maintained with only 10% of the originally detected features, yielding a substantial speedup in recognition. The ambiguities due to self-similarity and dynamic changes in the environment are resolved by exploiting spatial relationships between locations captured by a Hidden Markov Model. Once the most likely location is determined, the relative pose of the camera with respect to the reference view can be computed.
GMU-CS-TR-2005-3
Can Good Learners Always Compensate for Poor Learners?
Keith Sullivan and Liviu Panait and Gabriel Balan and Sean Luke
Can a good learner compensate for a poor learner when paired in a coordination game? Previous work has given an example where a special learning algorithm (FMQ) is capable of doing just that when paired with a specific capable algorithm even in games which stump the poorer algorithm when paired with itself. In this paper, we argue that this result is not general. We give a straightforward extension to the coordination game in which FMQ cannot compensate for the lesser algorithm. We also provide other problematic pairings, and argue that another high-quality algorithm cannot do so either.
GMU-CS-TR-2005-4
Lenience Towards Teammates Helps in Cooperative Multiagent Learning
Liviu Panait and Keith Sullivan and Sean Luke
Concurrent learning is a form of cooperative multiagent learning in which each agent has an independent learning process and little or no control over its teammates' actions. In such learning algorithms, an agent's perception of the joint search space depends on the reward received by both agents, which in turn depends on the actions currently chosen by the other agents. The agents will tend to converge towards certain areas of the space because of their learning processes. As a result, an agent's perception of the search space may benefit if computed over multiple rewards at early stages of learning, but additional rewards have little impact towards the end. We thus suggest that agents should be lenient with their teammates: ignore many of the low rewards initially, and fewer rewards as learning progresses. We demonstrate the benefit of lenience in a cooperative co-evolution algorithm and in a new reinforcement learning algorithm.
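As a toy illustration of lenience only (not the paper's cooperative co-evolution or reinforcement learning algorithms), the sketch below values each action by the maximum over its first few observed joint rewards rather than the mean, so low rewards caused by a still-exploring teammate are largely ignored.

import numpy as np

def lenient_estimate(rewards_per_action, kappa=5):
    # Optimistic (lenient) value estimate: the best of the first kappa joint rewards per action.
    return np.array([np.max(r[:kappa]) for r in rewards_per_action])

samples = [np.array([10.0, 0.0, 0.0, 2.0, 0.0, 10.0, 1.0]),    # good action, often "spoiled" by the teammate
           np.array([7.0, 6.0, 7.0, 6.0, 7.0, 7.0, 6.0])]      # safe but suboptimal action
print(lenient_estimate(samples))     # lenient view prefers action 0; averaging would prefer action 1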
GMU-CS-TR-2005-5
Selecting Informative Actions Improves Cooperative Multiagent Learning
Liviu Panait and Sean Luke
In concurrent cooperative multiagent learning, each agent in a team simultaneously learns to improve the overall performance of the team, with no direct control over the actions chosen by its teammates. An agent's action selection directly influences the rewards received by all the agents; this results in a co-adaptation among the concurrent learning processes. Co-adaptation can drive the team towards suboptimal solutions because agents tend to select those actions that are rewarded better, without any consideration for how such actions may affect the search of their teammates. We argue that to counter this tendency, agents should also prefer actions that inform their teammates about the structure of the joint search space in order to help them choose from among various action options. We analyze this approach in a cooperative coevolutionary framework, and we propose a new algorithm, oCCEA, that highlights the advantages of selecting informative actions. We show that oCCEA generally outperforms other cooperative coevolution algorithms on our test problems.
GMU-CS-TR-2005-6
Nonparametric Estimation of Multiple Structures with Outliers
Wei Zhang and Jana Kosecka
In many computer vision problems, several instances of a particular model need to be recovered from the noisy data. In such cases one is faced with the problem of simultaneous estimation of the number of models and their parameters. This problem becomes difficult as the measurement noise in the data increases and the data are further corrupted by outliers. This is especially the case in a variety of motion estimation problems, where the displacement between the views is large and the process of establishing correspondences is difficult. In this paper we propose a novel nonparametric sampling based method for solving this problem. The main novelty of the proposed method lies in the analysis of the distribution of residuals of individual data points with respect to the set of hypotheses, generated by a RANSAC-like sampling process. We will show that the modes of the residual distributions directly reveal the presence of multiple models and facilitate the recovery of the individual models, without making any assumptions about the distribution of the outliers or the noise process. The proposed approach is capable of handling data with a large fraction of outliers. Experiments with both synthetic and real data are presented to demonstrate the effectiveness of the proposed approach.
GMU-CS-TR-2005-7
A New Inlier Identification Scheme for Robust Estimation Problems
Wei Zhang and Jana Kosecka
The data in vision problems are often heavily contaminated by outliers, calling for efficient robust techniques that identify inliers for correct estimation. The RANSAC algorithm is a frequently used robust estimator for computer vision problems. In the traditional RANSAC scheme, when the data contain a significant fraction of outliers, a large number of samples is needed in order to obtain at least one outlier-free sample. In addition, each hypothesis generated from the samples is typically evaluated using all data points, which further lowers the efficiency. In this paper, we propose a novel hypothesis evaluation scheme, which enables efficient classification of the data points as outliers and inliers while requiring a small fixed number of samples. The method is based on the observation that, for each data point, the properties of the distribution of the residuals with respect to the generated hypotheses reveal whether the point is an outlier or an inlier. The problem of inlier/outlier identification can then be formulated as a classification problem. We demonstrate the proposed method on motion estimation problems with a large fraction of outliers on both synthetic and real data.
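The sketch below only prepares the raw material such an analysis operates on: for a 2D line-fitting problem with hypothetical data, it draws RANSAC-style minimal samples and collects the points-by-hypotheses matrix of residuals, whose per-point distributions tend to separate inliers from outliers; the report's actual classification scheme is not reproduced here.

import numpy as np

def residual_matrix(points, n_hypotheses=200, seed=0):
    # Rows: data points; columns: line hypotheses from random 2-point minimal samples.
    rng = np.random.default_rng(seed)
    R = np.empty((len(points), n_hypotheses))
    for h in range(n_hypotheses):
        p, q = points[rng.choice(len(points), size=2, replace=False)]
        d = q - p
        n = np.array([-d[1], d[0]]) / np.linalg.norm(d)   # unit normal of the line through p and q
        R[:, h] = np.abs((points - p) @ n)                # distance of every point to that line
    return R

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 80)
inliers = np.c_[x, 2 * x + 1 + rng.normal(0, 0.1, 80)]    # points near a line
outliers = rng.uniform(0, 25, size=(20, 2))               # gross outliers
R = residual_matrix(np.vstack([inliers, outliers]))
print(np.median(R[:80]), np.median(R[80:]))               # inlier residuals concentrate near zero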
GMU-CS-TR-2005-8
Evolving Families of Designs Using L-Systems
Elena Popovici
Evolutionary computation has proven its utility in automating the process of engineering design. However, little attention has been paid to the scalability of generated designs, which is an important issue. This paper addresses this issue and demonstrates the viability of evolving families of designs using parameterized L-Systems as a representation. The rest of the paper is organized as follows: first, an introduction as to why scalability is important and difficult; second, a review of existing work on evolving L-Systems; the third section contains a description of the application domain used for this feasibility study, and details on the L-Systems and the EA used; section four presents the experiments conducted and their results; the paper ends with a discussion, drawing conclusions and setting goals for future work.
ISE-TR-05-01
Intensional Encapsulations of Database Subsets via Genetic Programming
Aybar C. Acar and Amihai Motro
Finding intensional encapsulations of database subsets is the inverse of query evaluation. Whereas query evaluation transforms an intensional expression (the query) to its extension (a set of data values), intensional encapsulation assigns an intensional expression to a given set of data values. We describe a method for deriving intensional representations of subsets of records in large database tables. Our method is based on the paradigm of genetic programming. It is shown to achieve high accuracy and maintain compact expression size, while requiring cost that is acceptable to all applications except those that require instantaneous results. Intensional encapsulation has a broad range of applications including cooperative answering, information integration, security, and data mining.
ISE-TR-05-02
Blind Custodians: A Database Service Architecture that Supports Privacy without Encryption
Amihai Motro and Francesco Parisi-Presicce
We describe an architecture for a database service that does not assume that the service provider can be trusted. Unlike other architectures that address this problem, this architecture, which we call blind custodians, does not rely on encryption. Instead, it offers confidentiality by means of information dissociation: The server only stores “fragments” of information that are considered safe (i.e., each fragment does not violate privacy), while the client stores the associations between the fragments that are necessary to reconstruct the information. We argue that this architecture allows satisfactory confidentiality, while offering two important advantages: (1) It does not restrict the types of queries that can be submitted by clients (as encryption-based methods invariably do), and (2) it requires only light processing at the client, assigning the bulk of the processing to the server (as befits a true service). Moreover, the architecture permits flexible control over the level of confidentiality that should be maintained (at the cost of additional overhead).
ISE-TR-05-03
Optimal Manufacturing Trees
Amihai Motro and Alessandro D'Atri
We consider the process of manufacturing a product from a given number of elementary components. By assembling intermediate products, the target product can be manufactured in a variety of processes, each modeled by a tree. We are interested in manufacturing turnaround: the time between receiving an order at the root and its completion. We express the turnaround time of each manufacturing process (tree) with a formula that incorporates three parameters: the time required to create elementary components, the time required to assemble a product from its components and the time required to deliver the product to its procurer (another manufacturer). We show that this turnaround formula is optimized in a manufacturing process that corresponds to a perfect (or nearly perfect) tree. The degree of the optimal tree (i.e., the ideal number of components in each sub-assembly) is shown to be independent of the number of elementary components, suggesting that in each manufacturing environment there is an ideal assembly size, which is optimal for the manufacturing of products of any scale.
ISE-TR-05-04
A Comparative Evaluation of Tests Generated from Different UML Diagrams: Diagrams and Data
Aynur Abdurazik and Jeff Offutt and Andrea Baldini
This paper presents a single project experiment on the fault revealing capabilities of model-based test sets. The tests are generated from UML statecharts and UML sequence diagrams. This experiment found that the statechart test sets did better at revealing unit level faults than the sequence diagram test sets, and the sequence diagram test sets did better at revealing integration level faults than the statechart test sets. The statecharts also resulted in more test cases than the sequence diagrams. The experiment showed that model-based testing can be used to systematically generate test data and indicates that different UML models can play different roles in testing.
ISE-TR-05-05
VirtuE: a Formal Model of Virtual Enterprises for Information Markets
Alessandro D'Atri and Amihai Motro
A vital part of a modern economy is an information market. In this market, information products are being traded in countless ways. Information is bought, modified, integrated, incorporated into other products, and then sold again. Often, the manufacturing of an information product requires the collaboration of several participants. A virtual enterprise is a community of business entities that collaborate on the manufacturing of complex products. This collaboration is often ad hoc, for a specific product only, after which the virtual enterprise may dismantle. The virtual enterprise paradigm is particularly appealing for modeling collaborations for manufacturing information products, and in this paper we present a new model, called VirtuE, for modeling such activities. VirtuE has three principal components. First, it defines a distributed infrastructure with concepts such as members, products, inventories, and production plans. Second, it defines transactions among members, to enable collaborative production of complex products. Finally, it provides means for the instrumentation of enterprises, to measure their performance and to govern their behavior.
ISE-TR-05-06
CoJava: A Unified Language for Simulation and Optimization
Alexander Brodsky and Hadon Nash
We have proposed and implemented the language CoJava, which offers both the advantages of simulation-like process modeling in Java, and the capabilities of true decision optimization. By design, the syntax of CoJava is identical to the programming language Java, extended with special constructs to (1) make a non-deterministic choice of a numeric value, (2) assert a constraint, and (3) designate a program variable as the objective to be optimized. The semantics of CoJava interprets a program as an optimal nondeterministic execution path, namely, a path that (1) satisfies the range conditions in the choice statements, (2) satisfies the assert-constraint statements, and (3) produces the optimal value in a designated program variable, among all execution paths that satisfy (1) and (2). Running a CoJava program amounts to first finding an optimal execution path, and then procedurally executing it. We have developed a CoJava constraint compiler based on a reduction of the problem of finding an optimal execution trace to a standard symbolic formulation, reinterpreting Java code as a symbolic constraint construction, and solving the resulting optimization problem on an external solver. To demonstrate the power of CoJava, we have implemented a realistic problem in the area of robot arm control in CoJava. The robot arm is constructed using self-contained components, implemented as CoJava classes, that model the robot arm's movements based on Newton's laws.
GMU-CS-TR-2004-1
Tunably Decentralized Algorithms for Cooperative Target Observation
Sean Luke and Keith Sullivan and Gabriel Catalin Balan and Liviu Panait
Multi-agent problem domains may require distributed algorithms for a variety of reasons: local sensors, limitations of communication, and availability of distributed computational resources. In the absence of these constraints, centralized algorithms are often more efficient, simply because they are able to take advantage of more information. We introduce a variant of the cooperative target observation domain which is free of such constraints. We propose two algorithms, inspired by K-means clustering and hill-climbing respectively, which are scalable in degree of decentralization. Neither algorithm consistently outperforms the other across all problem domain settings. Surprisingly, we find that hill-climbing is sensitive to degree of decentralization, while K-means is not. We also experiment with a combination of the two algorithms which draws strength from each.
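For flavor, here is a minimal, fully centralized sketch of the K-means-inspired assignment: target positions are clustered into one cluster per observer, and the cluster centers become observer destinations. The report's algorithms additionally tune how much of this computation each observer performs locally (the degree of decentralization), which is omitted here; parameter names and defaults are assumptions.

```python
import numpy as np

def kmeans_observer_destinations(targets, n_observers, iterations=10, rng=None):
    """Cluster target positions into one cluster per observer and return the
    cluster centers as observer destinations (centralized illustration only)."""
    rng = rng or np.random.default_rng(0)
    targets = np.asarray(targets, dtype=float)
    centers = targets[rng.choice(len(targets), n_observers, replace=False)].copy()
    for _ in range(iterations):
        # Assign each target to its nearest tentative observer destination.
        dists = np.linalg.norm(targets[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each destination to the centroid of its assigned targets.
        for k in range(n_observers):
            assigned = targets[labels == k]
            if len(assigned):
                centers[k] = assigned.mean(axis=0)
    return centers
```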
GMU-CS-TR-2004-2
Location Recognition, Global Localization and Relative Positioning Based on Scale-Invariant Keypoints
Jana Kosecka and Xialong Yang
The localization capability of a mobile robot is central to basic navigation and map building tasks. We describe a probabilistic environment model which facilitates a global localization scheme by means of location recognition. In the exploration stage the environment is partitioned into a class of locations, each characterized by a set of scale-invariant keypoints. The descriptors associated with these keypoints can be robustly matched despite changes in contrast, scale and viewpoint. We demonstrate the efficacy of these features for location recognition, where, given a new view, the most likely location from which this view came is determined. The misclassifications due to dynamic changes in the environment or inherent appearance ambiguities are overcome by exploiting neighborhood relationships captured by a Hidden Markov Model. We report the recognition performance of this approach on an indoor environment consisting of eighteen locations and discuss the suitability of this approach for a more general class of recognition problems. Once the most likely location has been determined, we show how to compute the relative pose between the representative view and the current view.
GMU-CS-TR-2004-3
Experiments in Building Recognition
Wei Zhang and Jana Kosecka
In this report we study the problem of building recognition. Given a database of building images, we are interested in classifying a novel view of a building by finding the closest match from the database. We examine the efficacy of local scale-invariant features and their associated descriptors as representations of building appearance and use a simple voting scheme in the recognition phase.
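A minimal sketch of such a voting scheme is given below: each query descriptor is matched to its nearest database descriptor (with a ratio test), and every accepted match votes for the building that descriptor came from. The data layout and threshold are assumptions rather than the report's exact procedure.

```python
import numpy as np

def recognize_building(query_descriptors, db_descriptors, db_labels, ratio=0.8):
    """Nearest-neighbour descriptor matching with a ratio test; every accepted
    match votes for the building its database descriptor belongs to.
    (Illustrative sketch; the threshold and layout are assumptions.)"""
    db = np.asarray(db_descriptors, dtype=float)
    votes = {}
    for q in np.asarray(query_descriptors, dtype=float):
        dists = np.linalg.norm(db - q, axis=1)
        best, second = np.argsort(dists)[:2]
        if dists[best] < ratio * dists[second]:  # Lowe-style ratio test
            label = db_labels[best]
            votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get) if votes else None
```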
GMU-CS-TR-2004-4
A New Algorithm for Reachability Testing of Concurrent Programs
Jeff Lei and Richard Carver
This document has been superseded by GMU CS Department Technical Report GMU-CS-TR-2005-1
Concurrent programs exhibit non-deterministic behavior, which makes them difficult to test. One approach to testing concurrent programs, called reachability testing, generates test sequences automatically, and on-the-fly, without constructing any static models. This approach guarantees that every partially-ordered synchronization sequence of a program with a given input will be exercised exactly once. Unfortunately, in order to make this guarantee all existing reachability testing algorithms need to save and search through the history of test sequences that have already been exercised, which is impractical for many applications. In this paper, we present a new reachability testing algorithm that does not save the history of test sequences. This new algorithm guarantees that every partially-ordered synchronization sequence is exercised at least once for an arbitrary program and exactly once for a program that satisfies certain conditions. We also describe a reachability testing tool called RichTest. Our empirical studies with RichTest indicate that our new algorithm exercises every synchronization sequence exactly once for many applications.
ISE-TR-04-01
Modeling and Testing of Dynamic Aspects of Web Applications
Ye Wu and Jeff Offutt and Xiaochen Du
Web software applications have become complex, sophisticated programs that are based on novel computing technologies. Although powerful, these technologies bring new challenges to developers and testers. Checking static HTML links is no longer sufficient; Web applications must be evaluated as complex software products. This paper focuses on two unique aspects of Web applications: an extremely loose form of coupling that features dynamic integration, and the ability that users have to directly change the potential flow of execution. Taken together, these allow the control flow to vary with each execution, and mean that the possible control flows cannot be determined statically. Thus we cannot apply standard analysis techniques that are fundamental to many software engineering activities. This paper presents a new theoretical model of these new couplings and the dynamic flow of control of Web applications. This model is based on atomic sections, which allow tools to build the equivalent of a control flow graph for Web applications. The model is used to propose new test criteria for Web applications, and results are shown from a case study on a moderate-size application.
ISE-TR-04-02
A Framework for Dynamic Semantic Web Services Management
Randy Howard and Larry Kerschberg
The concept of Web services as a means of dynamically discovering, negotiating, composing, executing and managing services to materialize enterprise-scale workflow is an active research topic. However, its realization has thus far been elusive. Existing approaches involve many disparate concepts, frameworks and technologies. What is needed is a comprehensive and overarching framework that handles the processing and workflow requirements of Virtual Enterprises, maps them to a collection of service-oriented tasks, dynamically configures these tasks from available services, and manages the choreography and execution of these services. The goal is to add semantics to Web services to endow them with capabilities currently lacking in the literature, but necessary for their successful deployment in future systems. This paper introduces such a framework, called the Knowledge-based Dynamic Semantic Web Services (KDSWS) Framework, that addresses, in an integrated end-to-end manner, the life-cycle of activities involved in preparing, publishing, requesting, discovering, selecting, configuring, deploying, and delivering Semantic Web Services. In particular, the following issues are addressed: 1) semantic specification of service capabilities including quality of service, trust, and security; 2) transaction control and workflow management; and 3) resource management, interoperation and evolution of the Virtual Enterprise. Two models and their associated languages are introduced to specify the features for these enhanced services: 1) the KDSWS Meta-Model, and 2) the KDSWS Process Language. Both are based on the Knowledge/Data Model. This integrated and comprehensive framework provides a unified end-to-end approach to enable dynamically-composable, semantically-rich, service-oriented systems of Web services.
ISE-TR-04-03
A Controlled Experimental Evaluation of Test Cases Generated from UML Diagrams
Aynur Abdurazik and Jeff Offutt and Andrea Baldini
This paper presents a single project experiment on the fault revealing capabilities of test sets that are generated from UML statecharts and sequence diagrams. The results of this experiment show that the statechart test sets do better at revealing unit level faults than the sequence diagram test sets, and the sequence diagram test sets do better at revealing integration level faults than the statechart test sets. The experimental data also show that the statecharts result in more test cases than the sequence diagrams. The experiment showed that UML diagrams can be used in generating test data systematically, and different UML diagrams can play different roles in testing.
ISE-TR-04-04
Integration Testing of Object-oriented Components Using FSMs: Theory and Experimental Details
Leonard Gallagher and Jeff Offutt
In object-oriented terms, one of the goals of integration testing is to ensure that messages from objects in one class or component are sent and received in the proper order and have the intended effect on the state of external objects that receive the messages. This research extends an existing single-class testing technique to integration testing. The previous method models the behavior of a single class as a finite state machine, transforms that representation into a data flow graph that explicitly identifies the definitions and uses of each state variable of the class, and then applies conventional data flow testing to produce test case specifications that can be used to test the class. This paper extends those ideas to inter-class testing by developing flow graphs and tests for an arbitrary number of classes and components. It introduces flexible representations for message sending and receiving among objects and allows concurrency among any or all classes and components. A second major result is the introduction of a novel approach to performing data flow analysis. Data flow graphs are stored in a relational database, and database queries are used to gather def-use information. This approach is conceptually simple, mathematically precise, quite powerful, and general enough to be used for traditional data flow analysis. This testing approach relies on finite state machines, database modeling and processing techniques, and algorithms for analysis and traversal of directed graphs. A proof-of-concept implementation is used to illustrate how the approach works on an extended example.
ISE-TR-04-05
Combination Testing Strategies: A Survey
Mats Grindal and Jeff Offutt and Sten F. Andler
Combination strategies are test-case selection methods where test cases are identified by combining values of the different test object input parameters based on some combinatorial strategy. This survey presents 15 different combination strategies, and covers more than 30 papers that focus on one or several combination strategies. We believe this collection represents most of the existing work performed on combination strategies. This survey describes the basic algorithms used by the combination strategies. Some properties of combination strategies, including coverage criteria and theoretical bounds on the size of test suites, are also included in this description. This survey also includes a subsumption hierarchy that attempts to relate the various coverage criteria associated with the identified combination strategies. Finally, this survey contains short summaries of all the papers that cover combination strategies.
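As a small illustration of the kind of method surveyed, the sketch below implements the simple "each choice" strategy, in which every value of every input parameter appears in at least one test case; the more powerful strategies in the survey (e.g., pairwise coverage) require more elaborate generation. Parameter names and values are invented for the example.

```python
from itertools import zip_longest

def each_choice_tests(parameters):
    """Build a small test suite in which every value of every input parameter
    appears in at least one test case (the 'each choice' combination strategy).
    parameters: dict mapping parameter name -> list of values."""
    names = list(parameters)
    columns = [parameters[n] for n in names]
    suite = []
    for row in zip_longest(*columns):
        # Pad shorter value lists by reusing their first value.
        test = {n: (v if v is not None else parameters[n][0])
                for n, v in zip(names, row)}
        suite.append(test)
    return suite

# Example: parameters with 2, 3, and 2 values yield a 3-test suite.
print(each_choice_tests({"os": ["linux", "win"],
                         "browser": ["ff", "chrome", "safari"],
                         "net": ["wifi", "lan"]}))
```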
GMU-CS-TR-2003-0
How to Make a Technical Report
Sean Luke and Jana Kosecka
This document has been superseded by GMU CS Department Technical Report GMU-CS-TR-2011-0
The following document describes the preferred formatting of technical reports and submission guidelines. This document itself is formatted according to the preferred formatting rules.
GMU-CS-TR-2003-1
Cooperative Multi-Agent Learning: The State of the Art
Liviu Panait and Sean Luke
Problems in cooperative multi-agent systems are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize their utility. Due to the interactions among the agents, multi-agent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. The challenge this presents to the task of programming solutions to such problems has spawned increasing interest in machine learning (ML) techniques to automate the search and optimization process. We provide a broad survey of the cooperative multi-agent learning literature. Previous surveys of this area have largely focused on issues common to specific subareas (for example, reinforcement learning or robotics). In this survey we attempt to draw from multi-agent learning work in a spectrum of areas, including reinforcement learning, evolutionary computation, game theory, complex systems, agent modeling, and robotics. We find that this broad view leads to a division of the work into two categories, each with its own special issues: applying a single learner to discover joint solutions to multi-agent problems (team learning), or using multiple simultaneous learners, often one per agent (concurrent learning). Additionally, we discuss two important topics independent of these categories: problem decomposition and communication. We conclude with a presentation of multi-agent learning problem domains, a discussion of certain challenge topics (scalability and adaptive dynamics), and a list of multi-agent learning resources.
ISE-TR-03-01
Establishing Pair-wise Keys For Secure Communication in Ad Hoc Networks: A Probabilistic Approach
Sencun Zhu and Shouhuai Xu and Sanjeev Setia and Sushil Jajodia
We present a scalable distributed protocol that enables two mobile nodes in an ad hoc network to establish a pairwise shared key on the fly, without requiring the use of an on-line key distribution center. The design of our protocol is based on a novel combination of two techniques --- probabilistic key sharing and threshold secret sharing. Our protocol is scalable since every node only needs to possess a small number of keys, independent of the network size, and it is computationally efficient because it only relies on symmetric key operations. We show that a pairwise key established between two nodes using our protocol is secure against a collusive attack by a certain number of compromised nodes, and that our protocol can be parameterized to meet the appropriate levels of performance, security and storage for the application under consideration.
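The probabilistic key-sharing half of the idea can be sketched as follows: each node is preloaded with a small random subset of a large key pool, and two nodes derive a pairwise key from whatever pool keys they share. The pool and ring sizes are illustrative, and the threshold-secret-sharing component and the actual protocol messages are omitted.

```python
import hashlib
import random

POOL_SIZE = 10_000   # size of the global key pool (assumed)
KEYS_PER_NODE = 100  # keys preloaded on each node (assumed)

def preload(rng):
    """Assign a node a random subset of key indices from the pool."""
    return set(rng.sample(range(POOL_SIZE), KEYS_PER_NODE))

def pairwise_key(ring_a, ring_b, pool):
    """Derive a shared key from the intersection of two nodes' key rings."""
    shared = sorted(ring_a & ring_b)
    if not shared:
        return None  # the full protocol falls back to path keys / secret shares
    material = b"".join(pool[i] for i in shared)
    return hashlib.sha256(material).hexdigest()

rng = random.Random(42)
pool = [rng.randbytes(16) for _ in range(POOL_SIZE)]  # requires Python 3.9+
ring_a, ring_b = preload(rng), preload(rng)
print(pairwise_key(ring_a, ring_b, pool))
```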
ISE-TR-03-02
Adding Reliable and Self-Healing Key Distribution to the Subset Difference Group Rekeying Method for Secure Multicast
Sencun Zhu and Sanjeev Setia and Sushil Jajodia
The Subset Difference Rekeying (SDR) method is the most efficient stateless group rekeying method proposed in the literature. We study two important issues related to the SDR method. First, we address the issue of reliable rekey transport for SDR. We present a key distribution scheme, called FEC-BKR, that enables members to receive the current group key in a reliable and timely fashion despite packet losses in the network. Through simulation, we show that in most scenarios, FEC-BKR outperforms previously proposed schemes for reliable rekey transport. Second, we address the issue of self-healing key distribution for SDR. We present a group key recovery scheme that adds the self-healing property to SDR, i.e., our scheme enables a member that has missed up to a certain number m of previous rekey operations to recover the missing group keys without asking the key server for retransmission. The additional communication overhead imposed by our key recovery scheme is quite small (less than 3m additional keys).
ISE-TR-03-03
Precisely Answering Multi-dimensional Range Queries Without Privacy Breaches
Lingyu Wang and Yingjiu Li and Duminda Wijesekera and Sushil Jajodia
This paper investigates the privacy breaches caused by multi-dimensional range (MDR) sum queries in OLAP systems. We show that existing inference control methods are generally ineffective or infeasible for MDR queries. We then consider restricting users to even MDR queries (that is, MDR queries involving an even number of data values). We show that the collection of such even MDR queries is safe if and only if a special set of sum-two queries (that is, queries involving exactly two values) is safe. On the basis of this result, we give an efficient method to decide the safety of even MDR queries. Besides identifying safe even MDR queries, we show that any odd MDR query is unsafe. Moreover, any such odd MDR query differs from the union of some even MDR queries by only one tuple. We also extend those results to the safe subsets of unsafe even MDR queries.
ISE-TR-03-04
FlexFlow: A Flexible Flow Control Policy Specification Framework
Shiping Chen and Duminda Wijesekera and Sushil Jajodia
Flow control policies are important in data-flow, work-flow, transaction systems and software design. Previous work in this area concentrates either on modelling security aspects of information flow control or on applying flow control policies in some specific application domain. These models permit either permissions or prohibitions for flows and are normally based on a specific meta-policy (usually the closed policy). We propose FlexFlow, a logic-based flexible flow control framework to specify policies for data-flow, work-flow and transaction systems that go beyond point-to-point flows. Both permissions and prohibitions are specifiable in FlexFlow, and meta-policies such as permissions-take-precedence can themselves be specified on top of the meta-policy-neutral policy specification environment of FlexFlow. We further show how to specify and prevent inter-flow conflicts such as those arising in role-based work-flow policies.
ISE-TR-03-05
OLAP Means On-Line Anti-Privacy
Lingyu Wang and Duminda Wijesekera and Sushil Jajodia
In this paper we investigate the privacy breaches caused by multi-dimensional range sum queries in OLAP (On-Line Analytical Processing) systems. Our results show that the sensitive information stored in the underlying data warehouses can be easily compromised by malicious users with legitimate range queries. This compromise is possible even when users are restricted to some special class of range queries. We present algorithms that compromise individual tuples with only range queries. We study the conditions under which the compromises are possible. We also analyze the number of queries required for each compromise, as well as the complexity and completeness of those algorithms. Our study reveals the seriousness of the privacy issue in OLAP systems and provides a better understanding of the problem for further research on control methods.
ISE-TR-03-06
Fusionplex: Resolution of Data Inconsistencies in the Integration of Heterogeneous Information Sources
Philipp Anokhin and Amihai Motro
Fusionplex is a system for integrating multiple heterogeneous and autonomous information sources that uses data fusion to resolve factual inconsistencies among the individual sources. To accomplish this, the system relies on source features, which are meta-data on the quality of each information source; for example, the recentness of the data, its accuracy, its availability, or its cost. The fusion process is controlled with several parameters: (1) with a vector of feature weights, each user defines an individual notion of data utility; (2) with thresholds of acceptance, users ensure minimal performance of their data, excluding from the fusion process data that are too old, too costly, or lacking in authority, or data that are too high, too low, or obvious outliers; and, ultimately, (3) by naming a particular fusion function to be used for each attribute (for example, average, maximum, or simply any), users implement their own interpretation of fusion. Several simple extensions to SQL are all that is needed to allow users to state these resolution parameters, thus ensuring that the system is easy to use. Altogether, Fusionplex provides its users with powerful and flexible, yet simple, control over the fusion process. In addition, Fusionplex supports other critical integration requirements, such as information source heterogeneity, dynamic evolution of the information environment, quick ad-hoc integration, and intermittent source availability. The methods described in this paper were implemented in a prototype system that provides complete Web-based integration services for remote clients.
ISE-TR-03-07
Utility-based Resolution of Data Inconsistencies
Amihai Motro, Philipp Anokhin and Aybar C. Acar
A virtual database system is software that provides unified access to multiple information sources. If the sources are overlapping in their contents and independently maintained, then the likelihood of inconsistent answers is high. Solutions are often based on ranking (which sorts the different answers according to recurrence) and on fusion (which synthesizes a new value from the different alternatives according to a specific formula). In this paper we argue that both methods are flawed, and we offer alternative solutions that are based on knowledge about the performance of the source data, including features such as recentness, availability, accuracy and cost. These features are combined in a flexible utility function that expresses the overall value of a data item to the user. Utility allows us to (1) define a meaningful ranking on the inconsistent set of answers, and offer the top-ranked answer as a preferred answer; (2) determine whether a fusion value is indeed better than the initial values, by calculating its utility and comparing it to the utilities of the initial values; and (3) discover the best fusion: the fusion formula that optimizes the utility. The advantages of such performance-based and utility-driven ranking and fusion are considerable.
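A minimal sketch of the utility-driven ranking step follows, assuming each candidate answer is annotated with normalized performance features and the user supplies feature weights; the feature names, weights, and values are illustrative, not those of the paper.

```python
def utility(features, weights):
    """Overall value of a candidate answer as a weighted sum of its
    (normalized) performance features; feature names are illustrative."""
    return sum(weights[f] * features.get(f, 0.0) for f in weights)

def rank_answers(candidates, weights):
    """Sort inconsistent candidate answers by decreasing utility; the first
    element can be offered as the preferred answer."""
    scored = [(utility(c["features"], weights), c["value"]) for c in candidates]
    return sorted(scored, reverse=True)

weights = {"recentness": 0.4, "accuracy": 0.4, "availability": 0.1, "cost": 0.1}
candidates = [
    {"value": 42_000, "features": {"recentness": 0.9, "accuracy": 0.7,
                                   "availability": 1.0, "cost": 0.5}},
    {"value": 40_500, "features": {"recentness": 0.4, "accuracy": 0.95,
                                   "availability": 0.8, "cost": 0.9}},
]
print(rank_answers(candidates, weights))
```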
ISE-TR-03-08
Providing Voice Privacy as a Service over the Public Telephone Network
Mohamad Sharif and Duminda Wijesekera
The existing design of the public telephone network is susceptible to eavesdropping on its voice stream. Consequently, any wiretapper can listen to supposedly private conversations. In order to prevent such eavesdropping, we propose an architecture and its implementation for end-to-end voice encryption as a service available to interested subscribers. Our proposal consists of appropriately placed servers that authenticate telephone equipment and subscribers of the service, and certificate authorities that can cross-certify across telecommunication service providers. We show how these entities and the necessary signaling mechanisms between them can be implemented using the transaction capabilities application part (TCAP) of the Signaling System 7 (SS7) protocol and the D channel of the digital subscriber line (DSL) connecting telephone equipment to the SS7 grid. Using published specifications, we show that initiating an encrypted telephone call takes about 19 seconds, including user authentication, under average load conditions. That makes our design and protocols comparable to existing intelligent services provided over public telephones, and only about four times longer than initiating a normal telephone call.
ISE-TR-03-09
Why Is this User Asking so Many Questions? Explaining Sequences of Queries
Aybar C. Acar and Amihai Motro
A sequence of queries submitted by a database user within a short period of time may have a single, illuminating explanation. In this paper we consider sequences of single-record queries, and attempt to guess what information their authors may be trying to accumulate. Query sequences may reflect clandestine intentions, where users attempt to avoid direct queries which may disclose their true interests, preferring instead to obtain the same information by means of sequences of smaller, less conspicuous queries. Sequences of queries may also reflect attempts to circumvent retrieval restrictions, where users attempt to approximate information which is inaccessible with sequences of legitimate requests (in the latter case, our explanations may lead database owners either to tighten access or, conversely, to reorganize their interfaces to facilitate access). Because the true objective of a sequence may be clouded by the retrieval of spurious records, our approach considers all the possible aggregates that a user may accumulate with a sequence, and ranks them, search-engine style, according to their plausibility as retrieval objectives. Our method is probabilistic in nature and postulates that the likelihood that a set of records is the true objective of the user is inversely proportional to the likelihood that this set results from random selection. Our method is shown to have good performance even in the presence of noise (spurious records) as high as 40-50%.
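The plausibility postulate can be sketched with a simple combinatorial score: a candidate objective (a subset of the retrieved records that share some property) is scored by how unlikely it is that a uniformly random retrieval of the same size would consist entirely of records with that property, with plausibility taken as the reciprocal of that probability. This is only an illustration of the postulate, not the paper's full ranking model; the counts in the example are invented.

```python
from math import comb

def random_selection_probability(db_size, property_count, k):
    """Probability that k records drawn uniformly at random (without
    replacement) from a database of db_size records all hold a property
    shared by property_count records."""
    if k > property_count:
        return 0.0
    return comb(property_count, k) / comb(db_size, k)

def plausibility(db_size, property_count, k):
    """Score a candidate objective: the less likely it is to arise from random
    retrieval, the more plausible it is as the user's true objective."""
    p = random_selection_probability(db_size, property_count, k)
    return float("inf") if p == 0.0 else 1.0 / p

# Six retrieved records all from a 20-person department (out of 10,000
# employees) are far more plausible as an objective than six records that
# merely share a very common property (held by 6,000 employees).
print(plausibility(10_000, 20, 6), plausibility(10_000, 6_000, 6))
```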
ISE-TR-03-10
UML Specification of Access Control Policies and their Formal Verification
Manuel Koch and Francesco Parisi-Presicce
Security requirements have become an integral part of most modern software systems. In order to produce secure systems, it is necessary to provide software engineers with the appropriate systematic support. We propose a methodology to integrate the specification of access control policies into UML and provide a graph-based formal semantics for the UML access control specification, which permits reasoning about the coherence of the access control specification. The main concepts in the UML access control specification are illustrated with an example access control model for distributed object systems.
ISE-TR-03-11
Specifying Coherent Refactoring of Software Artefacts with Distributed Graph Transformations
Paolo Bottoni and Francesco Parisi-Presicce and Gabriele Taentzer
Refactoring is an important form of software transformation that changes the internal structure of a software system while preserving its behavior. Even though the input/output view of a system's behavior does not change, refactoring can have several consequences for the computing process, as expressed for instance by the sequence of method calls or by state changes of an object or an activity. Such modifications must be reflected in the system model, generally expressed through UML diagrams. We propose a formal approach, based on distributed graph transformation, to the coordinated evolution of code and model as an effect of refactorings. The approach can be integrated into existing refactoring tools. Due to its formal basis, it makes it possible to reason about the behavior preservation of each specified refactoring.
ISE-TR-02-01
Provisions and Obligations in Policy Management and Security Applications
Claudio Bettini and Sushil Jajodia and X. Sean Wang and Duminda Wijesekera
Policies are widely used in many different systems and applications. Recently, it has been recognized that a “yes/no” response to every scenario is just not enough for many modern systems and applications. Many policies require certain conditions to be satisfied and actions to be performed before or after a decision is made in accordance with the policy. To address this need, this paper introduces the notions of provisions and obligations. Provisions are those conditions that need to be satisfied or actions that must be performed before a decision is rendered, while obligations are those conditions or actions that must be fulfilled by either the users or the system after the decision. This paper formalizes a rule-based policy framework that includes provisions and obligations, and investigates a reasoning mechanism within this framework. A policy decision may be supported by more than one derivation, each associated with a potentially different set of provisions and obligations (called a global PO set). The reasoning mechanism can derive all the global PO sets for each specific policy decision, and facilitates the selection of the best one based on numerical weights assigned to provisions and obligations as well as on semantic relationships among them. The paper also shows the use of the proposed policy framework in a security application and discusses through an example various aspects of how the system may compensate unfulfilled obligations.
ISE-TR-02-02
A Scalable and Reliable Key Distribution Protocol for Multicast Group Rekeying
Sanjeev Setia and Sencun Zhu and Sushil Jajodia
Scalable group rekeying is one of the important problems that needs to be addressed in order to support secure communications for large and dynamic groups. One of the challenging issues that arises in scalable group rekeying is the problem of delivering the updated keys to the members of the group in a reliable and timely manner. In this paper, we present a new scalable and reliable key distribution protocol for group key management schemes that use logical key hierarchies for scalable group rekeying. Our protocol, called WKA-BKR, is based upon two key ideas, weighted key assignment and batched key retransmission, both of which exploit the special properties of logical key hierarchies to reduce the overhead and increase the reliability of the key delivery protocol. We have evaluated the performance of our approach using detailed simulations. Our results show that for most network loss scenarios, the bandwidth used by our protocol is lower than that of previously proposed key delivery protocols.
ISE-TR-02-03
TupleRank: Ranking Discovered Content in Virtual Databases
Jacob Berlin and Amihai Motro
Recently, the problem of data integration has been newly addressed by methods based on machine learning and discovery. Such methods are intended to automate, at least in part, the laborious process of information integration, by which existing data sources are incorporated in a virtual database. Essentially, these methods scan new data sources, attempting to discover possible mappings to the virtual database. Like all discovery processes, this process is intrinsically probabilistic; that is, each discovery is associated with a specific value that denotes assurance of its appropriateness. Consequently, the rows in a discovered virtual table have mixed assurance levels, with some rows being more credible than others. In this paper we argue that rows in discovered virtual databases should be ranked, and we describe a ranking method, called TupleRank, for calculating such a ranking order. Roughly speaking, TupleRank calibrates the probabilities calculated during a discovery process with historical information about the performance of the system. The work is done in the framework of the Autoplex system for discovering content for virtual databases, and initial experimentation is reported and discussed.
ISE-TR-02-04
Cardinality-based Inference Control in Sum-only Data Cubes (Extended Version)
Lingyu Wang and Duminda Wijesekera and Sushil Jajodia
This paper deals with the inference problems in data warehouses and decision support systems such as on-line analytical processing (OLAP) systems. Even though OLAP systems restrict user accesses to predefined aggregations, the possibility of inappropriate disclosure of sensitive attribute values still exists. Based on a definition of non-compromiseability that requires every member of a set of variables satisfying a given set of their aggregates to have more than one possible value, we derive sufficient conditions for non-compromiseability in sum-only data cubes. Specifically, (1) the non-compromiseability of multi-dimensional aggregates can be reduced to that of one-dimensional aggregates, (2) full or dense core cuboids are non-compromiseable, and (3) there is a tight lower bound on the cardinality of a core cuboid for it to remain non-compromiseable. Based on those conditions, and on a three-tiered model for controlling inferences, we provide a divide-and-conquer algorithm that uniformly divides data sets into chunks and builds a data cube on each such chunk. The union of those data cubes is then used to provide users with inference-free OLAP queries.
ISE-TR-02-05
A Comparative Performance Analysis of Reliable Group Rekey Transport Protocols for Secure Multicast
Sanjeev Setia and Sencun Zhu and Sushil Jajodia
In this paper, we present a new scalable and reliable key distribution protocol for group key management schemes that use logical key hierarchies for scalable group rekeying. Our protocol, called WKA-BKR, is based upon two ideas --- weighted key assignment and batched key retransmission --- both of which exploit the special properties of logical key hierarchies and the group rekey transport payload to reduce the bandwidth overhead of the reliable key delivery protocol. Using both analytic modeling and simulation, we compare the performance of WKA-BKR with that of other rekey transport protocols, including a recently proposed protocol based on proactive FEC. Our results show that for most network loss scenarios, the bandwidth used by WKA-BKR is lower than that of the other protocols.
ISE-TR-02-06
Performance Optimizations for Group Key Management Schemes for Secure Multicast
Sencun Zhu and Sanjeev Setia and Sushil Jajodia
Scalable group rekeying is one of the biggest challenges that need to be addressed to support secure communications for large and dynamic groups. In recent years, many group key management approaches based on the use of logical key trees have been proposed to address this issue. Using logical key trees reduces the complexity of the group rekeying operation from O(N) to O(log N), where N is the group size. In this paper, we present two optimizations for logical key tree organizations that utilize information about the characteristics of group members to further reduce the overhead of group rekeying. First, we propose a partitioned key tree organization that exploits the temporal patterns of group member joins and departures to reduce the overhead of rekeying. Using an analytic model, we show that our optimization can achieve up to a 31.4% reduction in key server bandwidth overhead over the un-optimized scheme. Second, we propose an approach under which the key tree is organized based on the loss probabilities of group members. Our analysis shows that this optimization can reduce the rekeying overhead by up to 12.1%.
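The O(log N) behavior that both optimizations build on comes from the fact that a member departure only invalidates the keys on the path from that member's leaf to the root of the logical key tree; the small sketch below just computes that path length for an assumed full tree of a given degree.

```python
import math

def rekey_path_length(group_size, degree=2):
    """Number of keys that must be replaced when one member leaves a group
    keyed with a full logical key tree of the given degree: the keys on the
    path from the member's leaf to the root, i.e. O(log N) rather than O(N)."""
    return max(1, math.ceil(math.log(group_size, degree)))

# Illustration for a group of 4096 members: a flat keying scheme would touch
# on the order of 4096 keys, while a binary key tree updates only about 12.
print(rekey_path_length(4096, degree=2))
```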
ISE-TR-02-07
Maintaining Evolving Component-Based Software with UML
Ye Wu and Jeff Offutt
Component-based software engineering has been increasingly adopted for software development. This approach relies on using reusable components as the building blocks for constructing software. On the one hand, this helps improve software quality and productivity; on the other hand, it necessitates frequent maintenance activities, such as upgrading third party components or adding new features. The cost of maintenance for conventional software can account for as much as two-thirds of the total cost, and it is likely to be more for component-based software. Thus, an effective maintenance technique for component-based software is strongly desired. This paper presents a UML-based technique that attempts to help resolve difficulties introduced by the implementation transparent characteristics of component-based software systems. This technique can also be useful for other maintenance activities. For corrective maintenance activities, the technique starts with UML diagrams that represent changes to a component, and uses them to support regression testing. To accommodate this approach for perfective maintenance activities, more challenges are encountered. We provide a UML-based framework to evaluate the similarities of the old and new components, and corresponding retesting strategies are provided.
ISE-TR-02-08
Modeling and Testing Web-based Applications
Ye Wu and Jeff Offutt
The Internet is quietly becoming the body of the business world, with web applications as the brains. This means that software faults in web applications have potentially disastrous consequences. Most work on web applications has been on making them more powerful, but relatively little has been done to ensure their quality. Important quality attributes for web applications include reliability, availability, interoperability and security. Web applications share some characteristics of client-server, distributed, and traditional programs; however, there are a number of novel aspects of web applications. These include the fact that web applications are “dynamic”, due to factors such as frequent changes of the application requirements as well as dramatic changes in web technologies; the fact that the roles of the clients and servers change dynamically; the heterogeneity of the hardware and software components; the extremely loose coupling and dynamic integration; and the ability of the user to directly affect the control of execution. In this paper, we first analyze the distinct features of web-based applications, and then define a generic analysis model that characterizes the typical behaviors of web-based applications independently of particular technologies. Based on this analysis, a family of testing techniques is discussed to ensure different levels of quality control of web applications under various situations.
ISE-TR-02-09
LHAP: A Lightweight Hop-by-Hop Authentication Protocol For Ad-Hoc Networks
Sencun Zhu and Shouhuai Xu and Sanjeev Setia and Sushil Jajodia
Most ad hoc networks do not implement any network access control, leaving these networks vulnerable to resource consumption attacks where a malicious node injects packets into the network with the goal of depleting the resources of the nodes relaying the packets. To thwart or prevent such attacks, it is necessary to employ authentication mechanisms that ensure that only authorized nodes can inject traffic into the network. In this paper, we present LHAP, a scalable and light-weight authentication protocol for ad hoc networks. LHAP is based on two techniques: (i) hop-by-hop authentication for verifying the authenticity of all the packets transmitted in the network and (ii) one-way key chain and TESLA for packet authentication and for reducing the overhead for establishing trust among nodes. We analyze the security properties of LHAP. Furthermore, our performance analysis shows that LHAP is a very lightweight security protocol.
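The one-way key chain ingredient can be sketched as a simple hash chain: keys are generated by repeated hashing and disclosed in the reverse order, so a receiver holding an earlier commitment can verify any later key by re-hashing it. The hash function, chain length, and verification bound below are assumptions, not LHAP's exact parameters.

```python
import hashlib

def build_key_chain(seed: bytes, length: int):
    """Build a one-way hash chain: each key is the hash of the next one, so
    keys are disclosed in reverse order of generation. The returned list is in
    disclosure order, with the commitment (released first) at index 0."""
    chain = [hashlib.sha256(seed).digest()]
    for _ in range(length - 1):
        chain.append(hashlib.sha256(chain[-1]).digest())
    return list(reversed(chain))

def verify(disclosed_key: bytes, commitment: bytes, max_steps: int) -> bool:
    """Check that repeatedly hashing a disclosed key reaches the commitment."""
    k = disclosed_key
    for _ in range(max_steps):
        if k == commitment:
            return True
        k = hashlib.sha256(k).digest()
    return k == commitment

chain = build_key_chain(b"node-secret", 10)
print(verify(chain[5], chain[0], max_steps=10))  # True
```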
ISE-TR-01-01
Large Constraint Joins and Disjoint Decompositions
Alexander Brodsky and Victor E. Segal
This paper is devoted to the problem of evaluation of a conjunction of N disjunctions of linear constraints in a multidimensional space (constraint join), which is a common constraint database operation. Existing methods are exponential in N. Here combinatorial geometry methods are applied to establish conditions for a polynomial size of the join output. A polynomial method to compute a join for an arbitrary input is then given. As part of the solution, a new algorithm for convex decomposition of an arbitrary non-convex polyhedron is developed. The H-tree is proposed --- a new data structure combining constraint storage and indexing. H-tree and R-tree implementations of the algorithm are given, and analytical considerations regarding the performance are provided.
ISE-TR-01-02
VirtuE: Virtual Enterprises for Information Markets
Alessandro D'Atri and Amihai Motro
An essential part of a modern economy is an information market. In this market, information products are being traded in countless ways. Information is bought, modified, integrated, incorporated into other products and then sold again. Usually, the production of an information product requires the collaboration of several participants. A virtual enterprise is a community of business entities that collaborate on the manufacturing of new products. This collaboration is often ad hoc, for a specific product only, after which the virtual enterprise may dismantle. The virtual enterprise paradigm is particularly appealing for modeling collaborations for manufacturing information products, and in this paper we present a new model, called VirtuE, for modeling such activities.
ISE-TR-01-03
Generating Test Data From Requirements/Specifications: Phase IV Final Report
Jeff Offutt
This report presents results for the Rockwell Collins Inc. sponsored project on generating test data from requirements/specifications, which started January 1, 2000. The purpose of this project is to improve our ability to test software that needs to be highly reliable by developing formal techniques for generating test cases from formal specifications of the software. Formal specifications give software developers the opportunity to describe exactly what services the software should provide, providing information to software designers in a clear, unambiguous manner. Formal specifications give test engineers information about the expectations of the software in a form that can be automatically manipulated. This Phase IV, 2000 report presents progress on an empirical evaluation of the efficacy of the transition-pair criterion developed in previous years. Transition-pair tests have been developed for the Flight Guidance System (FGS) and they were run on the faulty versions of FGS developed last year. These tests and the results are summarized in this preliminary report. This report also presents progress in our test data generation tool. This tool has been significantly expanded to handle multiple SCR tables, recursively defined tables, event and condition tables, non-boolean variables, and multiple-variable expressions. It is integrated with the Naval Research Laboratory's SCRTool and Rational Software Corporation's Rational Rose tool.
ISE-TR-01-04
Maintainability of the Linux Kernel
Stephen R. Schach and Bo Jin and David R. Wright and Gillian Z. Heller and A. Jefferson Offutt
We have examined 365 versions of Linux. For every version, we counted the number of instances of common (global) coupling between each of the 17 kernel modules and all the other modules in that version of Linux. We found that the number of instances of common coupling grows exponentially with version number. This result is significant at the 99.99% level, and no additional variables are needed to explain this increase. On the other hand, the number of lines of code in each kernel module grows only linearly with version number. We conclude that, unless Linux is restructured with a bare minimum of common coupling, the dependencies induced by common coupling will, at some future date, make Linux exceedingly hard to maintain without inducing regression faults.
ISE-TR-01-05
Covering Bitmap with Trapezoids is Hard
Changzhou Wang and X. Sean Wang
Each sub-series of a time series can be mapped to a point on a plane whose two coordinates represent the starting time and the ending time, respectively. In certain applications, points with similar properties are likely to be clustered in a special trapezoid shape. To save these points in a compact way, the following optimization problem needs to be studied: find the minimum number of trapezoids which cover a given set of points. This paper formalizes the optimization problem as a decision problem, namely, Bitmap Cover with special Trapezoids (BCT), and proves that BCT is NP-hard.
ISE-TR-01-06
Database Schema Matching Using Machine Learning with Feature Selection
Jacob Berlin and Amihai Motro
Schema matching, the problem of finding mappings between the attributes of two semantically related database schemas, is an important aspect of many database applications such as schema integration, data warehousing, and electronic commerce. Unfortunately, schema matching remains largely a manual, labor-intensive process. Furthermore, the effort required is typically linear in the number of schemas to be matched; the next pair of schemas to match is not any easier than the previous pair. In this paper we describe a system, called Automatch, that uses machine learning techniques to automate schema matching. Based primarily on Bayesian learning, the system acquires probabilistic knowledge from examples that have been provided by domain experts. This knowledge is stored in a knowledge base called the attribute dictionary. When presented with a pair of new schemas that need to be matched (and their corresponding database instances), Automatch uses the attribute dictionary to find an optimal matching. We also report initial results from the Automatch project.
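A toy sketch of the dictionary-based matching idea: the attribute dictionary stores value statistics per known attribute, each new column is scored against each dictionary attribute with a smoothed, naive Bayes-style log-likelihood, and columns are assigned greedily. Automatch's actual dictionary, feature model, and optimal-matching step are richer; the functions, smoothing, and data below are purely illustrative.

```python
import math
from collections import Counter

def train_dictionary(examples):
    """examples: dict of attribute name -> list of example values.
    Returns per-attribute value frequencies (a toy 'attribute dictionary')."""
    return {a: Counter(map(str, vals)) for a, vals in examples.items()}

def score(column_values, counter, vocab_size=1000):
    """Smoothed log-likelihood of a new column's values under one dictionary
    attribute (a naive Bayes-style score; the smoothing is an assumption)."""
    total = sum(counter.values())
    return sum(math.log((counter[str(v)] + 1) / (total + vocab_size))
               for v in column_values)

def match_schema(new_columns, dictionary):
    """Greedy one-to-one matching of new columns to dictionary attributes by
    best score (the real system solves a global optimal matching instead)."""
    matches, used = {}, set()
    for col, values in new_columns.items():
        best = max((a for a in dictionary if a not in used),
                   key=lambda a: score(values, dictionary[a]), default=None)
        if best is not None:
            matches[col] = best
            used.add(best)
    return matches

dictionary = train_dictionary({"zip": ["22030", "22031"], "name": ["Ann", "Bo"]})
print(match_schema({"c1": ["22032", "22030"], "c2": ["Cai"]}, dictionary))
```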
ISE-TR-00-01
Suitability of the UML as an Architecture Description Language with Applications to Testing
Aynur Abdurazik
Increasingly, very high level designs of large software systems are being described by software architectures. A software architecture expresses the overall structure of the system in an abstract, structured way. The Unified Modeling Language (UML) is widely used to express mid- and low-level designs of software, and recent proposals have been made to adapt the UML for use as an architecture description language (ADL). This research is looking into problems associated with creating system-level software tests that are based on architecture descriptions. This paper discusses issues with using the UML as an ADL, and general problems with then using the architecture descriptions as a basis for generating tests.
ISE-TR-00-02
Generating Test Data From Requirements/Specifications: Phase III Final Report
Jeff Offutt
This report presents results for the Rockwell Collins Inc. sponsored project on generating test data from requirements/specifications, which started January 1, 1999. The purpose of this project is to improve our ability to test software that needs to be highly reliable by developing formal techniques for generating test cases from formal specifications of the software. Formal specifications represent a significant opportunity for testing because they precisely describe what functions the software is supposed to provide in a form that can be easily manipulated by automated means. This Phase III, 1999 report presents results from an empirical evaluation of the full predicate specification-based testing criterion developed during the first two phases of this project, and a proof-of-concept test data generation tool. The evaluation used a comparative study on a large industrial system, a research version of the Flight Guidance Mode Logic System (FGS) provided by Rockwell Collins. Full predicate tests were generated for FGS, and compared against the T-Vec generation scheme. T-Vec tests for FGS were also provided by Rockwell Collins. While creating and running the tests, one problem was found in the SCR specifications for FGS, and one problem was found in the already well tested implementation of FGS. Both T-Vec and the full predicate tests found almost the same number of faults, but T-Vec required more than five times as many tests; thus, the full predicate tests were more efficient. The proof-of-concept test data generator creates full predicate and transition-pair tests from an SCR specification. Currently, the tool requires the SCR specification to be a single mode transition table, all variables must be boolean, and the transition predicates must be single-variable expressions.
ISE-TR-00-03
Autoplex: Automated Discovery of Contents for Virtual Databases
Jacob Berlin and Amihai Motro
Most virtual database systems are suitable for environments in which the set of member information sources is small and stable. Consequently, present virtual database systems do not scale up very well. The main reason is the complexity and cost of incorporating new information sources into the virtual database. In this paper we describe a system, called Autoplex, which uses machine learning techniques for automating the discovery of new content for virtual database systems. Autoplex assumes that several information sources have already been incorporated (“mapped”) into the virtual database system by human experts (as done in standard virtual database systems). Autoplex learns the features of these examples. It then applies this knowledge to new candidate sources, trying to infer views that “resemble” the examples. In this paper we report initial results from the Autoplex project.
ISE-TR-99-01
Generating Test Data From Requirements/Specifications: Phase II Final Report
Jeff Offutt
This report presents results for the Rockwell Collins Inc. sponsored project on generating test data from requirements/specifications, which started January 1, 1998. The purpose of this project is to improve our ability to test software that needs to be highly reliable by developing formal techniques for generating test cases from formal specifications of the software. Formal specifications represent a significant opportunity for testing because they precisely describe what functions the software is supposed to provide in a form that can be easily manipulated by automated means. This Phase II, 1998 report presents results and strategies for practically applying test cases generated according to the criteria presented in the Phase I, 1997 report. This report presents a small empirical evaluation of the test criteria, and algorithms for solving various problems that arise when applying test cases developed from requirements/specifications. One significant problem in specification-based test data generation is that of reaching the proper program state necessary to execute a particular test case. Given a test case that must start in a particular state S, the test case prefix is a sequence of inputs that will put the software into state S. We have addressed this problem in two ways. The first is to combine various test cases into test sequences that are ordered in such a way that each test case leaves the software in the state necessary to run the subsequent test case. An algorithm is presented that attempts to find test case sequences that are optimal in the sense that as few test cases as possible are used. To handle situations where it is desired to run each test case independently, an algorithm for directly deriving test sequences is presented. This report also presents procedures for removing redundant test case values, and develops the idea of “sequence-pair” testing, which was presented in the 1997 Phase I report, into a more general idea of “interaction-pair” testing.
ISE-TR-99-02
Using Approximations to Scale Exploratory Data Analysis in Datacubes
Daniel Barbará and Xintao Wu
Exploratory Data Analysis is a widely used technique to determine which factors have the most influence on data values in a multi-way table, or which cells in the table can be considered anomalous with respect to the other cells. In particular, median polish is a simple, yet robust method to perform Exploratory Data Analysis. Median polish is resistant to holes in the table (cells that have no values), but it may require many iterations through the data. This makes it difficult to apply to large multidimensional tables, since the I/O requirements may be prohibitive. This paper describes a technique that uses median polish over an approximation of a datacube, easing the burden of I/O. The results obtained are tested for quality, using a variety of measures. The technique scales to large datacubes and proves to give a good approximation of the results that would have been obtained by median polish on the original data.
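For reference, basic median polish repeatedly sweeps row medians and then column medians out of a two-way table, and the residuals that remain flag anomalous cells. The in-memory sketch below shows only this residual computation; the paper runs the procedure over a compressed approximation of a large datacube to limit I/O, which is not modeled here.

```python
import numpy as np

def median_polish_residuals(table, iterations=10):
    """Iteratively sweep row medians and column medians out of a two-way
    table; the remaining residuals highlight anomalous cells (basic in-memory
    sketch of the median polish residual step)."""
    residuals = np.asarray(table, dtype=float).copy()
    for _ in range(iterations):
        residuals -= np.median(residuals, axis=1, keepdims=True)  # row medians
        residuals -= np.median(residuals, axis=0, keepdims=True)  # column medians
    return residuals

print(median_polish_residuals([[1, 2, 9],
                               [2, 3, 4],
                               [1, 2, 3]]))
# The cell that started at 9 keeps a large residual, flagging it as anomalous.
```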
ISE-TR-99-03
Chaotic Mining: Knowledge Discovery Using the Fractal Dimension
Daniel Barbará
Many real datasets behave in a fractal fashion, i.e., they show an invariance with respect to the scale used to look at them. Fractal sets are characterized by a family of fractal dimensions, each with a particular interpretation. In this paper we show examples of how the fractal dimension(s) can be used to extract knowledge from datasets. Techniques that use the fractal dimension to detect anomalies in time series, to characterize time patterns of association rules, to discover patterns in datacubes, and to cluster multidimensional datasets are described as part of an on-going research effort.
ISE-TR-99-04
Recovering Watermarks from Images
Zoran Duric and Neil F. Johnson and Sushil Jajodia
Many techniques for watermarking of digital images have appeared recently. Most of these techniques are sensitive to cropping and/or affine distortions (e.g., rotations and scaling). In this paper we describe a method for recognizing images based on the concept of identification mark; the method does not require the use of the original image, it requires only a small number of salient image points. We show that, using our method, it is possible to recognize distorted images and recover their “original” appearance. Once the image is recognized we use a second technique based on the normal flow to fine-tune image parameters. The restored image can be used to recover the watermark that had been embedded in the image by its owner.
ISE-TR-99-05
Testing the Polymorphic Relationships of Object-Oriented Components
Roger T. Alexander
As we move from developing procedure-oriented to object-oriented programs, the complexity traditionally found in functions and procedures is moving to the connections among components. More faults occur as components are integrated to form higher level aggregates of behavior and state. Consequently, we need to place more effort on testing the connections among components. Although object-oriented technology provides abstraction mechanisms to build components to integrate, it also adds new compositional relations that can contain faults, which must be found during integration testing. This paper describes new techniques for analyzing and testing the polymorphic relationships that occur in object-oriented software. The application of these techniques can result in an increased ability to find faults and overall higher quality software.
ISE-TR-99-06
Change Impact Analysis of Object-oriented Software
Michelle L. Lee
As the software industry has matured, we have shifted our resources from developing new software systems to making modifications in evolving software systems. A major problem for developers in an evolutionary environment is that seemingly small changes can ripple throughout the system to cause major unintended impacts elsewhere. As such, software developers need mechanisms to understand how a change to a software system will impact the rest of the system. Although the effects of changes in object-oriented software can be restricted, they are also more subtle and more difficult to detect. Maintaining current object-oriented systems is more of an art, similar to where we were 15 years ago with procedural systems, than an engineering skill. We are beginning to see “legacy” object-oriented systems in industry. A difficult problem is how to maintain these objects in large, complex systems. Although objects are more easily identified and packaged, features such as encapsulation, inheritance, aggregation, polymorphism and dynamic binding can make the ripple effects of object-oriented systems far more difficult to control than in procedural systems. The research presented here addresses the problems of change impact analysis for object-oriented software. Major results of this research include a set of object-oriented data dependency graphs, a set of algorithms that allow software developers to evaluate proposed changes on object-oriented software, a set of object-oriented change impact metrics to evaluate the change impact quantitatively, and a prototype tool, ChaT, to evaluate the algorithms. This research also results in efficient regression testing by helping testers decide what classes and methods need to be retested, and in supporting cost estimation and schedule planning.
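The report's dependency graphs and impact algorithms are not given in the abstract. As a rough illustration of the ripple-effect computation they build on, the Python sketch below treats impact analysis as reverse reachability over a dependency graph; a real object-oriented analysis must also distinguish encapsulation, inheritance, aggregation, and polymorphic edges, which this toy ignores, and the data layout is hypothetical.

    from collections import defaultdict, deque

    def impacted(depends_on, changed):
        """Estimate the ripple effect of a change: depends_on maps each class or
        method to the members it uses; return everything reachable from the
        changed set by following the dependency edges backwards."""
        reverse = defaultdict(set)
        for member, uses in depends_on.items():
            for used in uses:
                reverse[used].add(member)
        seen, work = set(changed), deque(changed)
        while work:
            used = work.popleft()
            for member in reverse[used]:
                if member not in seen:
                    seen.add(member)
                    work.append(member)
        return seen - set(changed)

The members returned are also natural candidates for retesting, which is how this style of analysis feeds regression test selection.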
ISE-TR-99-07
Resolving Inconsistencies in the Multiplex Multidatabase System
Philipp Anokhin and Amihai Motro
With the development of the Internet and the on-line availability of large numbers of information sources, the problem of integrating multiple heterogeneous information sources requires reexamination, its basic underlying assumptions having changed drastically. Integration methodologies must now contend with situations in which the number of potential data sources is very large, and the set of sources changes continuously. In addition, the ability to create quick, ad-hoc virtual databases for short-lived applications is now considered attractive. Under these new assumptions, a single, complete answer can no longer be guaranteed. It is now possible that a query could not be answered in its entirety, or it might result in several different answers. Multiplex is a multidatabase system designed to operate under these new assumptions. In this paper we describe how Multiplex handles queries that do not have a single, complete answer. The general approach is to define flexible and comprehensive strategies that direct the behavior of the query processing subsystem. These strategies may be defined either as part of the multidatabase design or as part of ad-hoc queries.
ISE-TR-99-08
Using the Fractal Dimension to Cluster Datasets
Daniel Barbará and Ping Chen
Clustering is a widely used knowledge discovery technique. It helps uncover structures in data that were not previously known. Clustering of large datasets has received a lot of attention in recent years. However, clustering is still a challenging task, since many published algorithms fail to scale well with the size of the dataset and the number of dimensions that describe the points, to find arbitrarily shaped clusters, or to deal effectively with the presence of noise. In this paper, we present a new clustering algorithm based on the fractal properties of the datasets. The new algorithm, which we call Fractal Clustering (FC), places points incrementally in the cluster for which the change in the fractal dimension after adding the point is the least. This is a very natural way of clustering points, since points in the same cluster have a great degree of self-similarity among them (and much less self-similarity with respect to points in other clusters). FC requires one scan of the data, is suspendable at will, providing the best answer possible at that point, and is incremental. We show via experiments that FC effectively deals with large datasets, high dimensionality, and noise, and is capable of recognizing clusters of arbitrary shape.
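The abstract states FC's placement rule precisely enough to illustrate it. The Python sketch below is not the FC algorithm of the report (which initializes clusters carefully, uses efficient box counts, and makes a single scan of the data); it only shows the core placement step, using a crude box-counting dimension estimate and arbitrary parameter values.

    import numpy as np

    def dim_estimate(points, scales=(0.5, 0.25, 0.125)):
        """Crude box-counting dimension estimate of a small set of points."""
        pts = np.asarray(points, dtype=float)
        counts = [len(np.unique(np.floor(pts / r), axis=0)) for r in scales]
        slope, _ = np.polyfit(np.log(1.0 / np.asarray(scales)), np.log(counts), 1)
        return slope

    def place_point(clusters, point, threshold=0.05):
        """clusters is a list of lists of points. Add the point to the cluster
        whose fractal dimension changes least; if even the smallest change
        exceeds the threshold, start a new cluster (or treat the point as noise)."""
        changes = [abs(dim_estimate(c + [point]) - dim_estimate(c)) for c in clusters]
        best = int(np.argmin(changes))
        if changes[best] <= threshold:
            clusters[best].append(point)
        else:
            clusters.append([point])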
ISE-TR-99-09
Generating Test Cases from UML Specifications
Aynur Abdurazik and Jeff Offutt
The Unified Modeling Language (UML) is a third-generation modeling language for object-oriented software engineering. It provides constructs to specify, construct, visualize, and document artifacts of software-intensive systems. This paper presents a technique that uses Offutt's state-based specification test data generation model to generate test cases from UML statecharts. A tool, TCGen, has been developed to demonstrate this technique with specifications written in the Software Cost Reduction (SCR) method and the Unified Modeling Language (UML). Experimental results from using this tool are presented.
ISSE-TR-98-02
Input Validation Testing: A Requirements-Driven, System Level, Early Lifecycle Technique
Jane Hayes and Jeff Offutt
Syntax-directed software accepts user inputs that, when constructed and arranged properly, control the execution of the application. Input validation testing chooses test data that attempt to show the presence or absence of specific faults pertaining to input tolerance. A large amount of syntax-directed software currently exists and will continue to be developed that should be subjected to input validation testing. System level testing techniques that currently address this area are not well developed or formalized. There is a lack of formal research on system level testing and, accordingly, a lack of formal, standard criteria, general purpose techniques, and tools. Systems are large (in size and domain), so unit testing techniques have had limited applicability. Input validation testing techniques have not been developed or automated to assist in static input syntax evaluation and test case generation. This paper addresses the problem of statically analyzing input command syntax as defined in interface and requirements specifications and then generating test cases for input validation testing. The IVT (Input Validation Testing) technique has been developed, a proof-of-concept tool (MICASA) has been implemented, and validation has been performed. Empirical validation on actual industrial software (for the Tomahawk Cruise Missile) shows that, as compared with senior, experienced testers, MICASA found more requirement specification defects, generated test cases with higher syntactic coverage, and found additional defects. Additionally, the tool performed at significantly less cost.
ISSE-TR-98-03
Quasi-Cubes: A Space-Efficient Way to Support Approximate Multidimensional Databases
Daniel Barbará and Mark Sullivan
A data cube is a popular organization for summary data. A cube is simply a multidimensional structure that contains at each point an aggregate value, i.e., the result of applying an aggregate function to an underlying relation. In practical situations, cubes can require a large amount of storage. The typical approach to reducing storage cost is to materialize parts of the cube on demand. Unfortunately, this lazy evaluation can be a time-consuming operation. In this paper, we propose an approximation technique that reduces the storage cost of the cube without incurring the run time cost of lazy evaluation. The idea is to characterize regions of the cube by using statistical models whose descriptions take less space than the data itself. Then, the model parameters can be used to estimate the cube cells with a certain level of accuracy. To increase the accuracy, some of the “outliers,” i.e., cells that incur the largest errors when estimated, can be retained. The storage taken by the model parameters and the retained cells, of course, should take a fraction of the space of the full cube, and the estimation procedure should be faster than computing the data from the underlying relations. We study three methods for modeling cube regions: linear regression, singular value decomposition, and the independence approach. We study the tradeoff between the amount of storage used for the description and the accuracy of the estimation. Experiments show that the errors introduced by the estimation are small even when the description is substantially smaller than the full cube. Since cubes are used to support data analysis and analysts are rarely interested in the precise values of the aggregates (but rather in trends), providing approximate answers is, in most cases, a satisfactory compromise. Our technique allows systems to handle very large cubes that can be neither fully stored nor efficiently materialized on demand. Even for applications that can accept only tiny errors in the cube data, our techniques can be used to present successive on-line refinements of the answer to a query, so that a coarse picture of the answer can be presented while the remainder of the answer is fetched from disk (i.e., to support on-line aggregation).
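Of the three modeling methods named, the SVD flavor is the easiest to sketch. The Python toy below is a hypothetical, much simplified stand-in for the quasi-cube idea: describe one (here 2-D) region of a cube with a low-rank model, retain the worst-estimated cells exactly as outliers, and answer cell queries from that description; the real technique partitions the cube into regions and balances models and outliers against a space budget.

    import numpy as np

    def describe_region(region, rank=1, n_outliers=5):
        """Describe a 2-D cube region by a rank-k SVD model plus the few cells
        whose estimates are worst, which are kept exactly as outliers."""
        region = np.asarray(region, dtype=float)
        U, s, Vt = np.linalg.svd(region, full_matrices=False)
        model = (U[:, :rank], s[:rank], Vt[:rank])
        approx = (model[0] * model[1]) @ model[2]
        err = np.abs(region - approx)
        worst = np.argsort(err, axis=None)[-n_outliers:]
        outliers = {}
        for flat in worst:
            idx = tuple(int(v) for v in np.unravel_index(flat, err.shape))
            outliers[idx] = float(region[idx])
        return model, outliers

    def estimate_cell(model, outliers, i, j):
        """Answer a cell query from the compact description instead of the cube."""
        if (i, j) in outliers:
            return outliers[(i, j)]
        U, s, Vt = model
        return float((U[i] * s) @ Vt[:, j])

The storage for the model grows with (rows + columns) x rank plus the retained outliers, rather than with rows x columns, which is the kind of space saving the report quantifies.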
ISSE-TR-97-01
Redistributing Secret Shares to New Access Structures and Its Applications
Yvo Desmedt and Sushil Jajodia
Proactive secret sharing deals with refreshing secret shares, i.e., redistributing the shares of a secret to the original access structure. In this paper we focus on the general problem of redistributing shares of a secret key. Shares of a secret have been distributed such that access sets specified in the access structure (e.g., t-out-of-l) can access (or use) the secret. The problem is how to redistribute the secret, without recovering it, in such a way that those specified in the new access structure will be able to recover the secret. We also adapt our scheme such that it can be used in the context of threshold cryptography and discuss its applications to secure databases.
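To make redistribution concrete, here is a minimal Python sketch for the special case of Shamir threshold (t-out-of-n) sharing over a prime field; the report handles general access structures and the threshold-cryptography setting, which this toy does not. The field choice and all names are illustrative, and pow(x, -1, P) requires Python 3.8 or later.

    import random

    P = 2**127 - 1                       # toy prime modulus for the share field

    def _eval(poly, x):
        acc = 0
        for c in reversed(poly):         # poly[0] is the constant term (the secret)
            acc = (acc * x + c) % P
        return acc

    def share(secret, t, holders):
        """Shamir t-out-of-n sharing; holder ids must be distinct and nonzero."""
        poly = [secret % P] + [random.randrange(P) for _ in range(t - 1)]
        return {i: _eval(poly, i) for i in holders}

    def _lagrange_at_zero(ids):
        """Coefficients that combine shares {i: f(i)} into f(0)."""
        coeffs = {}
        for i in ids:
            num, den = 1, 1
            for j in ids:
                if j != i:
                    num = (num * -j) % P
                    den = (den * (i - j)) % P
            coeffs[i] = num * pow(den, -1, P) % P
        return coeffs

    def recover(shares):
        lam = _lagrange_at_zero(list(shares))
        return sum(lam[i] * s for i, s in shares.items()) % P

    def redistribute(old_shares, new_t, new_holders):
        """Move an authorized set's shares to a new_t-out-of-len(new_holders)
        structure without reconstructing the secret: each old holder re-shares
        its own share, and each new holder keeps a Lagrange-weighted sum of the
        sub-shares it receives."""
        lam = _lagrange_at_zero(list(old_shares))
        sub = {i: share(s, new_t, new_holders) for i, s in old_shares.items()}
        return {j: sum(lam[i] * sub[i][j] for i in old_shares) % P
                for j in new_holders}

    # toy usage: two of three old holders move the secret to a 3-out-of-5 structure
    old = share(1234, t=2, holders=[1, 2, 3])
    new = redistribute({1: old[1], 3: old[3]}, new_t=3, new_holders=[1, 2, 3, 4, 5])
    assert recover({i: new[i] for i in (2, 4, 5)}) == 1234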
ISSE-TR-97-02
Generating Test Data from SOFL Specifications
A. Jefferson Offutt and Shaoying Liu
Software testing can only be formalized and quantified when a solid basis for test generation can be defined. Bases that are commonly used include the source code, control flow graphs, design representations, and specifications/requirements. Formal specifications represent a significant opportunity for testing because they precisely describe what functions the software is supposed to provide in a form that can be easily manipulated. In this paper, we present a new method for generating tests from formal specifications. This method is comprehensive in specification coverage, applies at several levels of abstraction, and can be highly automated. We apply our method to SOFL specifications, describe the technique, and demonstrate the application on a case study. A preliminary evaluation using a code-level coverage criterion (mutation testing) indicates that the method can result in very effective tests.
ISSE-TR-96-01
Subsumption of Condition Coverage Techniques by Mutation Testing
A. Jefferson Offutt and Jeffrey M. Voas
Condition coverage testing is a family of testing techniques that are based on the logical flow of control through a program. The condition coverage techniques include a variety of requirements, including that each statement in the program is executed and that each branch is executed. Mutation testing is a fault-based testing technique that is widely considered to be very powerful, and that imposes requirements on testing that include, and go beyond, many other techniques. In this paper, we consider the six common condition coverage techniques, and formally show that these techniques are subsumed by mutation testing, in the sense that if mutation testing is satisfied, then the condition coverage techniques are also satisfied. The fact that condition coverage techniques are subsumed by mutation has immediate practical significance because the extensive research that has already been done for mutation can be used to support condition coverage techniques, including automated tools for performing mutation testing and generating test cases.
ISSE-TR-96-02
Algorithms for Change Impact Analysis of Object-oriented Software
Li Li and A. Jefferson Offutt
As the software industry has matured, we have shifted from almost exclusively devoting our resources to the development of new software systems to devoting large portions of our resources to making modifications to evolving software systems. A major problem for developers in an evolutionary environment is that seemingly small changes in one place can ripple throughout the system to have major unintended impacts elsewhere; thus, software developers need mechanisms to understand what impact a change to a software system will have on the rest of the system. With object-oriented software, the impacts of changes are restricted (largely because of encapsulation), but are also more subtle and harder to determine. This paper presents algorithms to analyze potential impacts of changes to object-oriented software, taking into account encapsulation, inheritance, and polymorphism. This technique allows software developers to perform “what if” analysis on the impacts of proposed changes, and thereby choose the change that has the least impact on the rest of the system. The analysis also offers valuable information to regression testing, by suggesting what classes and methods need to be retested, and to project managers, who can use the results for cost estimation and schedule planning.
ISSE-TR-96-03
A Prototype Domain Modeling Environment for Reusable Software Architectures
H. Gomaa and L. Kerschberg and V. Sugumaran and C. Bosch and I. Tavakoli
Contact authors for abstract and paper
ISSE-TR-96-04
Domain Modeling of the Spiral Process Model
Hassan Gomaa and Larry Kerschberg and Ghulam Ahmad Farrukh and Rajan Girish Mohan
Contact authors for abstract and paper
ISSE-TR-96-05
PROGEN: A Knowledge-based System for Process Model Generation, Tailoring and Reuse
Larry Kerschebrg and Hassan Gomaa and Rajan Girish Mohan and Ghulam Ahmad Farrukh
Contact authors for abstract and paper
ISSE-TR-96-06
Discovering Temporal Relationships with Multiple Granularities in Time Sequences
Claudio Bettini and X. Sean Wang and Sushil Jajodia and Jia-Ling Lin
An important usage of time sequences is to discover temporal patterns. The discovery process usually starts with a user-specified skeleton, called an “event structure”, which consists of a number of variables representing events and temporal constraints among these variables; and the goal of the discovery is to find temporal patterns, i.e., instantiations of the variables in the structure, which frequently appear in the time sequence. This paper introduces event structures that have temporal constraints with multiple granularities, defines the pattern discovery problem with these structures, and studies effective algorithms to solve it. The basic components of the algorithms include timed automata with granularities (TAGs) and a number of heuristics. The TAGs are for testing whether a specific temporal pattern, called a “candidate complex event type”, appears frequently in a time sequence. Since there are often a huge number of candidate event types for a usual event structure, heuristics are presented aiming at reducing the number of candidate event types and reducing the time spent by the TAGs to test whether a candidate type does appear frequently in the sequence. These heuristics exploit the information provided by explicit and implicit temporal constraints with granularity in the given event structure. The paper also gives the results of an experiment to show the effectiveness of the heuristics on a real data set.
ISSE-TR-96-07
The C3 Constraints Object-Oriented Database System
Alexander Brodsky and Victor E. Segal and Pavel A. Exarkhopoulo
Constraints provide a flexible and uniform way to conceptually represent diverse data capturing spatio-temporal behavior, complex modeling requirements, partial and incomplete information, etc., and have been used in a wide variety of application domains. Constraint databases have recently emerged to deeply integrate data captured by constraints in databases. This paper reports on the development of the first constraint object-oriented database system, Ccube, and describes its specification, design and implementation. The Ccube system is designed to be used for both implementation and optimization of high-level constraint object-oriented query languages such as LyriC or constraint extensions of OQL, and for directly building software systems requiring extensible use of constraint database features. The Ccube data manipulation language, Constraint Comprehension Calculus, is an integration of a constraint calculus for extensible constraint domains within monoid comprehensions, which serve as an optimization-level language for object-oriented queries. The data model for constraint calculus is based on constraint spatio-temporal (CST) objects that may hold spatial, temporal or constraint data, conceptually represented by constraints. New CST objects are constructed, manipulated and queried by means of constraint calculus. The model for monoid comprehensions, in turn, is based on the notion of monoids, which is a generalization of collection and aggregation types to structures over which one can iterate and apply a merge operator; this includes disjunctions and conjunctions of constraints. The focal point of our work is achieving the right balance between expressiveness, complexity and representation usefulness, without which the practical use of the system would not be possible. To that end, the C3 constraint calculus guarantees polynomial time data complexity, and, furthermore, is tightly integrated with monoid comprehensions to allow deep global optimization.
ISSE-TR-96-08
Applying Logic-based Database to Impact Analysis of Object-oriented Software
Li Li and Jeff Offutt
Contact authors for paper
As the software industry has matured, we have shifted our resources, once devoted to developing new software systems, to making modifications in evolving software systems. A major problem for developers in an evolutionary environment is that seemingly small changes can ripple throughout the system to have major unintended impacts elsewhere. As a result, software developers need mechanisms to understand how a change to a software system will affect the rest of the system. Although the effects of changes in object-oriented software are restricted, they are also more subtle and more difficult to detect. The technique presented in this paper allows software developers to perform “what if” analysis on the effect of proposed changes, and thereby choose the change that has the least influence on the rest of the system. The analysis also adds valuable information to regression testing, by suggesting what classes and methods need to be re-tested, and to project managers, who can use the results for cost estimation and schedule planning. This paper presents algorithms to analyze the potential impacts of changes to object-oriented software, taking into account encapsulation, inheritance, and polymorphism, and implements the algorithms using datalog, a deductive database model. Using datalog makes the implementation of the algorithms relatively easy. It also expands the range of the impact analysis of an object-oriented system to any interesting question that can be expressed as a datalog query.
ISSE-TR-96-09
Mutation Operators for Ada
A. Jefferson Offutt and Jeff Voas and Jeff Payne
Mutation analysis is a method for testing software and for assessing the adequacy of test data. This report describes the mutation operators defined for the Ada programming language. The mutation operators are categorized using syntactic criteria, in a form suitable for an implementor of a mutation-based system, or a tester wishing to understand how mutation analysis can be used to test Ada programs. Each mutation operator is carefully defined, and when appropriate, implementation notes and suggestions are provided. We include operators for all syntactic elements of Ada, including exception handling, generics, and tasking. A summary table listing all operators for Ada and comparing them with the C and Fortran operators is also provided. The design described here is the result of deliberations among the authors in which all aspects of the Ada language and software development in Ada were considered. These operators can also be viewed as the culmination of previous mutation operator definitions for other languages. This report is intended to serve as a manual for the Ada mutation operators.
ISSE-TR-96-10
A General Framework for Time Granularity and Its Application to Temporal Reasoning
Claudio Bettini and X. Sean Wang and Sushil Jajodia
This paper presents a general framework to define time granularity systems. We identify the main dimensions along which different systems can be characterized, and investigate the formal relationships among granularities in these systems. The paper also introduces the notion of a network of temporal constraints with (multiple) granularities emphasizing the semantic and computational differences from constraint networks with a single granularity. Consistency of networks with multiple granularities is shown to be NP-hard in general and approximate solutions for this problem and for the “minimal network” problem are proposed.
ISSE-TR-95-00
Integration Testing Based on Software Couplings
Zhenyi Jin and A. Jefferson Offutt
Integration testing is an important part of the testing process, but few integration testing techniques have been systematically studied or defined. This paper presents an integration testing technique based on couplings between software components. The coupling-based testing technique is described, and 12 coverage criteria are defined. The coupling-based technique was compared with the category-partition method. Results show that the coupling-based technique detected more faults, and used fewer test cases, than category-partition on a subject program. This modest result indicates that the coupling-based testing approach can benefit practitioners who are performing integration testing on software.
ISSE-TR-95-01
Multilevel Relational (MLR) Data Model
Fang Chen and Ravi S. Sandhu
Many multilevel data models have been proposed, and different models have different merits. In this paper, we unify many of these merits to build a new multilevel data model as a natural extension of the traditional relational data model. We describe our new model, called the Multilevel Relational (MLR) data model, for multilevel relations with element-level labeling. The MLR model builds upon prior work of numerous authors in this area, and integrates ideas from a number of sources. A new data-based semantics is given to the MLR data model; it combines ideas from SeaView, belief-based semantics, and the LDV model, and has the advantages of both eliminating ambiguity and retaining upward information flow. The model is simple, unambiguous and powerful. It has five integrity properties and five operation statements for manipulating multilevel relations. In order to support this integration, several new concepts are introduced and many old ones are redefined. A central contribution of this paper is proofs of soundness, completeness and security of the MLR data model. These proofs respectively show that any of the provided operation statements can keep the database state legal (i.e., satisfying all integrity properties), every legal database state can be constructed, and the MLR data model is non-interfering. This is the first time that such properties have been formally proved for a multilevel relational data model. The expressive power of the MLR model is also discussed in this paper and compared with that of several other models.
ISSE-TR-95-02
The Effectiveness of Category-Partition Testing of Object-Oriented Software
Alisa Irvine and A. Jefferson Offutt
When migrating from conventional to object-oriented programming, developers face difficult decisions in modifying their development process to best use the new technology. In particular, ensuring that the software is highly reliable in this new environment poses different challenges. Developers need understanding of effective ways to test the software. This paper presents empirical data that show that the existing technique of category-partition testing can effectively find faults in object-oriented software, and new techniques are not necessarily needed. For this study, we identified types of faults that are common to C++ software and inserted faults of these types into two C++ programs. Test cases generated using the category-partition method were used to test the programs. A fault was considered detected if it caused the program to terminate abnormally or if the output was different from the output of the original program. The results show that the combination of the category-partition method and a tool for detecting memory management faults may be effective for testing C++ programs in general. Since there is no evidence that traditional techniques are not effective, software developers may not need new testing methods when migrating to object-oriented development.
ISSE-TR-95-03
Multiplex: A Formal Model for Multidatabases and Its Implementation
Amihai Motro
The integration of information from multiple databases has been an enduring subject of research for almost 20 years, and many different solutions have been attempted or proposed. Missing from this research has been a uniform framework. Usually, each solution develops its own ad-hoc framework, designed to address the particular aspects of the problem that are being attacked and the particular methodology that is being used. To address this situation, in this paper we define a formal model for multidatabases, which we call Multiplex. Multiplex is a simple extension of the relational model, which may serve as a uniform abstraction for many previous ad-hoc solutions. Multiplex is based on formal assumptions of integrability, which distinguish between scheme and instance reconcilability among independent databases. Multiplex supports database heterogeneity, and it provides several degrees of freedom that allow it to model actual situations encountered in multidatabase applications. In addition, in situations in which a single answer is not obtainable (either because the global query is not answerable, or there are multiple candidate answers), Multiplex defines approximative answers. Finally, Multiplex provides a practical platform for implementation. A prototype of such an implementation is described briefly.
ISSE-TR-95-04
The Problem of Optimal Approximations of Queries Using Views and Its Applications
Alexander Brodsky and Amihai Motro
In several recent works a common scenario is described: a relational database scheme R and a set of views V of this scheme are defined, and queries on the database scheme R must be transformed to equivalent queries on the views V. Given a query Q on R, the first problem is whether there exists a query Q_V expressed exclusively on V which is equivalent to Q, and, if it exists, how to find it. Clearly, an equivalent Q_V may not always exist. In such a case, it is important to approximate Q as best as possible with a query on V. The problems here are to define approximations, to determine whether a best approximation exists, whether it is unique, and how to find it. In this paper we formalize and answer these questions. Specifically, we show that the following problems are decidable for a conjunctive query Q and a set of conjunctive views V: (1) is there a conjunctive query Q_V on V that is equivalent to Q, and (2) is there a union Q_U of conjunctive queries on V that is equivalent to Q? Furthermore, we show that Q_V and Q_U are effectively computable if they exist. Moreover, the greatest lower bound of Q exists and is unique up to equivalence. This is done by introducing lower bound classifications, and proving their existence, finiteness and effective computability. This problem has many interesting applications, including physical database design (aimed at optimization of query processing), cooperative answering (where answers are annotated with some of their properties), and multidatabase query decomposition. We examine these applications, and we show how our work extends earlier results obtained in these applications.
ISSE-TR-95-05
Temporal Semantic Assumptions and Their Use in Database Query Evaluation
Claudio Bettini and X. Sean Wang and Sushil Jajodia
Temporal data explicitly stored in a temporal database are often associated with certain semantic assumptions. Each assumption can be viewed as a way of deriving implicit information from the explicitly stored data. Rather than leaving the task of deriving (possibly infinite) implicit data to application programs, as is the case currently, it is desirable that this be handled by the database management systems. To achieve this, this paper formalizes and studies two types of semantic assumptions: point-based and interval-based. The point-based assumptions include those assumptions that use interpolation methods, while the interval-based assumptions include those that involve different temporal types (time granularities). In order to incorporate semantic assumptions into query evaluation, this paper introduces a translation procedure that converts a user query into a system query such that the answer of this system query over the explicit data is the same as that of the user query over the explicit AND the implicit data. The paper also investigates the finiteness (safety) of user queries and system queries.
ISSE-TR-95-06
Using Formal Methods To Reason About Semantics-Based Decompositions Of Transactions
Paul Ammann and Sushil Jajodia and Indrakshi Ray
Many researchers have investigated the process of decomposing transactions into smaller pieces to increase concurrency. The research typically focuses on implementing a decomposition supplied by the database application developer, with relatively little attention to what constitutes a desirable decomposition and how the developer should obtain such a decomposition. In this paper, we argue that the decomposition process itself warrants attention. A decomposition generates a set of proof obligations that must be satisfied to show that a particular decomposition correctly models the original collection of transactions. We introduce the notion of semantic histories to formulate and prove the necessary properties. Since the decomposition impacts not only the atomicity of transactions, but isolation and consistency as well, we present a technique based on formal methods that allows these properties to be surrendered in a carefully controlled manner. Note: The text of this paper appears in VLDB '95, Proceedings of the Twenty-First International Conference on Very Large Databases, Zurich, Switzerland, September, 1995. In addition, the appendices of this report contain proofs that space constraints precluded from the VLDB paper.
ISSE-TR-95-07
A Performance Oriented Design Methodology for Large-Scale Distributed Data Intensive Information Systems
Daniel A. Menascé and Hassan Gomaa and Larry Kerschberg
The Earth Observing System (EOS) Data and Information System (EOSDIS) is perhaps one of the most important examples of large-scale, geographically distributed, and data intensive systems. Designing such systems in a way that guarantees that the resulting design will satisfy all functional and performance requirements is not a trivial task. This paper presents a performance-oriented methodology to design large-scale distributed data intensive information systems. The methodology is then applied to the design of EOSDIS Core System (ECS). Performance results, based on queuing network models of ECS are also presented.
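The queuing network models used for ECS are not given in the abstract. As a generic illustration of the kind of analytic model such a methodology rests on, the sketch below implements textbook exact mean-value analysis (MVA) for a closed, single-class network of queueing centres with user think time; the inputs are hypothetical, not ECS figures.

    def mva(service_demands, n_users, think_time=0.0):
        """Exact MVA for a closed, single-class queueing network. service_demands[k]
        is the total service demand at centre k (seconds per interaction).
        Returns the system throughput and the mean queue length at each centre."""
        queue = [0.0] * len(service_demands)
        throughput = 0.0
        for n in range(1, n_users + 1):
            resp = [d * (1.0 + q) for d, q in zip(service_demands, queue)]
            throughput = n / (think_time + sum(resp))
            queue = [throughput * r for r in resp]       # Little's law per centre
        return throughput, queue

For instance, mva([0.05, 0.20], n_users=50, think_time=5.0) gives the throughput and the queue build-up at the two centres; the centre with demand 0.20 saturates first and bounds throughput near 1/0.20 = 5 interactions per second.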
ISSE-TR-95-08
A Software Architectural Design Method for Large-Scale Distributed Data Intensive Information Systems
Hassan Gomaa and Larry Kerschberg and Daniel A. Menascé
This paper describes a Software Architectural Design Method for Large-Scale Distributed Data Intensive Information Systems. The method starts by developing a domain model of the system. The domain model is then mapped to a federated client/server software architecture, where the servers need to cooperate with each other to service client requests. To demonstrate the viability of the architecture, specific user scenarios are applied to the federated client / server software architecture using event sequence diagrams. The method is illustrated by applying it to the design of a complex software system, the Earth Observing System Data and Information System (EOSDIS) Core System.
ISSE-TR-95-09
Data and Information Architectures for Large-Scale Distributed Data Intensive Information Systems
Larry Kerschberg and Hassan Gomaa and Daniel A. Menascé
No abstract or online report available.
ISSE-TR-95-10
A Semantic Model of Program Faults
Jeff Offutt and Jane Hayes
Program faults are artifacts that are widely studied, but there are many aspects of faults that we still do not understand. In addition to the simple fact that one important goal during testing is to detect faults, a full understanding of the characteristics of faults is crucial to several research areas in testing. These include fault-based testing, testability, mutation testing, and the comparative evaluation of testing strategies. In this preliminary paper, we explore the fundamental nature of faults by looking at the differences between a syntactic and semantic characterization of faults. We offer definitions of these characteristics and explore the differentiation. Specifically, we discuss the concept of “size” of program faults --- the measurement of size provides interesting and useful distinctions between the syntactic and semantic characterization of faults. We use the fault size observations to make several predictions about testing and present data that supports this model. We also use the model to offer explanations about several questions that have intrigued testing researchers.
ISSE-TR-95-11
Finding Mode Invariants in SCR Specifications
Zhenyi Jin
This paper presents an algorithm to derive modeclass invariants from SCR specifications. Matrices are formed to help find the invariants. A case study of a cruise control system shows that the algorithm is capable of determining the same invariants that were proved via model checking in earlier investigations. This algorithm provides a mechanical procedure for computing invariants that is much easier and clearer than computing them from reachability graphs. It can be easily automated and can handle large systems. It views the SCR specifications from the SCR structure's point of view and analyzes mode invariants as structural properties. It presents a new way to analyze software requirements documents and to detect implicitly implied system properties that can be very useful and important in understanding, developing, and testing the software system. Related issues, such as generating test cases that are independent of the design structure based on the SCR specifications, are discussed to reveal possible research problems and directions for the future.
ISSE-TR-95-12
Integrating Testing with the Software Development Process
Zhenyi Jin and A. Jefferson Offutt
Software testing is usually postponed to the end of development, after coding has started, or even after it has ended. By waiting until this late in the process, testing winds up being compressed --- there are not enough resources (time and budget) remaining, problems with previous stages have been solved by taking time and dollars from testing, and there is insufficient time to adequately plan for testing. In this paper, we present an integrated approach to testing software, where testing activities begin as soon as development activities begin, and are carried out in parallel with the development activities. We present specific activities --- including planning, active testing, and development influencing activities --- that are associated with each of the traditional lifecycle phases. These activities can be carried out by the developers or by separate test engineers, and can be associated with development activities within the confines of any development process. These activities allow the tester to detect and prevent problems throughout the software development process, leading to more reliable and higher quality software.
ISSE-TR-95-13
Advanced Transaction Processing in Multilevel Secure File Stores
Elisa Bertino and Sushil Jajodia and Luigi Mancini and Indrajit Ray
The concurrency control requirements for transaction processing in a multilevel secure file system are different from those in conventional transaction processing systems; in particular, there is the need to coordinate transactions at different security levels, avoiding both potential timing covert channels and the starvation of transactions at high security levels. Suppose a transaction at a low security level attempts to write a data item that is being read by a transaction at a higher security level. On the one hand, a timing covert channel arises if the transaction at the low security level is either delayed or aborted by the scheduler. On the other hand, the transaction at the high security level may be subjected to an indefinite delay if it is forced to abort repeatedly. This paper extends the two-phase locking mechanism that is suitable for conventional transaction processing systems to multilevel secure data management systems. The scheme presented here avoids aborting higher level transactions while nonetheless guaranteeing serializability. The system programmer is provided with a powerful set of linguistic constructs that support exception handling, partial rollback, and forward recovery. The proper use of these constructs can prevent the indefinite delay in completing a high level transaction, and allows the system programmer to trade off starvation against transaction isolation.
ISSE-TR-95-14
Applying Formal Methods to Semantic-Based Decomposition of Transactions
Paul Ammann and Sushil Jajodia and Indrakshi Ray
In some important database applications, performance requirements are not satisfied by the traditional approach of serializability, in which transactions appear to execute atomically and in isolation on a consistent database state. Although many researchers have investigated the process of decomposing transactions into steps to increase concurrency, the focus of the research is typically on implementing a decomposition supplied by the database application developer, with relatively little attention to what constitutes a desirable decomposition and how the developer should obtain such a decomposition. In this paper, we focus on the decomposition process itself. A decomposition generates a set of proof obligations that must be satisfied to show that the decomposition correctly models the original collection of transactions. We introduce the notion of semantic histories to formulate and prove the necessary properties, and the notion of successor sets to describe efficiently the correct interleavings of steps. The successor set constraints use information about conflicts between steps so as to take full advantage of conflict serializability at the level of steps. We implement the resulting, verified decomposition in a two-phase locking environment.
ISSE-TR-95-15
ViewFinder: An Object-Oriented Browser
Alessandro D'Atri and Amihai Motro and Laura Tarantino
ViewFinder is a graphical tool for browsing in databases that provides a flexible, yet intuitive environment for exploratory searches. The design approach has been to provide maximum functionality and generality without sacrificing simplicity. The constructs of ViewFinder's external model are essentially object-oriented: class and token objects, membership relationships between tokens and classes, generalization relationships between classes, inheritance, and so on. This external model is based on an internal model which resembles a semantic network. Such a network may be extracted from a variety of data models, including object-oriented, entity-relationship and extended relational models. This architecture gives ViewFinder a large degree of model independence. The main constructs of the external model are displays of objects (either classes or tokens), called views. Commands are available for efficient traversal of the database, displaying views of any class or token. To speed up repetitive searches, views may be synchronized: the user sets up several views, linked in a tree-like structure, so that when the information displayed in the root view is modified (e.g., scrolled by the user), the contents of the other views change automatically. Additional commands are available to search, order, aggregate and select the information displayed in a view, thus providing a simple query facility.
ISSE-TR-94-00
An Experimental Determination of Sufficient Mutation Operators
Jeff Offutt and Ammei Lee and Gregg Rothermel and Roland Untch and Christian Zapf
Mutation testing is a technique for unit testing software that, although powerful, is computationally expensive. The principal expense of mutation is that many variants of the test program, called mutants, must be repeatedly executed. Selective mutation is a way to reduce the cost of mutation testing by reducing the number of mutants that must be executed. This paper reports experimental results that compare selective mutation testing to standard, or non-selective, mutation testing. The results support the hypothesis that selective mutation is almost as strong as non-selective mutation; in experimental trials selective mutation provides almost the same coverage as non-selective mutation, with a four-fold or more reduction in cost.
ISSE-TR-94-01
On the Expressive Power of the Unary Transformation Model
Ravi S. Sandhu and Srinivas Ganta
The Transformation Model (TRM) was recently introduced in the literature by Sandhu and Ganta. TRM is based on the concept of transformation of rights. The propagation of access rights in TRM is authorized entirely by existing rights for the object in question. It has been demonstrated in earlier work that TRM is useful for expressing various kinds of consistency, confidentiality, and integrity controls. In our previous work a special case of TRM named the Binary Transformation Model (BTRM) was defined. We proved that BTRM is equivalent in expressive power to TRM. This result indicates that it suffices to allow testing for only two cells of the matrix. In this paper we study the relationship between TRM and the Unary Transformation Model (UTRM). In UTRM, individual commands are restricted to testing for only one cell of the matrix (whereas individual TRM commands can test for multiple cells of the matrix). Contrary to our initial conjecture, we found, quite surprisingly, that TRM and UTRM are formally equivalent in terms of expressive power. The implications of this result for safety analysis are also discussed in this paper. The construction used to prove the equivalence of TRM and UTRM also indicates that they might not be practically equivalent.
ISSE-TR-94-02
A Graduate Course in Object-Oriented Analysis Based on Student-Generated Projects
Bo Sanden
This paper describes the experience from several offerings of a Software Requirements course based on the Object Modeling Technique by Rumbaugh et al. The paper describes the organization of the course, which includes a semester project carried out by groups of 3-4 students. Descriptions of 20 such projects are given, most of which were based on ideas from the student teams.
ISSE-TR-94-03
Entity-Life Modeling and Ada 9X
Jeffrey R. Carter and Bo I. Sanden
Entity-life modeling (ELM) is a design approach for reactive systems. ELM designs are usually implemented in Ada and rely on its packaging and tasking facilities. An ELM solution of a Flexible Manufacturing System (FMS) has been published with an implementation in Ada 83. Ada 83 has a number of shortcomings for reactive systems, many of which are addressed by the proposed revision, Ada 9X. This paper examines the implementation of the FMS in Ada 9X and compares it to the earlier implementation. Although the FMS problem was conceived independently from the Ada 9X effort, the Ada 83 implementation is virtually a catalog of problems that are elegantly addressed by Ada 9X. This indicates the importance of the 9X revision to make Ada useful in reactive, concurrent applications.
ISSE-TR-94-04
Object-Oriented Analysis for Control Systems with Shared Resources
Bo Sanden and Anhtuan Dinh
This paper proposes a modification of existing approaches to object-oriented analysis (particularly the Object Modeling Technique, OMT) to address the problem of resource sharing in control systems. In control-system analysis, it is important to identify certain objects as the resource users and others as the (shared) resources. Once this has been done, the system can often be made deadlock free simply by ensuring that resource users always acquire shared resources in the same order. Resource users can easily be highlighted in the object model, while events and actions associated with resource allocation can be shown in the dynamic model. A flexible manufacturing system (FMS) is given as a non-trivial example. In that example, jobs need simultaneous exclusive access to various devices.
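The deadlock-avoidance rule mentioned in the abstract (resource users always acquire shared resources in the same order) is easy to render in code. The Python sketch below is only a generic illustration of that rule with hypothetical names, not the OMT models or the FMS design of the report.

    import threading

    class OrderedDevices:
        """Grant exclusive access to shared devices only in one fixed global order,
        so jobs that need several devices at once can never form a hold-and-wait
        cycle (and therefore cannot deadlock on these locks)."""
        def __init__(self, names):
            self.rank = {name: k for k, name in enumerate(names)}
            self.locks = {name: threading.Lock() for name in names}

        def acquire(self, names):
            for name in sorted(names, key=self.rank.__getitem__):
                self.locks[name].acquire()

        def release(self, names):
            for name in sorted(names, key=self.rank.__getitem__, reverse=True):
                self.locks[name].release()

A job that needs, say, a robot arm and a milling station always locks them in the globally agreed order, no matter which device it logically uses first.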
ISSE-TR-94-05
Experiments with Data Flow and Mutation Testing
Jeff Offutt and Jie Pan and Kanupriya Tewary and Tong Zhang
This paper presents two experimental comparisons of data flow and mutation testing. These two techniques are widely considered to be effective for unit-level software testing, but can only be analytically compared to a limited extent. We compare the techniques by evaluating the effectiveness of test data developed for each. For a number of programs, we develop ten independent sets of test data; five to satisfy the mutation criterion, and five to satisfy the all-uses data flow criterion. These test sets are developed using automated tools, in a manner consistent with the way a test engineer would apply these testing techniques. We use these test sets in two separate experiments. First we apply a “cross scoring”, by measuring the effectiveness of the test data that was developed for one technique in terms of the other technique. Second, we investigate the ability of the test sets to find faults. We place a number of faults into each of our subject programs, and measure the number of faults that are detected by the test sets. Our results indicate that while both techniques are effective, mutation-adequate test sets are closer to satisfying data flow, and detect more faults.
ISSE-TR-94-06
A Safety Kernel for Traffic Light Control
Paul Ammann
The success of kernels for enforcing security has led to proposals to use kernels for enforcing safety. As a feasibility demonstration of one such proposal, we describe a kernel for traffic light control. We argue the utility of the kernel and state the safety properties the kernel must ensure. We give a formal specification of the kernel and show that the specification maintains the safety properties. We explain an implementation of the kernel in Ada and discuss use of the kernel.
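Neither the formal specification nor the Ada implementation appears in the abstract. Purely to illustrate the kernel idea, that the application proposes signal changes and a small trusted kernel refuses any change violating the safety property, here is a hypothetical Python toy whose invariant is that no two conflicting approaches are non-red at the same time; the approach names and conflict table are made up.

    CONFLICTS = {frozenset({"north_south", "east_west"})}

    class TrafficKernel:
        """Tiny safety-kernel illustration: all signal changes go through request(),
        which rejects any command that would break the safety invariant."""
        def __init__(self, approaches):
            self.state = {a: "red" for a in approaches}

        def request(self, approach, colour):
            if colour != "red":
                for other, current in self.state.items():
                    if (other != approach and current != "red"
                            and frozenset({approach, other}) in CONFLICTS):
                        return False      # refused: conflicting approach is not red
            self.state[approach] = colour
            return True

The application logic above the kernel (timing plans, sensors, pedestrian calls) can be arbitrarily complex and even faulty; safety rests only on the small kernel, which is the property the report argues for and proves of its specification.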
ISSE-TR-94-07
Algebraic Query Languages on Temporal Databases with Multiple Time Granularities
X. Sean Wang
This paper investigates algebraic query languages on temporal databases. The data model used is a multidimensional extension of the temporal modules introduced in [1]. In a multidimensional temporal module, every non-temporal fact has a timestamp that is a set of n-ary tuples of time points. A temporal module has a set of timestamped facts and has an associated temporal granularity (or temporal type), and a temporal database is a set of multidimensional temporal modules with possibly different temporal types. Temporal algebras are proposed on this database model. Example queries and results of the paper show that the algebras are rather expressive. The operations of the algebras are organized into two groups: snapshot-wise operations and timestamp operations. Snapshot-wise operations are extensions of the traditional relational algebra operations, while timestamp operations are extensions of first-order mappings from timestamps to timestamps. Multiple temporal types are only dealt with by these timestamp operations. Hierarchies of algebras are defined in terms of the dimensions of the temporal modules in the intermediate results. The symbol Talg^k_{m,n} is used to denote all the algebra queries whose input, output and intermediate modules are of dimensions at most m, n and k, respectively. (Most temporal algebras proposed in the literature are in Talg^1_{1,1}.) Equivalent hierarchies Tcalc^k_{m,n} are defined in a calculus query language that is formulated by using a first-order logic with linear order. The addition of aggregation functions into the algebras is also studied.
ISSE-TR-94-08
An Application of Entity Life Modeling to Incremental Re-engineering of Fortran Reactive Program Components
F. G. Patterson Jr. and Jean-Jacques Moortgat
This paper describes a case study in which the Entity Life Modeling (ELM) technique is used to reconstruct a single module from a legacy FORTRAN application. The architecture of the module changed as a result of entity identification and mode analysis. Concurrent aspects of the problem domain were identified and modeled as concurrent tasks. To interface with the greater part of the software system that was not reconstructed, an interface object was created. The interface object may be regarded as scaffolding that will be discarded when neighboring increments of the system are reengineered. The reengineered increment has a highly maintainable design that is characteristic of complete systems constructed using the ELM methodology.
ISSE-TR-94-09
Using Constraints to Detect Equivalent Mutants
Jeff Offutt and Jie Pan
Mutation testing is a software testing technique that is considered to be very powerful, but is very expensive to apply. One open problem in mutation testing is how to automatically detect equivalent mutant programs. Currently, equivalent mutants are detected by hand, which makes it a very expensive and time-consuming process, and restricts the use of mutation testing. This paper presents a technique that uses mathematical constraints to automatically detect equivalent mutants. A tool, Equivalencer, has been developed to demonstrate this technique, and experimental results from using this tool are presented.
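The constraint machinery itself is not in the abstract. As a loose illustration of the underlying idea only, the sketch below replaces constraint solving by brute force over a small sample domain: if no input can satisfy the condition that the original and mutated expressions differ, the mutant is flagged as a candidate equivalent mutant. All names are hypothetical, and a real implementation reasons over constraints rather than enumerating inputs.

    from itertools import product

    def looks_equivalent(original, mutant, domains):
        """If no input drawn from the sample domains makes the mutated expression
        differ from the original (i.e. the 'kill this mutant' condition is never
        satisfied), report the mutant as a candidate equivalent mutant."""
        for args in product(*domains):
            if original(*args) != mutant(*args):
                return False              # the condition is satisfiable: killable
        return True

    # e.g. mutating x * 2 into x + x yields an equivalent mutant:
    # looks_equivalent(lambda x: x * 2, lambda x: x + x, [range(-10, 11)])  -> True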
ISSE-TR-94-10
The Dynamic-domain Reduction Approach to Test Data Generation: Design and Algorithms
Jeff Offutt and Zhenyi Jin and Jie Pan
Although there are many techniques and tools available to support the software testing process, one of the most crucial parts of testing, generating the test data, is usually done by hand. Test data generation is one of the most technically challenging steps in testing software, but unfortunately, most practical systems do not incorporate any automation in this step. This paper presents a new method for automatically generating test data that incorporates ideas from symbolic evaluation, constraint-based testing, and dynamic test data generation. It takes an initial set of values for each input, and “pushes” the values through the control-flow graph of the program in a dynamic way, modifying the sets of values as branches in the program are taken. This method is an outgrowth of previous research in constraint-based testing, and combines several of the steps that were previously separate into one coherent process. The major difference is that this method is dynamic, and resolves path constraints immediately; it also includes an intelligent search technique, and improved handling of arrays, loops, and pointers.
ISSE-TR-94-11
Logical Design for Temporal Databases with Multiple Temporal Types
X. Sean Wang and Claudio Bettini and Alexander Brodsky and Sushil Jajodia
The purpose of good database logical design is to eliminate data redundancy and insertion and deletion anomalies. In order to achieve this objective for temporal databases, the notions of temporal types and temporal functional dependencies (TFDs) are introduced. A temporal type is a monotonic mapping from ticks of time (represented by positive integers) to time sets (represented by subsets of reals) and is used to capture various standard and user-defined calendars. A TFD is a proper extension of the traditional functional dependency and takes the form X-u->Y, meaning that there is a unique value for Y during one tick of the temporal type u for one particular X value. An axiomatization for TFDs is given. Since a finite set of TFDs usually implies an infinite number of TFDs, we introduce the notion of and give an axiomatization for a finite closure to effectively capture a finite set of implied TFDs that are essential to the logical design. Temporal normalization procedures with respect to TFDs are then given. More specifically, temporal Boyce-Codd normal form (TBCNF) that eliminates all data redundancies, and temporal third normal form (T3NF) that allows dependency preservation, are defined. Both normal forms are proper extensions of their traditional counterparts, BCNF and 3NF. Decomposition algorithms are presented that give lossless TBCNF decompositions and lossless, dependency preserving, T3NF decompositions.
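Although the report is about design theory (an axiomatization and normal forms) rather than checking, a TFD X -u-> Y is easy to test against concrete data, which helps make the definition concrete. The Python sketch below assumes a hypothetical layout in which each row is a dict with a numeric "time" field, and tick_of maps a timestamp to its tick of the granularity u.

    def tfd_violations(rows, x_attrs, y_attrs, tick_of):
        """Check a temporal functional dependency X -u-> Y: within a single tick
        of granularity u, equal X values must determine a unique Y value.
        Returns the (X value, tick) groups that violate the dependency."""
        seen, violations = {}, []
        for row in rows:
            key = (tuple(row[a] for a in x_attrs), tick_of(row["time"]))
            y = tuple(row[a] for a in y_attrs)
            if key in seen and seen[key] != y:
                violations.append(key)
            seen.setdefault(key, y)
        return violations

    # e.g. "salary is fixed within a month" over day-numbered timestamps:
    # tfd_violations(rows, ["employee"], ["salary"], lambda day: day // 30)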
ISSE-TR-94-12
The LyriC Language: Querying Constraint Objects
Alexander Brodsky and Yoram Kornatzky
Presented in this paper is a novel language for querying object-oriented databases where objects may hold spatial, temporal or constraint data, conceptually represented by equality and inequality constraints. The proposed LyriC language is motivated by application realms such as (1) constraint-based design in two-, three-, or higher-dimensional space, (2) large-scale optimization and analysis, based mostly on linear programming techniques, and (3) spatial and geographic databases. LyriC is an extension of flat constraint query languages, and especially those for linear constraint databases, to complex objects. The extension is based on the object-oriented paradigm, where a database is a collection of objects some of whose attributes might be either described explicitly or implicitly by constraints. Constraints are organized in classes with associated methods. The query language is an extension of the language XSQL, suggested by Kifer, Kim and Sagiv, and is built around the idea of extended path expressions. Path expressions in a query traverse nested structures in one sweep. Constraints are used in a query to filter stored constraints and to create new constraint objects.
ISSE-TR-94-13
Semantic Update Optimization in Active Databases
Jong Pil Yoon and Larry Kerschberg
In an active database, an update may be constrained by integrity constraints, and may also trigger rules that, in turn, may affect the database state. The general problem is to effect the update while also managing the “side-effects” of constraint enforcement and rule execution. In this paper an update calculus is proposed by which updates, constraints and rules are specified and managed within the same formalism. Constraints and production rules are expressed in a constraint language based on first-order logic. These logic expressions are used to semantically transform an original update into a sequence of updates that reflect the relevant constraints and production rules. The inference mechanism associated with processing a reformulated query ensures that: 1) the pre- and post-conditions of an update are satisfied, 2) update side-effects are propagated, and 3) repairs are made to tuples exhibiting constraint violations. Thus, a user-specified “update” is transformed, through semantic reformulation techniques, into a sequence of updates which together ensure semantic integrity of the original update as well as its propagated side-effects. This research presents several contributions. Integrity constraints and production rules are expressed in a declarative formalism so that they may be reasoned about. The update calculus formalism handles the semantic reformulation of an update to reflect relevant constraints and rules governing it. Finally, an algorithm is presented to handle constraint enforcement, production rule firing, and subsequent repair actions.
ISSE-TR-94-14
A Restrictive Definition of Concurrency for Discrete-event Modeling
Bo Sanden
Concurrency is an increasingly popular modeling concept in object-oriented analysis, software design, process modeling and other areas. Many approaches view their problem domains in terms of independent, concurrent processes that occasionally need to be synchronized. This is potentially an extremely concurrent model, in which “essential” processes that represent independent progress may be lost in the wealth of more trivial processes. This is particularly problematic in a software implementation, where each process incurs various overhead costs.
ISSE-TR-93-00
Functional and Test Specifications for the MiStix File System
Paul Ammann and Jeff Offutt
Mistix is a simple file system based on the Unix file system. Mistix is used in classroom exercises in graduate software engineering classes at George Mason University. In this document, we supply several different specifications for Mistix. First we give an informal specification. Next, since the informal specification turns out to be difficult to formalize directly, we supply a revised informal specification that matches subsequent formal specifications. We give a model-based specification for Mistix in Z, followed by functional test specifications based on the Category-Partition method. Finally, we give an algebraic specification for Mistix.
ISSE-TR-93-01
Empirical Comparisons of Data Flow and Mutation Testing
Jeff Offutt and Kanupriya Tewary
Data flow and mutation testing are two powerful white box testing techniques for unit-level software testing. Unfortunately, they cannot be completely compared on an analytical basis; for example, mutation is incomparable on an inclusion basis with most data flow criteria. This paper shows that mutation includes the All-defs data flow criterion, but is incomparable with other data flow criteria, and presents results from two empirical comparisons of mutation with the All-uses data flow criterion. For the first comparison, we define a coverage measure that is used to compare the two criteria. The coverage of data flow by mutation was between 99% and 100%, which means that test data that satisfied the mutation criterion also satisfied the data flow criterion in almost all cases. The coverage of mutation by data flow was between 80% and 90%. For the second comparison, we use test data that satisfy the two criteria to detect faults, and compare the criteria on the basis of the faults found. The empirical evidence strongly indicates that while data flow-adequate test sets are close to being mutation-adequate, mutation-adequate test sets are almost invariably data flow-adequate.
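For readers unfamiliar with mutation testing, the following toy sketch shows what it means for test data to kill a mutant; the program and mutation operator are illustrative and not from the study:

    # Original unit under test.
    def is_positive(x):
        return x > 0

    # Mutant produced by replacing the relational operator > with >=.
    def is_positive_mutant(x):
        return x >= 0

    # A test input "kills" the mutant when the two versions disagree on it.
    tests = [-1, 0, 7]
    print(any(is_positive(t) != is_positive_mutant(t) for t in tests))  # True: x = 0 kills it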
ISSE-TR-93-02
Planar Lattice Security Structures for Multi-level Replicated Databases
Paul Ammann and Sushil Jajodia
We review three algorithms for concurrency control in secure, multi-level replicated databases. We show by counterexample that the various algorithms, even if suitably modified to correct published problems, still do not guarantee serializability for either general partial orders or general lattices. The counterexamples suggest that the difficulty is a general feature of concurrency control algorithms tailored to the replicated architecture. We make two constructive observations. First, sufficient centralization permits algorithms applicable to general partial orders. Second, the reviewed algorithms are applicable to more restrictive security structures. Specifically, we observe that the intuitively appealing restriction to planar lattices is sufficient.
ISSE-TR-93-03
Recognition and Approximation of NTrees
Ravi Sandhu and Srinivas Ganta
The ntree hierarchy for protection groups was recently introduced in the literature and shown to have an efficient representation by assigning a pair of lr-values to each group so that U is a proper subgroup of V if and only if l[U]<l[V] and r[U]<r[V]. NTrees are incrementally constructed by refinement of an existing group into another ntree of groups, with forests of trees and inverted trees as the basic ntrees to start this process. It is shown that ntrees have a simpler characterization as the class of partial orders constructed by binary refinements, in which an existing group is refined into exactly two groups. An algorithm for recognizing ntrees is developed on the basis of this characterization. It is also shown how to assign leftmost lr-values, which play a central role in ensuring that refinement requires a strictly incremental change in the representation of the ntree. Two algorithms for approximating an arbitrary hierarchy by an ntree are also given.
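A minimal sketch of the lr-value test described above, with made-up groups and values (how lr-values are actually assigned to an ntree is the subject of the report and is not shown here):

    # Hypothetical lr-values for three groups.
    lr = {"staff": (1, 1), "faculty": (2, 3), "department": (4, 5)}

    def proper_subgroup(u, v):
        # U is a proper subgroup of V iff l[U] < l[V] and r[U] < r[V].
        (lu, ru), (lv, rv) = lr[u], lr[v]
        return lu < lv and ru < rv

    print(proper_subgroup("staff", "faculty"))      # True for these made-up values
    print(proper_subgroup("faculty", "staff"))      # False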
ISSE-TR-93-04
Globally Consistent Event Ordering In One-Directional Distributed Environments
Paul Ammann and Sushil Jajodia and Phyllis Frankl
We consider communication structures for event ordering algorithms in distributed environments where information flows only in one direction. Example applications are multilevel security and hierarchical databases. Although the most general one-directional communication structure is a partial order, partial orders do not enjoy the property of being consistently-ordered, a formalization of the notion that globally consistent event ordering is ensured. Our main result is that the crown-free property is necessary and sufficient for a communication structure to be consistently-ordered. We discuss the computational complexity of detecting crowns and sketch typical applications.
ISSE-TR-93-05
Using Formal Methods To Mechanize Category-Partition Testing
Paul Ammann and Jeff Offutt
We extend the category-partition method, a specification-based method for testing software. Previous work in category-partition has focused on developing structured test specifications that describe software tests. We offer guidance in making the important decisions involved in transforming test specifications to actual test cases. We present a structured approach to making those decisions, including a mechanical procedure for test derivation. With this procedure, we suggest a heuristic for choosing a minimal coverage of the categories identified in the test specifications, suggest parts of the process that can be automated, and offer a solution to the problem of identifying infeasible combinations of test case values. Our method uses formal schema-based functional specifications and is illustrated with an example of a simple file system. We conclude that our approach eases test case creation and leads to effective tests of software. We also note that by basing this procedure on formal specifications, we can identify anomalies in the functional specifications.
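A small sketch of the mechanical step from categories and choices to test frames, including filtering of infeasible combinations; the categories, choices, and feasibility rule are hypothetical and are not taken from the report's file-system example:

    import itertools

    # Hypothetical categories and choices for an "open file" operation.
    categories = {
        "file_state": ["exists", "missing"],
        "permission": ["readable", "denied"],
    }

    def feasible(frame):
        # Example infeasibility rule: a missing file has no permission setting.
        return not (frame["file_state"] == "missing" and frame["permission"] == "denied")

    frames = [dict(zip(categories, combo))
              for combo in itertools.product(*categories.values())]
    for f in filter(feasible, frames):
        print(f)                       # each surviving frame becomes one test case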
ISSE-TR-93-06
A Formal Framework for Integrating Inconsistent Answers from Multiple Information Sources
Ami Motro
In any environment of multiple information sources it is practically unavoidable that information sources will overlap; that is, describe the same portion of the real world. And when information sources overlap, it is practically unavoidable that information will be inconsistent; that is, describe differently the same portion of the real world. In this report we define a formal framework for resolving extensional inconsistencies that are detected in the course of processing queries against a collection of information sources. This framework establishes basic database concepts, such as schemes, instances, views, queries and integrity constraints, as well as new concepts, such as view equivalence, scheme mappings, multi-databases, multi-database queries and inconsistency. Altogether, it provides the formal foundations necessary for addressing issues of information integration and inconsistency resolution. Our approach to the integration of inconsistent answers is based on information goodness, a measure for estimating the proximity of stored information to the ideal information. We propose to maintain for each information source a goodness base: a set of basic views whose goodness will be useful for inferring the goodness of anticipated queries. When a query is answered inconsistently by individual information sources, we propose to (1) estimate the goodness of each answer from the appropriate goodness base, and (2) integrate the individual answers into an answer of the highest goodness. We sketch the architecture of a software agent, called Harmony, that will implement our approach.
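As a rough sketch of one simplistic reading of the final step, assuming goodness estimates are already available for each source's answer (the sources, answers, and numbers are invented, and real integration may combine parts of several answers rather than pick one):

    # Hypothetical answers to the same query from two overlapping sources.
    answers = [
        {"source": "db1", "answer": {"phone": "555-0100"}, "goodness": 0.7},
        {"source": "db2", "answer": {"phone": "555-0199"}, "goodness": 0.9},
    ]

    # Integrate by preferring the answer of highest estimated goodness.
    best = max(answers, key=lambda a: a["goodness"])
    print(best["source"], best["answer"])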
ISSE-TR-93-07
A Framework for Constraint Management in Object-Oriented Databases
Jong P. Yoon and Larry Kerschberg
We introduce the concept of a rule schema in this paper in order to support constraints in the active object-oriented paradigm. The rule schema provides the meta-knowledge associated with constraints, analogous to the way a database schema provides metadata about the database objects. It is used to compile the constraints into clauses which are then “attached” to appropriate object attributes. The compiled constraints incorporate the semantics associated with inheritance over generalization hierarchies and references to simple objects which are constituents of complex objects. These clauses are evaluated and enforced whenever an attribute value is retrieved or updated. The semantics of update propagation for object instances are embedded in the compiled constraints; constraints therefore play a role similar to methods in the model, but are specified explicitly rather than procedurally. They can therefore be managed by the active object-oriented database.
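The idea of enforcing a compiled constraint whenever an attribute is updated can be sketched in present-day terms as follows (a Python property standing in for the report's compiled clauses; the class and constraint are invented):

    class Employee:
        def __init__(self, salary):
            self.salary = salary           # goes through the setter below

        @property
        def salary(self):
            return self._salary            # a constraint could also be checked on retrieval

        @salary.setter
        def salary(self, value):
            if value < 0:                  # "attached" constraint, enforced on every update
                raise ValueError("salary must be non-negative")
            self._salary = value

    e = Employee(40000)
    e.salary = 45000                       # accepted
    # e.salary = -1                        # would raise: the attached constraint rejects it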
ISSE-TR-93-08
Semantic Query Optimization in Deductive Object-Oriented Databases
Jong P. Yoon and Larry Kerschberg
This paper addresses the problem of semantic query reformulation in the context of object-oriented deductive databases. It extends the declarative object-oriented specifications of F-logic proposed by Kifer and Lausen using the semantic query optimization technique developed by Chakravarthy, Grant, and Minker. In general, query processing in object-oriented databases is expensive when a query incorporates declarative rules, methods and inherited properties. We introduce the technique of semantic query reformulation for F-logic queries which transforms the original query into an equivalent, semantically-rich query that is more efficiently processed. We also discuss the issues of conflict resolution strategies and query evaluation priorities for queries involving the upper bounds of objects in the F-logic “type” lattice.
ISSE-TR-93-09
A Framework for Knowledge Discovery and Evolution in Databases
Jong P. Yoon and Larry Kerschberg
Although knowledge discovery is increasingly important in databases, discovered knowledge is not always useful to users, mainly because it does not fit the users' interests, or because it may be redundant or inconsistent with a priori knowledge. Knowledge discovery in databases depends critically on how well a database is characterized and how consistently the existing and discovered knowledge is evolved. This paper describes a novel concept for knowledge discovery and evolution in databases. The key issues of this work include: using a database query to discover new rules; using not only positive examples (answers to a query) but also negative examples to discover new rules; and harmonizing existing rules with the new rules. The main contribution of this paper is the development of a new tool for (1) characterizing the exceptions in databases and (2) evolving knowledge as a database evolves.
ISSE-TR-93-10
Designing Graphical User Interfaces with Entity-life Modeling
Douglas L. Smith
This paper explores using entity-life modeling (ELM) as an effective method for designing concurrent graphical user interfaces in a message processing environment. In ELM, multiple threads of control in the software are modeled on threads of events in the problem environment, and software objects are modeled on objects in the problem. The goal is to identify a minimum number of threads in the problem environment that drive the system while operating on objects. ELM states that threads are primarily those processes that must be rescheduled at particular points in time and those that vie for resources. The intent of this paper is to show that the ELM approach to problem analysis needs to be modified in order to produce optimal designs in a message passing environment. The paper suggests that some reschedulable processes do not need their own threads of control. Instead, the desired effect can be achieved by timers that insert a message into the message queue of a given process at the appropriate time. The paper also proposes a new category of background entities to account for time-consuming calculations that must be carried out concurrently with ordinary message processing.
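The timer idea, in which a reschedulable activity is handled by posting a message rather than by a dedicated thread, can be sketched as follows (names and delays are illustrative):

    import queue
    import threading

    messages = queue.Queue()

    def post_later(delay_s, msg):
        # A timer inserts a message into the ordinary message queue at the right time,
        # so the scheduled activity needs no thread of control of its own.
        threading.Timer(delay_s, messages.put, args=(msg,)).start()

    post_later(0.1, "refresh-display")     # scheduled work arrives as a message
    messages.put("button-clicked")         # ordinary GUI events use the same queue

    for _ in range(2):
        print("handling", messages.get())  # one loop serves both kinds of events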
ISSE-TR-92-00
Subjects and Objects: Structuring Concepts for Reactive Systems
Bo Sanden
This paper proposes subject-object modeling as a new design approach for reactive software. Subject-object modeling combines the strengths of the existing object and process paradigms of computation. Both paradigms stress the close relationship between the structure of the software and the structure of the problem. The object paradigm sees the world in terms of objects sending messages to each other, while the process paradigm sees it in terms of communicating sequential processes. With both paradigms, the structure of the software can be directly patterned on the structure of the problem environment, providing a seamless transition from analysis to design. The problem with the object paradigm is that it does not handle timing well. Some object-oriented authors play down the timing aspect and regard the operations on objects as “shopping lists” to be picked from in any order. But timing is essential in many systems and particularly in reactive software. The process paradigm captures time dependencies, but forces the user to regard even trivial objects in process terms, which is often cumbersome and counter-intuitive. Subject-object modeling enhances the object paradigm with a means to express time relationships beyond the scope of any one object, and enhances the process paradigm with object decomposition as a means for information hiding and for the handling of simple objects. The analyst's goal is a compelling description of the problem domain based on processes operating on objects, where the processes capture the timing and ordering of operations on objects. Such a problem description is useful in its own right. It can also be mapped directly onto a software design. In an Ada environment, processes map onto tasks and the objects become packages (or instances of abstract data types). In this paper, the subject-object approach is applied to a problem with free-floating buoys and an elevator control problem. The solution to the buoy problem is compared to an earlier-published object-based solution.
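A very small sketch of the subject-object split in present-day terms (Python threads standing in for Ada tasks, a class for a package; the buoy details are invented):

    import threading
    import time

    class Buoy:                             # "object": hides state behind operations
        def __init__(self):
            self._lock = threading.Lock()
            self._temperature = None
        def record(self, value):
            with self._lock:
                self._temperature = value
        def report(self):
            with self._lock:
                return self._temperature

    def sampler(buoy):                      # "subject": captures timing and ordering
        for reading in (17.0, 17.2, 17.1):
            buoy.record(reading)
            time.sleep(0.01)

    b = Buoy()
    t = threading.Thread(target=sampler, args=(b,))
    t.start(); t.join()
    print(b.report())                       # 17.1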
ISSE-TR-92-01
Artificial Realities, Virtual Communities, and Knowbots
Chris Dede
As the power of information technology grows and its cost diminishes, training is shifting into complex, computer-maintained worlds. The primary reason for this transition is effectiveness; richly detailed virtual environments leverage the most important variables for educational success. These types of training applications can enhance learners' motivation to spend time on task, provide collaborative experiences to foster peer teaching, tailor material to each student's needs and background, and promote the transferability of complicated knowledge and skills into real-world settings. This report describes progress in three aspects of virtual environments that draw on ideas from artificial intelligence: artificial realities, virtual communities, and knowbots.
ISSE-TR-92-02
Using Compiler Optimization Techniques to Detect Equivalent Mutants
A. Jefferson Offutt and W. Michael Craft
Mutation is a software testing technique that requires the tester to generate test data that will find specific, well-defined errors. Mutation testing executes many slightly differing versions, called mutants, of the same program to evaluate the quality of the data used to test the program. Although these mutants are generated and executed efficiently by automated methods, many of the mutants are functionally equivalent to the original program and are not useful for testing. Recognizing and eliminating equivalent mutants has traditionally been done by hand, a time-consuming and arduous task, which limits the practical usefulness of mutation testing. This paper presents extensions to previous work in detecting equivalent mutants; specifically, we present algorithms for determining several classes of equivalent mutants, and results from an implementation of these algorithms. These algorithms are based on data flow analysis and six compiler optimization techniques. We describe each of these techniques and how they are used to detect equivalent mutants. The design of the tool and some experimental results from using it are also presented. Finally, a new approach for detecting equivalent mutants that may be more powerful than the optimization techniques is introduced.
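A toy example of the kind of equivalence that compiler optimizations can expose (the program and mutation are illustrative, not from the paper's tool):

    # Original program and a mutant that replaces + with -.
    def f(a):
        x = 0
        return a + x

    def f_mutant(a):
        x = 0
        return a - x

    # Constant propagation folds x to 0 in both versions, reducing each body to
    # "return a"; the mutant can therefore be flagged as equivalent without testing.
    print(all(f(a) == f_mutant(a) for a in range(-5, 6)))   # True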