Unpack thesis sources.
44
thesis/Chapters/ch1-introduction.tex
Normal file
@@ -0,0 +1,44 @@
% !TEX root = ../Thesis.tex

\chapter{Introduction}\label{introduction}
\iot devices are becoming increasingly prevalent in modern homes, offering a range of benefits such as controlling home lighting, remote video monitoring, and automated cleaning \citep{iothome2019}.
These conveniences are made possible by the sensors and networked communication capabilities embedded within these devices.
However, these features also pose significant privacy and security risks \citep{islamiot2023}.
IoT devices are often integrated into home networks and communicate over the internet with external servers, potentially enabling surveillance or unauthorized data sharing without the user's knowledge or consent \citep{infoexpiot}. Moreover, even in the absence of malicious intent by the manufacturer, these devices are still vulnerable to programming bugs and other security failures \citep{peekaboo2020}.

\medskip

Security researchers focused on the security and privacy of such \iot devices rely on various utilities and tools for conducting research.
These tools are often glued together in scripts with arbitrary decisions about file naming and data structuring.
Such impromptu scripts typically have a narrow range of application, making them difficult to reuse across different projects. Consequently, useful parts are manually extracted and incorporated into new scripts for each project, exacerbating the problem.

\medskip

This approach leads to scattered data, highly tailored scripts, and a lack of standardized methods for sharing or reproducing experiments. The absence of standardized tools and practices results in inconsistencies in data collection and storage, making it difficult to maintain compatibility across projects.
Furthermore, the lack of conventions about file naming and data structuring leads to issues in finding and accessing the data.
For research groups, these issues are further compounded during the onboarding of new members, who must navigate this fragmented landscape and often create their own ad-hoc solutions, perpetuating the cycle of inefficiency and inconsistency.

\medskip

To systematically and reproducibly study the privacy and security of IoT devices, an easy-to-use testbed that automates and standardizes various aspects of experimenting with IoT devices is needed.
\section{Motivation}\label{sec:motivation}
The primary motivation behind this project is to address the challenges faced by security researchers in the field of IoT device security and privacy.
The scattered nature of data, the lack of standardized tools, and the ad-hoc methods used for data collection and processing are an obstacle for researchers who want to produce valid and reproducible results \citep{fursinckorg2021}.
A standardized testbed, enabling a more systematic approach to collecting and analyzing network data from \iot devices, can relieve the tedious and error-prone aspects of conducting experiments on \iot devices, while at the same time enhancing data quality by establishing data collection and storage standards that support interoperability.
This bachelor project is specifically informed by the needs of the PET research group at the University of Basel, who will utilize it to run IoT device experiments, and as a foundation to build more extensive tooling.
\section{Goal}\label{sec:goal}
The goal of this project is to design and implement a testbed for IoT device experiments. To aid reproducibility, there are two main objectives:

First, the testbed should automate key aspects of running experiments with IoT devices, particularly the setup and initialization of data collection processes as well as some basic post-collection data processing.
Secondly, the testbed should standardize how data from experiments is stored. This includes standardizing data and metadata organization, establishing a naming scheme, and defining necessary data formats.
A more detailed description of how this is adapted for this project follows in \cref{ch:adaptation}.

\section{Outline}
This report documents the design and implementation of an \iot testbed.
In the remainder of the text, the typographically formatted string ``\iottbsc'' refers to this project's conception of a testbed, whereas ``\iottb'' specifically denotes the Python package that is the implementation artifact of this project.

This report outlines the general goals of a testbed, details the specific functionalities of \iottbsc, and explains how the principles of automation and standardization are implemented.
This report outlines the general goals of a testbed, details the specific functionalities of \iottbsc, and explains how the principles of automation and standardization are implemented.
We begin by giving some background on the most immediately useful concepts.
\cref{ch:adaptation} derives requirements for \iottbsc starting from first principles and concludes by delineating the scope considered for implementation, which is described in \cref{ch4}.
In \cref{ch:5-eval} we evaluate \iottbsc, and more specifically the \iottb software package, against the requirements stated in \cref{ch:adaptation}.
We conclude in \cref{ch:conclusion} with an outlook on further development of \iottbsc.
48
thesis/Chapters/ch2-background.tex
Normal file
@@ -0,0 +1,48 @@
% !TEX root = ../Thesis.tex

\chapter{Background}
This chapter provides the necessary background to understand the foundational concepts related to IoT devices, testbeds, and data principles that inform the design and implementation of \iottbsc.


\section{Internet of Things}
The \iot refers to the connection of ``things'' other than traditional computers to the internet. The decreasing size of microprocessors has enabled their integration into smaller and smaller objects. Today, objects like security cameras, home lighting, or children's toys may contain a processor and embedded software that enables them to interact with the internet. The Internet of Things encompasses objects whose purpose has a physical dimension, such as using sensors to measure the physical world or functioning as simple controllers. When these devices can connect to the internet, they are considered part of the Internet of Things and are referred to as \textbf{IoT devices} (see \citet{whatissmartdevice2018} and \citet{iotfundamentals}).

\section{Testbed}
A testbed is a controlled environment set up to perform experiments and tests on new technologies. The concept is used across various fields such as aviation, science, and industry. Despite the varying contexts, all testbeds share the common goal of providing a stable, controlled environment to evaluate the performance and characteristics of the object of interest.

Examples of testbeds include:
\begin{enumerate}
\item \textbf{Industry and Engineering}: In industry and engineering, the term \emph{platform} is often used to describe a starting point for product development. A platform in this context can be considered a testbed where various components and technologies are integrated and tested together before final deployment.
\item \textbf{Natural Sciences}: In the natural sciences, laboratories serve as testbeds by providing controlled environments for scientific experiments. For example, climate chambers are used to study the effects of different environmental conditions on biological samples (e.g., in \citet{vaughan2005use}). Another example is the use of wind tunnels in aerodynamics research to simulate and study the effects of airflow over models of aircraft or other structures.
\item \textbf{Computing}: In computing, specifically within software testing, a suite of unit tests, integrated development environments (IDEs), and other tools could be considered as a testbed. This setup helps in identifying and resolving potential issues before deployment. By controlling parameters of the environment, a testbed can ensure that the software behaves as expected under specified conditions, which is essential for reliable and consistent testing.
\item \textbf{Interdisciplinary}: Testbeds can take on considerable scales. For instance, \citet{tbsmartgrid2013} provide insight into the aspects of a testbed for a smart electric grid.
This testbed is composed of multiple systems (an electrical grid, the internet, and communication provision) which in their own right are already complex environments.
The testbed must, via simulation or prototyping, provide control mechanisms, communication, and physical system components.

\end{enumerate}
\section{FAIR Data Principles}
\label{concept:fair}
The \emph{FAIR Data Principles} were first introduced by \citet{wilkinson_fair_2016} with the intention to improve the reusability of scientific data. The principles address \textbf{F}indability, \textbf{A}ccessibility, \textbf{I}nteroperability, and \textbf{R}eusability. Data storage designers may use these principles as a guide when designing data storage systems intended to hold data for easy reuse.
For a more detailed description, see \citet{go-fair}.

\section{Network Traffic}\label{sec:network-traffic}
Studying \iot devices fundamentally involves understanding their network traffic behavior.
This is because network traffic contains, explicitly or implicitly, essential information of interest.
Here are key reasons why network traffic is essential in the context of \iot device security:
\begin{enumerate}
\item \textbf{Communication Patterns}: Network traffic captures the communication patterns between IoT devices and external servers or other devices within the network. By analyzing these patterns, researchers can understand how data flows in and out of the device, which is critical for evaluating performance and identifying any unauthorized communications or unintended leaking of sensitive information.
\item \textbf{Protocol Analysis}: Examining the protocols used by IoT devices helps in understanding how they operate. Different devices might use various communication protocols, and analyzing these can reveal insights into their compatibility, efficiency, and security. Protocol analysis can also uncover potential misconfigurations or deviations from expected behavior.
\item \textbf{Flow Monitoring}: Network traffic analysis is a cornerstone of security research. It allows researchers to identify potential security threats such as data breaches, unauthorized access, and malware infections. By monitoring traffic, one can detect anomalies that may indicate security incidents or vulnerabilities within the device.
\item \textbf{Information Leakage}: \iot devices are often deployed in a home environment and connect to the network through wireless technologies \citep{iothome2019}. This allows an adversary to passively observe traffic. While this traffic is often encrypted, the network flow can leak sensitive information, which can be extracted through more complex analysis of communication traffic and Wi-Fi packets \citep{friesssniffing2018, infoexpiot}. In some cases, the adversary can determine the state of the smart environment and its users \citep{peekaboo2020}.
\end{enumerate}


\section{(Network) Packet Capture}
Network \textit{packet capture}\footnote{Also known as \emph{packet sniffing}, \emph{network traffic capture}, or just \emph{sniffing}. The latter is often used when referring to nefarious practices.} fundamentally describes the act or process of intercepting and storing data packets traversing a network. It is the principal technique used for studying the behavior and communication patterns of devices on a network. For the reasons mentioned in \cref{sec:network-traffic}, packet capturing is the main data collection mechanism used in \iot device security research, and also the one considered for this project.
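In practice, such a capture is often just a carefully constructed \emph{tcpdump} invocation. The following sketch shows how a capture command could be assembled programmatically; the helper function is purely illustrative (it is not part of any tool discussed here), while the \texttt{-i}, \texttt{-w}, \texttt{-c} flags and the \texttt{host} filter primitive are standard tcpdump options.

```python
import shlex

def build_capture_cmd(interface, output_file, host=None, packet_count=None):
    """Build a tcpdump argument vector for capturing packets to a pcap file.

    Illustrative helper only: -i selects the interface, -w writes raw
    packets to a file, -c bounds the capture, and 'host' is a BPF filter.
    """
    cmd = ["tcpdump", "-i", interface, "-w", output_file]
    if packet_count is not None:
        cmd += ["-c", str(packet_count)]   # stop after N packets
    if host is not None:
        cmd += ["host", host]              # only traffic to/from this address
    return cmd

# The resulting list could be passed to subprocess.run() on the capture device.
print(shlex.join(build_capture_cmd("wlan0", "capture.pcap", host="172.168.1.1")))
```

Constructing the command as a list (rather than a shell string) avoids quoting pitfalls when device names or file paths contain spaces.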
\section{Automation Recipes}
Automation recipes can be understood as a way of defining a sequence of steps needed for a process.
In the field of machine learning, \textit{Collective Mind}\footnote{\url{https://github.com/mlcommons/ck}} provides a small framework to define reusable recipes for building, running, benchmarking and optimizing machine learning applications.
A key aspect of these recipes is that some are platform-independent, which has enabled wider testing and benchmarking of machine learning models. Even if a given recipe is not yet platform-independent, it can be supplemented with user-specific scripts which handle the platform specifics. Furthermore, it is possible to create a new recipe from the old recipe and the new script, which, when made accessible, essentially extends the applicability of the recipe \citep{friesssniffing2018}.
Automation recipes express the fact that some workflow is automated irrespective of the underlying tooling. A simple script or application can be considered a recipe (or part of one).
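The notion of a recipe as an ordered, reusable sequence of steps can be sketched in a few lines. This is purely illustrative: neither the class nor the step names below come from Collective Mind or any tool discussed in this thesis.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Step = Callable[[Dict], Dict]

@dataclass
class Recipe:
    """A minimal automation recipe: a named, ordered sequence of steps."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def run(self, context: Dict) -> Dict:
        # Each step receives the shared context and returns it (possibly
        # extended), so later steps can build on earlier results.
        for step in self.steps:
            context = step(context)
        return context

# Two hypothetical steps of a capture workflow.
def configure(ctx):
    ctx["configured"] = True
    return ctx

def capture(ctx):
    ctx["pcap"] = f"{ctx['device']}.pcap"
    return ctx

workflow = Recipe("capture-workflow", [configure, capture])
result = workflow.run({"device": "roomba"})
print(result["pcap"])  # roomba.pcap
```

Because each step only depends on the shared context, platform-specific steps can be swapped in or appended without changing the rest of the recipe, mirroring the extension mechanism described above.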
141
thesis/Chapters/ch3-adaptation.tex
Normal file
@@ -0,0 +1,141 @@
\chapter{Adaptation}\label{ch:adaptation}

In this chapter, we outline the considerations made during the development of the IoT testbed, \iottbsc.
Starting from first principles, we derive the requirements for our testbed and finally establish the scope for \iottbsc.
The implemented testbed which results from this analysis, the software package \iottb, is discussed in \cref{ch4}.\\
\section{Principal Objectives}\label{sec:principles-and-objectives}
The stated goal for this bachelor project (see \cref{sec:goal}) is to create a testbed for \iot devices which automates aspects of the involved workflow, with the aim of increasing reproducibility, standardization, and compatibility of tools and data across project boundaries.
We specify two key objectives supporting this goal:
\begin{enumerate}[label=\textit{Objective \arabic*}]
\item \textbf{Automation Recipes:}\label{obj:recipies} The testbed should support specification and repeated execution of important aspects of experiments with IoT devices, such as data collection and analysis (see \citep{fursinckorg2021}).
\item \textbf{FAIR Data Storage:}\label{obj:fair} The testbed should store data in accordance with the FAIR \citep{go-fair} principles.
\end{enumerate}
\section{Requirements Analysis}\label{sec:requirements}
In this section, we present the results of the requirements analysis based on the principal objectives.
The requirements derived for \ref{obj:recipies} are presented in \cref{table:auto_recipe_requirements}.
In \cref{table:fair_data_storage_requirements}, we present the requirements based on \ref{obj:fair}.

\begin{table}[H]
\centering
\caption{Automation Recipes Requirements}
\label{table:auto_recipe_requirements}
\begin{minipage}{\textwidth}
\begin{enumerate}[label=\textit{R1.\arabic*}]
\item \label{req:auto_install_tools} \textbf{Installation of Tools}: Support installation of necessary tools like \textit{mitmproxy} \cite{mitmproxy}, \textit{Wireshark} \cite{wiresharkorg}, or \textit{tcpdump} \cite{tcpdump}.

\textit{Reasoning:}
There are various tools used for data collection and specifically packet capture.
Automating the installation of necessary tools ensures that all required software is available and configured correctly without manual intervention. This reduces the risk of human error during setup and guarantees that the testbed environment is consistently prepared for use. Many platforms, notably most common Linux distributions, come with package managers which provide a simple command-line interface for installing software while automatically handling dependencies. This allows tools to be quickly installed, making it a \textit{lower-priority} requirement for \iottbsc.

\item \label{req:auto_config_start} \textbf{Configuration and Start of Data Collection}: Automate the configuration and start of data collection processes. Specific subtasks include:
\begin{enumerate}
\item Automate wireless hotspot management on the capture device.
\item Automatic handling of network capture, including the collection of relevant metadata.
\end{enumerate}

\textit{Reasoning:}
Data collection is a central step in the experimentation workflow. Configuration is time-consuming and prone to error, suggesting that automating this process is useful. As mentioned in \cref{sec:motivation}, current practices lead to incompatible data and difficult-to-reuse scripts.
Automating the configuration and start of data collection processes ensures a standardized approach, reducing the potential for user error
and thereby increasing data compatibility and efficient use of tools. Automating this process must be a central aspect of \iottbsc.

\item \label{req:auto_data_processing} \textbf{Data Processing}: Automate data processing tasks.

\textit{Reasoning:} Some network capture tools produce output in a binary format. To make the data available to other processes, the data must often be transformed in some way.
Data processing automation ensures that the collected data is processed uniformly and efficiently, enhancing its reusability and interoperability. Processing steps may include cleaning, transforming, and analyzing the data, which are essential steps to derive meaningful insights. Automated data processing saves time and reduces the potential for human error. It ensures that data handling procedures are consistent, which is crucial for comparing results across different experiments and ensuring the validity of findings.


\item \label{req:auto_reproducibility} \textbf{Reproducibility}: Ensure that experiments can be repeated with the same setup and configuration.

\textit{Reasoning:} A precondition to reproducible scientific results is the ability to run experiments repeatedly with all relevant aspects set up and configured identically.
\item \label{req:auto_execution_control} \textbf{Execution Control}: Provide mechanisms for controlling the execution of automation recipes (e.g., start, stop, status checks).

\textit{Reasoning:} Control mechanisms are essential for managing the execution of automated tasks. This includes starting, stopping, and monitoring the status of these tasks to ensure they are completed successfully.

\item \label{req:auto_error_logging} \textbf{Error Handling and Logging}: Include robust error handling and logging to facilitate debugging and enhance reusability.

\textit{Reasoning:} Effective error handling and logging improve the robustness and reliability of the testbed. Automation recipes may contain software with incompatible logging mechanisms.
To facilitate development and troubleshooting, a unified and principled logging approach is important for \iottbsc.
\item \label{req:auto_documentation} \textbf{Documentation}: Provide clear documentation and examples for creating and running automation recipes.
\end{enumerate}
\end{minipage}
\end{table}

\begin{table}[H]
\centering
\caption{FAIR Data Storage Requirements}
\label{table:fair_data_storage_requirements}
\begin{minipage}{\textwidth}
\begin{enumerate}[label=\textit{R2.\arabic*}]
\item \label{req:fair_data_meta_inventory} \textbf{Data and Metadata Inventory}: \iottbsc should provide an inventory of data and metadata that typically need to be recorded (e.g., raw traffic, timestamps, device identifiers).

\textit{Reasoning:} Providing a comprehensive inventory of data and metadata ensures that data remains findable after collection. Including metadata increases interpretability and gives context necessary for extracting reproducible results.

\item \label{req:fair_data_formats} \textbf{Data Formats and Schemas}: Define standardized data formats and schemas.

\textit{Reasoning:} Standardized data formats and schemas ensure consistency and interoperability.

\item \label{req:fair_file_naming} \textbf{File Naming and Directory Hierarchy}: Establish clear file naming conventions and directory hierarchies for organized data storage.

\textit{Reasoning:} This enhances findability and accessibility.
\item \label{req:fair_preservation} \textbf{Data Preservation Practices}: Implement best practices for data preservation, including recommendations from authoritative sources like the Library of Congress \citep{recommendedformatrsLOC}.

\textit{Reasoning:} Implementing best practices for data preservation can mitigate data degradation and ensure the integrity of data over time. This ensures long-term accessibility and reusability.
\item \label{req:fair_accessibility} \textbf{Accessibility Controls}: Ensure data accessibility with appropriate permissions and access controls.
\item \label{req:fair_interoperability} \textbf{Interoperability Standards}: Use widely supported formats and protocols to facilitate data exchange and interoperability.
\item \label{req:fair_reusability} \textbf{Reusability Documentation}: Provide detailed metadata to support data reuse by other researchers.
\end{enumerate}
\end{minipage}
\end{table}

We return to these requirements when we evaluate \iottbsc in \cref{ch:5-eval}.

\section{Scope}\label{sec:scope}
This section defines the scope of the testbed \iottbsc.
To guide the implementation of the software component of this bachelor project, \iottb, we focus on a specific set of requirements that align with the scope of such a project.
While the identified requirements encompass a broad range of considerations, we have prioritized those that are most critical to achieving the primary objectives of the project.

For this project, we delineate our scope regarding the principal objectives as follows:
\begin{itemize}
\item \ref{obj:recipies}: \iottb focuses on complying with \ref{req:auto_config_start} and \ref{req:auto_reproducibility}.
\item \ref{obj:fair}: \iottb ensures FAIR data storage implicitly, with the main focus lying on \ref{req:fair_data_formats}, \ref{req:fair_data_meta_inventory}, and \ref{req:fair_file_naming}.
\end{itemize}


\subsection{Model Environment}\label{sec:assumed-setup}
In this section, we describe the environment model assumed as the basis for conducting \iot device experiments.
This mainly involves delineating the network topology. Considerations are taken to make this environment, over which the \iottb testbed software has no control, easily reproducible \citep{vacuumpie2023}.\\

We assume that the \iot device generally requires a Wi-Fi connection.
This implies that the environment is configured to reliably capture network traffic without disrupting the \iot device's connectivity. This involves setting up a machine with internet access (wired or wireless) and possibly one Wi-Fi card supporting AP mode to act as the \ap for the \iot device under test \citep{surveytestingmethods2022}.
Additionally, the setup must enable bridging the IoT-AP network to the internet to ensure the \iot device has internet connectivity.\\

Specifically, the assumed setup for network traffic capture includes the following components:
\begin{enumerate}
\item \textbf{IoT Device:} The device under investigation, connected to a network.
\item \textbf{Capture Device:} A computer or dedicated hardware device configured to intercept and record network traffic. This is where \iottb runs.
\item \textbf{Wi-Fi \ap:} The \ap through which the \iot device gets network access.
\item \textbf{Router/Internet Gateway:} The network must provide internet access.
\item \textbf{Switch or software bridge:} At least either a switch or an \os with software bridge support must be available to implement one of the setups described in \cref{fig:cap-setup1} and \cref{fig:cap-setup2}.
\item \textbf{Software:} tcpdump is needed for network capture.
\end{enumerate}
\newpage
\begin{figure}[!ht]
\centering
\includegraphics[width=0.75\linewidth]{Figures/network-setup1.png}
\caption{Capture setup with separate Capture Device and AP}
\label{fig:cap-setup1}
\end{figure}
\begin{figure}[!ht]
\centering
\includegraphics[width=0.75\linewidth]{Figures/setup2.png}
\caption{Capture setup where the capture device doubles as the \ap for the \iot device.}
\label{fig:cap-setup2}
\end{figure}
\newpage
153
thesis/Chapters/ch4-iottb.tex
Normal file
@@ -0,0 +1,153 @@
\chapter{Implementation}\label{ch4}
This chapter discusses the implementation of the IoT device testbed, \iottbsc, which is developed using the Python programming language. This choice is motivated by Python's wide availability and the familiarity many users have with it, thus lowering the barrier for extending and modifying the testbed in the future. The testbed is delivered as a Python package and provides the \iottb command with various subcommands. A full command reference can be found in \cref{appendix:cmdref}.\\
Conceptually, the software implements two separate aspects: data collection and data storage.
The \iottbsc database schema is implicitly implemented by \iottb. Users use \iottb mainly to operate on the database or initiate data collection. Since the database schema is transparent to the user during operation, we begin with a brief description of the database layout as a directory hierarchy before turning to the \iottb \cli.


\section{Database Schema}
\section{Database Schema}
The storage for \iottbsc is implemented on top of the user's file system.
Since user folder structures provide little standardization, we require a configuration file, which gives \iottb some basic information about the execution environment.
The testbed is configured via a configuration file in JSON format, following the schema in \cref{lst:cfg-shema}.
\verb|DefaultDatabase| is a string which represents the name of the database, which is a directory in \verb|DefaultDatabasePath| once initialized.
\iottb assumes these values during execution, unless the user specifies otherwise.
If the user specifies a different database location as an option in a subcommand, \verb|DatabaseLocations| is consulted.
\verb|DatabaseLocations| is a mapping from every known database name to the full path of its parent directory in the file system.
The configuration file is loaded for every invocation of \iottb.
It provides the minimal operating information.
\begin{listing}[!ht]
\inputminted[]{json}{cfg-shema.json}
\caption{Schema of the testbed configuration file.}
\label{lst:cfg-shema}
\end{listing}
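To make the configuration lookup concrete, the following sketch resolves a database directory from a loaded configuration. The field names (\verb|DefaultDatabase|, \verb|DefaultDatabasePath|, \verb|DatabaseLocations|) follow the schema described above; the helper function, the exact lookup order, and the example paths are assumptions for illustration.

```python
from pathlib import Path

def resolve_db_path(config, name=None):
    """Resolve a database directory from the testbed configuration.

    Falls back to the default database and path when no name is given;
    the precedence shown here is an illustrative assumption.
    """
    name = name or config["DefaultDatabase"]
    locations = config.get("DatabaseLocations", {})
    if name in locations:
        # DatabaseLocations maps a database name to its parent directory.
        return Path(locations[name]) / name
    return Path(config["DefaultDatabasePath"]) / name

cfg = {
    "DefaultDatabase": "iottb.db",
    "DefaultDatabasePath": "/home/alice",
    "DatabaseLocations": {"lab.db": "/data"},
}
print(resolve_db_path(cfg))            # /home/alice/iottb.db
print(resolve_db_path(cfg, "lab.db"))  # /data/lab.db
```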
\newpage
\section{High Level Description}
\iottb is used from the command line and follows the schema below. In all cases, a subcommand must be specified for anything to happen.
\begin{minted}[fontsize=\small]{bash}
iottb [<global options>] <subcommand> [<subcommand options>] [<argument(s)>]
\end{minted}
When \iottb is invoked, it first checks whether it can find the database directory in the \os user's home directory\footnote{The default can be changed.}.

\section{Database Initialization}\label{sec:db-init}
The IoT testbed database is defined to be a directory named \db. Currently, \iottb creates this directory in the user's home directory (commonly located at the path \texttt{/home/<username>} on Linux systems) the first time any subcommand is used. All data and metadata are placed under this directory. Invoking \verb|iottb init-db| without arguments causes defaults to be loaded from the configuration file. If the file does not exist, it is created with default values following \cref{lst:cfg-shema}. Otherwise, the database is created with the default name or the user-supplied name as a directory in the file system, unless a database under that name is already registered in the \verb|DatabaseLocations| map. The commands described in the later sections all depend on the existence of an \iottbsc database.
It is neither possible to add a device nor initiate data collection without an existing database.
The full command line specification can be found in \cref{cmdref:init-db}.
Once a database is initialized, devices may be added to that database.

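The initialization behavior described above (create the config with defaults if it is missing, create the database directory, refuse names that are already registered) can be sketched as follows. This is an illustrative reimplementation, not the actual \iottb code; the function name and demonstration paths are assumptions.

```python
import json
import tempfile
from pathlib import Path

def init_db(config_path, name="iottb.db", parent=None):
    """Illustrative sketch of init-db: create and register a database."""
    parent = Path(parent) if parent else Path.home()
    if config_path.exists():
        config = json.loads(config_path.read_text())
    else:
        # No config yet: create one with default values.
        config = {"DefaultDatabase": name,
                  "DefaultDatabasePath": str(parent),
                  "DatabaseLocations": {}}
    if name in config["DatabaseLocations"]:
        # A database of this name is already registered.
        raise ValueError(f"database {name!r} is already registered")
    db_dir = parent / name
    db_dir.mkdir(parents=True, exist_ok=True)
    config["DatabaseLocations"][name] = str(parent)
    config_path.write_text(json.dumps(config, indent=2))
    return db_dir

# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    cfg_file = Path(tmp) / "iottb.cfg"
    created = init_db(cfg_file, parent=tmp)
    registered = json.loads(cfg_file.read_text())["DatabaseLocations"]
    print(created.name, sorted(registered))
```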
\section{Adding Devices}\label{sec:add-dev}
Before we capture the traffic of an \iot device, \iottb demands that there exists a dedicated directory for it.
We add a device to the database by passing a string representing the name of the device to the \addev subcommand.
This does three things:
\begin{enumerate}
\item A Python object is initialized from the class shown in \cref{lst:dev-meta-python}.
\item A directory for the device is created as \verb|<db-path>/<device_canonical_name>|.
\item A metadata file \verb|device_metadata.json| is created and placed in the newly created directory. This file is in JSON format and follows the schema seen in \cref{lst:dev-meta-python}.
\end{enumerate}

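The three steps above can be sketched as follows. The canonicalization rule shown here is hypothetical (the actual \verb|make_canonical_name()| is given in the appendix), and \verb|add_device| is an illustrative condensation, not the real subcommand implementation.

```python
import json
import re
import tempfile
import uuid
from pathlib import Path

def make_canonical_name(name):
    """Hypothetical canonicalization rule: lowercase, with runs of
    non-alphanumeric characters collapsed to single underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")

def add_device(db_path, device_name):
    """Sketch of the three add-device steps: build metadata, create the
    device directory, write device_metadata.json."""
    metadata = {
        "device_id": str(uuid.uuid4()),  # unique, FAIR-friendly identifier
        "device_name": device_name,
        "canonical_name": make_canonical_name(device_name),
    }
    device_dir = Path(db_path) / metadata["canonical_name"]
    device_dir.mkdir(parents=True, exist_ok=True)
    (device_dir / "device_metadata.json").write_text(json.dumps(metadata, indent=2))
    return device_dir

with tempfile.TemporaryDirectory() as tmp:
    created = add_device(tmp, "iPhone 13 (year 2043)")
    print(created.name)  # iphone_13_year_2043
```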

\begin{listing}[!ht]
\inputminted[firstline=12, lastline=29, linenos]{python}{device_metadata.py}
\caption{Device Metadata}
\label{lst:dev-meta-python}
\end{listing}

The Device ID is automatically generated using a UUID to be FAIR compliant. \verb|canonical_name| is generated by the \verb|make_canonical_name()| function provided in \cref{lst:dev-canonical}.
Fields not supplied to \verb|__init__| in \cref{lst:dev-meta-python} are kept empty. The other fields are currently not used by \iottb itself, but provide metadata
which can be used during a processing step. Optionally, one can manually create such a file with pre-set values and pass it to the setup.
For example, say the testbed contains a configuration as can be seen in \cref{lst:appendix:appendixa:config-file}.

\begin{listing}[!ht]
\inputminted[firstline=1, lastline=8, linenos]{json}{appendixa-after-add-device-dir.txt}
\caption{Configuration file after adding devices `default' and `Roomba'}
\label{lst:cfg-file-post-add}
\end{listing}

If we then add two devices \verb|'iPhone 13 (year 2043)'| and \verb|roomba|, the layout of the database resembles \cref{lst:cfg-db-layout-post-add} and, for instance, the \verb|roomba| device's directory will contain the metadata listed in \cref{lst:meta-roomba-post-add}. See \cref{appendixA:add-dev-cfg} for a complete overview.

\begin{listing}[!ht]
\lstinputlisting[firstline=11, lastline=16]{appendixa-after-add-device-dir.txt}
\caption{Directory layout after adding devices `default' and `Roomba'}
\label{lst:cfg-db-layout-post-add}
\end{listing}

|
||||
\begin{listing}[!ht]
\lstinputlisting[firstline=39, lastline=55]{appendixa-after-add-device-dir.txt}
\caption{Metadata of the 'roomba' device after it has been added}
\label{lst:meta-roomba-post-add}
\end{listing}

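The canonical-name normalization used above can be sketched as follows. The exact rules are those of the real \verb|make_canonical_name()| in \cref{lst:dev-canonical}; the implementation here (lowercasing, collapsing non-alphanumeric runs to underscores) is an illustrative assumption.

```python
import re

def make_canonical_name(name: str) -> str:
    """Illustrative normalization: lowercase the name and collapse
    runs of non-alphanumeric characters into single underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")

print(make_canonical_name("iPhone 13 (year 2043)"))  # iphone_13_year_2043
print(make_canonical_name("roomba"))                 # roomba
```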
\newpage
\section{Traffic Sniffing}\label{sec:sniff}
Automated network capture is a key component of \iottb. Standard network capture is provided by the \texttt{sniff} subcommand, which wraps the common traffic capture utility \emph{tcpdump}~\citep{tcpdump}. \cref{cmdref:sniff} shows the usage of the command.

Unless the command is explicitly allowed to run in \texttt{unsafe} mode, an IPv4 or MAC address \emph{must} be provided. IP addresses are accepted only in dot-decimal notation\footnote{e.g., 172.168.1.1}, and MAC addresses must be specified as six groups of two hexadecimal digits\footnote{e.g., 12:34:56:78:AA:BB}. Failing to provide either results in the capture being aborted. The rationale is simple: these addresses are the only way to identify the traffic of interest. Of course, it is possible to retrieve the IP or MAC address after a capture; still, the merits outweigh the annoyance. The hope is that this makes \iottb easier to use \emph{correctly}. For example, consider a student tasked with performing multiple captures across multiple devices. If the student is not aware that an address is needed for the captured data to be usable, this policy avoids the headache and frustration of wasted time and unusable data.

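The validation policy described above can be sketched as follows; these helpers are illustrative and not \iottb's actual validation code.

```python
import re

# IPv4 in dot-decimal notation; MAC as six groups of two hex digits.
IPV4_RE = re.compile(r"^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$")
MAC_RE = re.compile(r"^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$")

def is_valid_ipv4(addr: str) -> bool:
    m = IPV4_RE.match(addr)
    return bool(m) and all(0 <= int(octet) <= 255 for octet in m.groups())

def is_valid_mac(addr: str) -> bool:
    return bool(MAC_RE.match(addr))

def may_start_capture(ip=None, mac=None, unsafe=False) -> bool:
    """Abort the capture unless an address is given or unsafe mode is on."""
    if unsafe:
        return True
    return (ip is not None and is_valid_ipv4(ip)) or \
           (mac is not None and is_valid_mac(mac))
```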
To comply with \ref{req:auto_config_start} and \ref{req:fair_data_meta_inventory}, each capture also stores some metadata in \texttt{capture\_metadata.json}. \cref{lst:cap-meta} shows the metadata file's schema.

\begin{listing}[!ht]
\inputminted[firstline=288, lastline=319]{python}{sniff.py}
\caption{Metadata stored for the \texttt{sniff} command}
\label{lst:cap-meta}
\end{listing}

The \texttt{device\_id} is the \uuid\ of the device for which the capture was performed. This ensures the capture metadata remains associated with the device even if files are moved. Each capture also receives its own \uuid, which is used as the suffix for the PCAP file and the log files. The exact naming scheme is given in \cref{lst:cap-naming}.

\begin{listing}[!ht]
\inputminted[firstline=179, lastline=181]{python}{sniff.py}
\caption{Naming scheme for files created during capture.}
\label{lst:cap-naming}
\end{listing}

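The scheme can be sketched as follows. The helper name is hypothetical, but the produced names match those seen in practice (e.g., \texttt{roomba\_62de82ad-3aa2-460e-acd0-546e46377987.pcap}).

```python
import uuid

def capture_file_names(device: str, capture_id: uuid.UUID) -> dict:
    """Illustrative reconstruction of the naming scheme: the capture
    UUID suffixes the PCAP file and both log files."""
    return {
        "pcap": f"{device}_{capture_id}.pcap",
        "stdout_log": f"stdout_{capture_id}.log",
        "stderr_log": f"stderr_{capture_id}.log",
    }
```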
\section{Working with Metadata}
The \texttt{meta} subcommand provides a facility for manipulating metadata files. It allows users to read the value of any key in a metadata file, as well as to introduce new key-value pairs. However, it is not possible to change the value of a key already present in the metadata. This restriction is in place to prevent metadata corruption.

The most crucial value in any metadata file is the \texttt{uuid} of the device or capture the metadata belongs to. Changing the \texttt{uuid} would cause \iottb to mishandle the data, as all references to data associated with that \texttt{uuid} would become invalid. Changing any other value might not cause mishandling by \iottb, but the values nonetheless represent essential information about the data. Therefore, \iottb does not allow changes to existing keys once they are set.

Future improvements might relax this restriction by implementing stricter checks on which keys can be modified. This would involve defining a strict set of keys that are write-once and then read-only.

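The write-once policy can be sketched as follows; this is a hypothetical helper, not the actual \texttt{meta} implementation.

```python
import json
from pathlib import Path

def add_metadata_key(path: Path, key: str, value) -> None:
    """Write-once policy sketch: new keys may be added to a metadata
    file, but existing keys may never be overwritten."""
    meta = json.loads(path.read_text())
    if key in meta:
        raise KeyError(f"refusing to overwrite existing key {key!r}")
    meta[key] = value
    path.write_text(json.dumps(meta, indent=2))
```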
\section{Raw Captures}
The \texttt{raw} subcommand offers a flexible way to run virtually any command wrapped in \iottb. The intended use is with other capture tools, such as \textit{mitmproxy}~\citep{mitmproxy}, rather than arbitrary shell commands. While some benefits, particularly those related to standardized capture, are diminished, users still retain the advantages of the database.

The syntax of the \texttt{raw} subcommand is as follows:
\begin{minted}{bash}
iottb raw <device> <command-name> "<command-options-string>"
# or
iottb raw <device> "<string-executable-by-a-shell>"
\end{minted}

\iottb does not provide error checking for user-supplied arguments or strings.
Users benefit from the fact that captures will be registered in the database, assigned a \texttt{uuid}, and associated with the device.
The metadata file of the capture can then be edited manually if needed.

However, each incorrect or unintended invocation that adheres to the database syntax (i.e., the specified device exists) will create a new capture directory with a metadata file and \texttt{uuid}. Therefore, users are advised to thoroughly test commands beforehand to avoid creating unnecessary clutter.

\section{Integrating User Scripts}\label{sec:integrating-user-scripts}
The \texttt{--pre} and \texttt{--post} options allow users to run any executable before and after any subcommand, respectively. Both options take a string as their argument, which is passed as input to a shell and launched as a subprocess. The rationale for running the process in a shell is that Python's standard library process management module, \texttt{subprocess}\footnote{\url{https://docs.python.org/3/library/subprocess.html}}, does not accept arguments to the target subprocess when a single string is passed for execution.

Execution is synchronous: the subcommand does not begin until the \texttt{--pre} script finishes, and the \texttt{--post} script only starts after the subcommand has completed. \iottb always runs these steps in that order.

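The synchronous ordering can be sketched as follows; this is an illustrative helper, not \iottb's internal implementation.

```python
import subprocess

def run_with_hooks(subcommand, pre=None, post=None):
    """Sketch of the --pre/--post semantics: hook strings are handed
    to a shell, and each step starts only after the previous one has
    finished."""
    if pre:
        subprocess.run(pre, shell=True, check=True)   # blocks until the pre-script exits
    result = subcommand()                             # the actual subcommand
    if post:
        subprocess.run(post, shell=True, check=True)  # blocks until the post-script exits
    return result
```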
There may be cases where a script provides some kind of relevant interaction intended to run in parallel with the capture. Currently, the recommended way to achieve this is to wrap the target executable in a script that forks a process to execute the target, detaches from it, and returns.

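Such a wrapper can be sketched in Python as follows; this is an illustrative pattern, not a script bundled with \iottb.

```python
import subprocess

def launch_detached(command: str) -> int:
    """Start the target command in its own session so it keeps running
    in parallel with the capture, then return immediately without
    waiting for it."""
    proc = subprocess.Popen(command, shell=True, start_new_session=True)
    return proc.pid  # caller returns right away; the child keeps running
```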
These options are a gateway to more complex environment setups and, in particular, allow users to reuse their existing scripts, thus lowering the barrier to adopting \iottb.

\section{Extending and Modifying the Testbed}
One of the key design goals of \iottb is easy extensibility. \iottb uses the Click library \citep{click} to handle argument parsing. Adding a new command amounts to little more than writing a function and decorating it according to the Click specification.

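For illustration, a new subcommand in Click style might look like the following sketch. The group and command here are toy stand-ins, not \iottb's actual entry point, and the example assumes the Click package is installed.

```python
import click

@click.group()
def cli():
    """Toy stand-in for the iottb entry-point group."""

@cli.command()
@click.argument("device")
@click.option("--count", "-c", default=10, help="Number of packets to capture.")
def hello(device, count):
    """A hypothetical new subcommand: just one decorated function."""
    click.echo(f"would capture {count} packets for {device}")
```

Registering the function on the group is all that is needed for it to appear as \texttt{iottb hello} with its own \texttt{--help} text.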
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%% Figures
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% File: thesis/Chapters/ch5-evaluation.tex
\chapter{Evaluation}\label{ch:5-eval}
In this chapter, we evaluate \iottb, paying particular attention to the requirements defined in \cref{sec:requirements}.

\begin{table}[h!]
\centering
\begin{tabular}{|c|l|c|}
\hline
\textbf{Requirement ID} & \textbf{Description} & \textbf{Status} \\ \hline
\ref{req:auto_install_tools} & Installation of Tools & Not Met \\ \hline
\ref{req:auto_config_start} & Configuration and Start of Data Collection & $\downarrow$ \\ \hline
\ref{req:auto_config_start}a) & Automate Wi-Fi Setup & Not Met \\ \hline
\ref{req:auto_config_start}b) & Automate Data Capture & Met \\ \hline
\ref{req:auto_data_processing} & Data Processing & Partially Met \\ \hline
\ref{req:auto_reproducibility} & Reproducibility & Partially Met \\ \hline
\ref{req:auto_execution_control} & Execution Control & Not Met \\ \hline
\ref{req:auto_error_logging} & Error Handling and Logging & Partially Met \\ \hline
\ref{req:auto_documentation} & Documentation & $\downarrow$ \\ \hline
\ref{req:auto_documentation}a) & User Manual & Met \\ \hline
\ref{req:auto_documentation}b) & Developer Docs & Not Met \\ \hline
\ref{req:fair_data_meta_inventory} & Data and Metadata Inventory & Met \\ \hline
\ref{req:fair_data_formats} & Data Formats and Schemas & Met \\ \hline
\ref{req:fair_file_naming} & File Naming and Directory Hierarchy & Met \\ \hline
\ref{req:fair_preservation} & Data Preservation Practices & Partially Met \\ \hline
\ref{req:fair_accessibility} & Accessibility Controls & Not Met \\ \hline
\ref{req:fair_interoperability} & Interoperability Standards & Met \\ \hline
\ref{req:fair_reusability} & Reusability Documentation & Met \\ \hline
\end{tabular}
\caption{Summary of Requirements Evaluation}
\label{tab:requirements-evaluation}
\end{table}

\cref{tab:requirements-evaluation} gives an overview of the requirements introduced in \cref{sec:requirements} and our assessment of their status. It is important to note that the status ``Met'' does not imply that the requirement is implemented to the highest possible standard. Furthermore, this set of requirements itself can (and should) be made more specific and expanded in both detail and scope as the project evolves.

Additionally, \cref{tab:requirements-evaluation} does not provide granularity regarding the status of individual components, which might meet the requirements to varying degrees. For example, while the requirement for data collection automation may be fully met in terms of basic functionality, advanced features such as handling edge cases or optimizing performance might still need improvement. Similarly, the requirement for data storage might be met in terms of basic file organization but could benefit from enhanced data preservation practices.

Thus, the statuses presented in \cref{tab:requirements-evaluation} should be viewed as a general assessment rather than ground truth. Future work should aim to refine these requirements and their implementation to ensure that \iottbsc continues to evolve and improve.

To provide a more comprehensive understanding, the following sections offer a detailed evaluation of each requirement: how it was addressed, the degree to which it was met, and any specific aspects that may still need improvement. By examining each requirement individually, we can better understand the strengths and limitations of the current implementation and identify areas for future enhancement.

\section{\ref{req:auto_install_tools}: Installation of Tools}
\textbf{Status: Not Met} \\
\iottbsc does not install any software or tools by itself. Dependency management for Python packages is handled by installers like pip, since the Python package declares its dependencies. tcpdump is the only external dependency; \iottbsc checks whether tcpdump is available on the capture device and, if it is not, asks the user to install it.
Our position is that it is generally better not to force the installation of software and to allow users the freedom to choose. The added benefit of a built-in installer seems low: it does not promise a great enough improvement in ease of use to justify the maintenance cost of such a module. For future work, this requirement could be removed.

\section{\ref{req:auto_config_start}: Configuration and Start of Data Collection}
\textbf{Status: Partially Met} \\
The testbed automates some aspects of configuring and initiating the data collection process, which reduces setup time and minimizes errors. This project focused on packet capture and adjacent tasks. \ref{req:auto_config_start}b can be considered \textit{complete} in that packet capture is fully supported through tcpdump and important metadata is saved. Depending on the setup (see \cref{fig:cap-setup1} and \cref{fig:cap-setup2}), a Wi-Fi hotspot needs to be set up before packet capture is initiated. \iottbsc does not currently implement automated setup and takedown of a hotspot on any platform, so \ref{req:auto_config_start}a is not currently met. There are scripts for Linux systems bundled with the Python package which can be used with the \texttt{--pre} and \texttt{--post} options mentioned in \cref{sec:integrating-user-scripts}, but to consider this task fully automated and supported, this functionality should be built into \iottbsc itself.
Furthermore, there are other data collection tools like \textit{mitmproxy}~\citep{mitmproxy}, and more complicated setup tasks like configuring a routing table to allow for more capture scenarios; these are tedious tasks that lend themselves to automation. Future work should continuously extend the set of available automation recipes. New task groups and recipe domains should be added as sub-requirements of \ref{req:auto_config_start}.
We propose the following new sub-requirements:
\begin{itemize}
\item \ref{req:auto_config_start}c: The testbed should implement automatic setup of NAT routing for situations where the \ap is connected to the capture device and a bridged setup is not supported.
\item \ref{req:auto_config_start}d: The testbed should dynamically determine which type of hotspot setup is possible and choose the appropriate automation recipe.
\end{itemize}
Extending \ref{req:auto_config_start} means stating which data collection and adjacent recipes are wanted.

\section{\ref{req:auto_data_processing}: Data Processing}
\textbf{Status: Partially Met} \\
While the testbed includes some basic data processing capabilities, there is room for improvement. Currently, only one recipe exists for processing raw data: \iottbsc can extract a CSV file from a PCAP file. The possibilities for automation recipes that support data processing are many. Having the data in a more standardized format allows for the creation of more sophisticated feature extraction recipes with applications in machine learning. Until such recipes are available, users can still use the \texttt{--post} option with their own feature extraction scripts.

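One possible shape of such an extraction recipe uses Wireshark's \texttt{tshark} CLI; \iottb's actual recipe may work differently, and the chosen fields are illustrative. The helper only builds the argument list, so it can be inspected without tshark installed.

```python
def build_pcap_to_csv_cmd(pcap_path,
                          fields=("frame.time_epoch", "ip.src",
                                  "ip.dst", "frame.len")):
    """Build a tshark invocation that dumps selected fields of a PCAP
    file as comma-separated values with a header row."""
    cmd = ["tshark", "-r", pcap_path, "-T", "fields",
           "-E", "separator=,", "-E", "header=y"]
    for f in fields:
        cmd += ["-e", f]
    return cmd
```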
\section{\ref{req:auto_reproducibility}: Reproducibility}
\textbf{Status: Met} \\
Supported automation can be run repeatedly, and the options used are documented in the capture metadata. This allows others to repeat the process with the same options, so in this respect the requirement is met. However, the current state can be significantly improved by automating the process of repeating a capture task with the same configuration as previous captures. To support this, we propose the following new sub-requirements, which aid the automated reproduction of past capture workflows:
\begin{itemize}
\item \ref{req:auto_reproducibility}a: The testbed should be able to read command options from a file.
\item \ref{req:auto_reproducibility}b: The testbed should be able to perform a capture based on the metadata files of completed captures.
\end{itemize}
Adopting these requirements promises to significantly increase reproducibility.

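A sketch of what \ref{req:auto_reproducibility}b could look like follows. The metadata key names and the \texttt{--address} flag used here are assumptions for illustration, not \iottb's actual schema or CLI.

```python
import json
from pathlib import Path

def rebuild_sniff_command(metadata_path: Path) -> list:
    """Reconstruct a sniff invocation from a completed capture's
    metadata file (hypothetical key names)."""
    meta = json.loads(metadata_path.read_text())
    cmd = ["iottb", "sniff", meta["device_canonical_name"]]
    if meta.get("ip_address"):
        cmd += ["--address", meta["ip_address"]]
    if meta.get("packet_count"):
        cmd += ["-c", str(meta["packet_count"])]
    return cmd
```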
\section{\ref{req:auto_execution_control}: Execution Control}
\textbf{Status: Not Met} \\
The testbed currently provides no controlled method to interact with a running recipe. In most cases, \iottb will end gracefully if the user sends the process a SIGINT, but there are no explicit protections against data corruption in this case. During execution, \iottb writes to log files and prints basic information to the user's terminal. Extending this with a monitoring mechanism would be a good step toward complying with this requirement in the future.

\section{\ref{req:auto_error_logging}: Error Handling and Logging}
\textbf{Status: Met} \\
Robust error handling and logging are implemented, ensuring that issues can be diagnosed and resolved effectively. Detailed logs help maintain the integrity of experiments. It is also possible for the user to control how much output is printed to the terminal. Below are examples of the same command with increasing degrees of verbosity specified by the user.

\subsection{Logging Example}
\textbf{Command:} \verb|iottb sniff roomba --unsafe -c 10 <verbosity>|

Verbosity can be unspecified, \verb|-v|, \verb|-vv|, or \verb|-vvv|.

\begin{figure}
\centering
\begin{minted}[breaklines]{bash}
$ iottb sniff roomba --unsafe -c 10
Testbed [I]
Using canonical device name roomba
Found device at path /home/seb/showcase/roomba
Using filter None
Files will be placed in /home/seb/showcase/roomba/sniffs/2024-07-01/cap0000-0214
Capture has id 62de82ad-3aa2-460e-acd0-546e46377987
Capture setup complete!
Capture complete. Saved to roomba_62de82ad-3aa2-460e-acd0-546e46377987.pcap
tcpdump took 2.16 seconds.
Ensuring correct ownership of created files.
Saving metadata.
END SNIFF SUBCOMMAND
\end{minted}
\caption{No verbosity.}
\label{fig:example-no-verb}
\end{figure}

At the first verbosity level, logger warnings are additionally printed to standard output. During normal execution we do not expect significantly more output; this is also true for the second verbosity level.
\begin{figure}
\centering
\begin{minted}[breaklines]{bash}
$ iottb -v|-vv sniff roomba --unsafe -c 10
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
WARNING - iottb_config - DatabaseLocations are DatabaseLocationMap in the class iottb.models.iottb_config
\end{minted}
\caption{Only \textit{additional} output for \texttt{-v} or \texttt{-vv}.}
\label{fig:example-one-verb}
\end{figure}

This changes once we reach the third verbosity level, because the logger level is then additionally set to ``INFO''. Clearly, \cref{fig:example-lvl-three} contains far more output than \cref{fig:example-one-verb}. It is possible to get even more output printed to standard output by also passing the \verb|--debug| flag, which produces significantly more output, as can be seen in \cref{fig:example-debug-output}.
\begin{figure}
\centering
\begin{minted}[breaklines]{bash}
$ iottb -vvv sniff roomba --unsafe -c 10
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
INFO - main - cli - 48 - Starting execution.
INFO - iottb_config - __init__ - 24 - Initializing Config object
WARNING - iottb_config - warn - 21 - DatabaseLocations are DatabaseLocationMap in the class iottb.models.iottb_config
INFO - iottb_config - load_config - 57 - Loading configuration file
INFO - iottb_config - load_config - 62 - Config file exists, opening.
INFO - sniff - validate_sniff - 37 - Validating sniff...
INFO - sniff - sniff - 91 - sniff command invoked
INFO - string_processing - make_canonical_name - 20 - Normalizing name roomba
Testbed [I]
Using canonical device name roomba
Found device at path /home/seb/showcase/roomba
INFO - sniff - sniff - 152 - Generic filter None
Using filter None
Files will be placed in /home/seb/showcase/roomba/sniffs/2024-07-01/cap0003-0309
Capture has id f1e92062-4a82-4429-996c-97bd7fa57bec
INFO - sniff - sniff - 186 - pcap file name is roomba_f1e92062-4a82-4429-996c-97bd7fa57bec.pcap
INFO - sniff - sniff - 187 - stdout log file is stdout_f1e92062-4a82-4429-996c-97bd7fa57bec.log
INFO - sniff - sniff - 188 - stderr log file is stderr_f1e92062-4a82-4429-996c-97bd7fa57bec.log
INFO - sniff - sniff - 246 - tcpdump command: sudo tcpdump -# -n -vvv -c 10 -w /home/seb/showcase/roomba/sniffs/2024-07-01/cap0003-0309/roomba_f1e92062-4a82-4429-996c-97bd7fa57bec.pcap
Capture setup complete!
Capture complete. Saved to roomba_f1e92062-4a82-4429-996c-97bd7fa57bec.pcap
tcpdump took 2.12 seconds.
Ensuring correct ownership of created files.
Saving metadata.
END SNIFF SUBCOMMAND
\end{minted}
\caption{Output at the third verbosity level (\texttt{-vvv}).}
\label{fig:example-lvl-three}
\end{figure}

\section{\ref{req:auto_documentation}: Documentation}
\textbf{Status: Partially Met} \\
For users, there is a ``Command Line Reference'' (see \cref{appendix:cmdref}) which details all important aspects of operating the \iottb \cli. Furthermore, helpful messages regarding the correct syntax of the commands are displayed if an input is malformed. User documentation thus exists and, while it can certainly be improved upon, is already helpful.
Unfortunately, documentation for developers is currently poor: the codebase is not systematically documented and there is no developer's manual. Thoroughly documenting the existing codebase is the most pressing issue and should be tackled first to improve developer documentation.

\section{\ref{req:fair_data_meta_inventory}: Data and Metadata Inventory}
\textbf{Status: Met} \\
The testbed organizes data and metadata in a standardized and principled way. The database is complete with respect to the primary and secondary artifacts that currently stem from operating \iottb itself. While complete now, extending \iottb carries the risk of breaking this requirement if careful attention is not given. Since the database is a central part of the system as a whole, extensions must be shown to comply with this requirement before they are built in.

\section{\ref{req:fair_data_formats}: Data Formats and Schemas}
\textbf{Status: Met} \\
The testbed standardizes directory and file naming. All metadata is stored as plain text in the JSON format, which makes it very accessible to both humans and machines. Currently, the only binary format \iottbsc creates is the PCAP format. Fortunately, PCAP is widely known, not proprietary, and widely available tools (e.g., Wireshark~\citep{wiresharkorg}) exist to inspect such files. Furthermore, the data in the PCAP files can be extracted into the plain-text CSV format, which further improves interoperability. Consistency is currently handled implicitly; that is, there are no strict schemas\footnote{Strict schemas for metadata files were briefly introduced, but then abandoned due to a lack of familiarity with the Pydantic library \citep{pydantic}.}. \iottb should generally not corrupt data during operation, but plain-text files are manually editable and can inadvertently be corrupted or made invalid (e.g., by accidentally deleting a few digits from a UUID). It is important to keep this in mind as \iottbsc is extended and the types of files residing in the database become more heterogeneous.

\section{\ref{req:fair_file_naming}: File Naming and Directory Hierarchy}
\textbf{Status: Met} \\
\iottb currently names all files it creates according to a well-defined scheme. In all cases, the file name is easily legible (e.g., metadata files like \cref{lst:cap-meta}), or the context of where the file resides provides easy orientation to a human reviewer. For instance, raw data files, which are currently only PCAP files, are all named with a \uuid. This is not helpful to a human on its own, but the metadata file residing in the same directory provides all the information needed to understand what the file contains. Furthermore, these files reside in a directory hierarchy which identifies which device the traffic belongs to and the date on which the capture file was created. Finally, capture files reside in a directory which identifies where in the sequence of captures of a given day they were created.
Automation recipes expanding the range of collected data types can simply follow this convention. This ensures interoperability and findability across the various capture methods.

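A minimal sketch of this directory convention, reconstructed from the example log output earlier (e.g., \texttt{roomba/sniffs/2024-07-01/cap0000-0214}); the helper and its parameters are illustrative, not \iottb's actual API.

```python
from datetime import date
from pathlib import Path

def capture_dir(db: Path, device: str, day: date, seq: int, hhmm: str) -> Path:
    """Build <db>/<device>/sniffs/<date>/capNNNN-<HHMM>, identifying
    the device, the capture date, and the position in that day's
    sequence of captures."""
    return db / device / "sniffs" / day.isoformat() / f"cap{seq:04d}-{hhmm}"

p = capture_dir(Path("/home/seb/showcase"), "roomba", date(2024, 7, 1), 0, "0214")
print(p)  # /home/seb/showcase/roomba/sniffs/2024-07-01/cap0000-0214
```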
\cref{sec:add-dev} in \cref{ch4} already showed examples of the naming convention when adding devices.

\section{\ref{req:fair_preservation}: Data Preservation Practices}
\textbf{Status: Partially Met} \\
No specific data preservation measures are taken. However, \iottb already follows the Library of Congress recommendations on data formats (see \citet{recommendedformatrsLOC}): most data is stored in plain text, and the binary formats used are widely known within the field, with no access barrier.
To enhance the testbed's compliance with this requirement, automation recipes which periodically back up the data to secure locations could be developed. The need for built-in preservation should be balanced against the goal of not introducing dependencies unrelated to the core aim of automated collection and FAIR storage. One option is simply a repository of scripts which are not built into the \iottb executable, but which users can use and adapt to their needs\footnote{For instance, rsync scripts with predefined filters appropriate for the database.}.

\section{\ref{req:fair_accessibility}: Accessibility Controls}
\textbf{Status: Not Met} \\
While the \iottb executable is aware of what data it can and cannot access or change, there are currently no wider access controls implemented.

% File: thesis/Chapters/ch6-conclusion.tex
\chapter{Conclusion}\label{ch:conclusion}
\iottbsc is an attempt at an automation testbed for \iot devices.
The \iottb package can be considered somewhat feature-limited and incomplete for a full testbed, but it provides a foundation on which to build a more fully fledged system. \iottb currently automates the setup and configuration of network packet capture and saves the relevant metadata to its database. The testbed uses the file system as a database, such that it is navigable by humans as well as machines. Data is stored in a predictably named hierarchy, and files produced by operating \iottb are both uniquely identifiable and interpretable by humans. This is achieved by using file system paths to provide context, so that file names need only contain minimal information to be meaningful to humans. Additionally, all created resources are identified by a \uuid, which ensures that even if data is accidentally moved, it remains linked, at least in principle.
In summary, \iottbsc is a testbed which takes a first step toward a future where data is FAIR and experiments are reproducible.