# Testbed

- What is a testbed?
  - "[...] scientific platform for experiments" (translated from the German [Wikipedia](https://de.wikipedia.org/wiki/Testbed))
  - What is a "platform"?
- Example [ORBIT](https://www.orbit-lab.org/): testbed as a wireless network emulator (software, I guess) + computing resources. Essence of the offered service: a predictable environment. What is tested: applications and protocols.
- [APE](https://apetestbed.sourceforge.net/): "APE testbed is short for **Ad hoc Protocol Evaluation testbed**." But also ["What exactly is APE"](https://apetestbed.sourceforge.net/#What_exactly_is_APE): "There is no clear definition of what a testbed is or what it comprises. APE however, can be seen as containing two things:
  - An encapsulated execution environment, or more specifically, a small Linux distribution.
  - Tools for post testrun data analysis."
- [DES-Testbed](https://www.des-testbed.net), Freie Universität Berlin. A random assortment of sometimes empty(?!) posts to a sort of bulletin board.

## IoT Automation Testbed

#### From Abstract:

In this project, the student designs a testbed for the **automated analysis** of the **privacy implications** of IoT devices, paying particular attention to features that support reproducibility.

#### From Project description:

To study the privacy and security aspects of IoT devices **_systematically_** and **_reproducibly_**, we need an easy-to-use testbed that _automates_ the **_process of experimenting_** with **_IoT devices_**.

**Automation recipes**: automate important aspects of experiments, in particular:

- Data Collection
- Analysis (= Experiment in most places)

**FAIR data storage**, making data

- Findable
- Accessible
- Interoperable
- Reusable

### Implications/Open questions

#### FAIR Data Storage

1. Who are the stakeholders? What is the scope of "FAIRness"?
   1. Personal DB? --> [X], tiny scope, $\lnot$ FAIR almost by definition; would only be a tool/suggestion on layout.
   2. Project DB? --> [X], no, a project probably _uses_ a testbed.
   3. Research group --> Focuses on **F a IR**. Accessibility _per se_ is not an issue. Findability --> by machine AND human. Interoperability --> specs may rely on local/university/group idiosyncrasies.
   4. Academic DB --> (strict) subset of 3. Consider field-specific standards. Must start discerning between public and non-public parts of the DB/testbed. One may unwittingly leak private information: location, OS of the capture host, usernames, absolute file paths, etc. See [here](https://www.netresec.com/?page=Blog&month=2013-02&post=Forensics-of-Chinese-MITM-on-GitHub) and [pcapng.com](https://pcapng.com/) under "Metadata Block Types".
   5. Public DB --> (strict) subset of 4.
2. Seems like something between 3. and 4.: some type of repository. A full-fledged DB is probably unnecessary. A mix of text + something low-spec like SQLite? Could probably still be tracked by git. (A minimal sketch follows after this list.)
3. Interoperability $\cap$ Automation recipes --> recipes are built from and depend only on widely available, platform-independent tools.
4. Accessibility $\cap$ Autorec --> built from and depending only on tools which are widely available and (have a permissive license OR have an equivalent with a permissive license). Human side: documentation.
5. Reusable $\cap$ Autorec --> modular tools, and accessible (license, etc.) dependencies (e.g. experiment-specific scripts).
6. Findable $\cap$ Autorec --> must assume that a recipe is found and selected manually by the researcher.
7. Interoperable --> collected data (measurements) across different devices/experiments must follow a schema which is meaningful for [...]
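For the repository question in point 2., here is a minimal, non-authoritative sketch of what a machine- and human-readable dataset record could look like if the tooling is Python-based. All field names (`device_vendor`, `capture_tool`, `schema_version`, ...) are placeholders assumed for illustration, not a decided format.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class DatasetRecord:
    """Minimal per-dataset metadata sketch (hypothetical fields, not a spec)."""
    dataset_id: str        # Findable: stable, human-readable identifier
    device_vendor: str
    device_model: str
    firmware_version: str
    collected_on: str      # ISO 8601 date of the capture
    capture_tool: str      # Reproducible: e.g. "tcpdump 4.99"
    schema_version: str    # Interoperable: which column layout the data follows
    license: str           # Accessible/Reusable: e.g. "CC-BY-4.0"
    sha256: str            # Integrity check of the raw capture file


def write_record(capture: Path, **fields) -> DatasetRecord:
    """Compute the checksum and store the record as a plain JSON sidecar file,
    so the metadata stays greppable, diffable, and git-trackable."""
    digest = hashlib.sha256(capture.read_bytes()).hexdigest()
    record = DatasetRecord(sha256=digest, **fields)
    capture.with_suffix(".meta.json").write_text(json.dumps(asdict(record), indent=2))
    return record
```

Whether these records end up as sidecar files, rows in an SQLite file, or entries in an institutional repository is exactly the open point above; the sketch only fixes the idea that every dataset carries explicit, minimal metadata.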
#### Usage paths/Workflows:

Data Collection --> deposit in a FAIR repository.

Primary Experiment --> define spec, write script/code --> access the FAIR repo for data, possibly access the FAIR repo for predefined scripts --> where do the results go? A results "repo".

Replication Experiment --> choose an experiment/benchmark script from the testbed --> execute --> publish (produces a replication result, i.e. the same "schema" as the primary experiment).

Replication Experiment Variant --> choose an experiment/benchmark, add additional processing and input --> run --> possibly publish.

How to define the static vs. dynamic aspects of an experiment? Haven't even thought about encryption/decryption specifics...

But it could also go like this: first design the analysis/experiment --> collect data --> data is cleaned according to testbed scripts --> #TODO

Get a new device and want to perform some predefined tests --> first need to collect data.

For _some_ device (unknown whether data already exists) we want to perform test _T_ --> run a script with the device spec as input --> the script checks whether data is already available; if not, perform data collection first --> run the analysis on the data --> publish the results to the results/benchmark repo of the device; if it was a new device, open a new results branch for that device and publish the initial results. A _primary experiment_ with data collection. (A rough driver sketch for this path is at the end of these notes.)

Types of experiments:

- "Full Stack": data collection + analysis.
- "Model Test": data access (+ sampling) + model (or a complete workflow). Test subject: the model.
- "Replication Experiment": _secondary_ data collection + testbed model + quality criteria? Test subject: collection scheme + analysis model = result.
- "Exploratory Collection + Analysis": aka unsupervised. #TODO

**Note**: #TODO What types of metadata are of interest? Are metadata simple, minimal-compute features, or complicated extracted/computed features? Where do we draw the line?

#TODO Say for the same devices: when is data merged, when not? I.e. under what conditions can datasets automatically be enlarged? How is this tracked so as not to tamper with reproducibility?

### Reproducibility:

What are we trying to reproduce? What are the possible results from experiments/tests?

Types of artifacts:

- Static: raw data, labeled data.
- Computational/instructive:
  - Supervised training. Input: labeled data + learning algo. Output: model.
  - Model applicability test. Input: unlabeled data + model. Output: prediction/label.
  - Feature extraction. Input: (raw, labeled?) data + extraction algo. Output: labeled dataset.
  - New feature test. Input: labeled data + feature extraction algo + learning algo. Output: model + model verification --> usability of the new features... ( #todo this case exemplifies why we need modularity: we want to apply/compose a new "feature extraction algo", e.g. to all devices where it is applicable, train new models, and verify the "goodness" of the new features per device/dataset, etc.)

### data collection and cleaning (and features):

How uniform is the schema of the data we want to collect across the IoT spectrum? Per device? Say two (possibly unrelated) datasets happen to share the same schema: can we just merge them, even if one set is from a VR headset and the other from a Roomba? Is the schema always the same, e.g. (timestamp, src ip, dst ip, (mac, ports? or unused features), payload?, protocols?)?
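One way to make "same schema" something the automation can check rather than the researcher eyeballing it: treat the column layout as explicit metadata and only merge on an exact match. A rough sketch; the fields in `FLOW_SCHEMA_V1` are an assumption for illustration, and whether a VR headset and a Roomba *should* ever be merged stays a human decision.

```python
# Hypothetical flow-record layout; the concrete fields are exactly the open question above.
FLOW_SCHEMA_V1 = ("timestamp", "src_ip", "dst_ip", "src_port", "dst_port",
                  "protocol", "payload_len")


def same_schema(schema_a, schema_b) -> bool:
    """Merge precondition: identical field names in identical order."""
    return tuple(schema_a) == tuple(schema_b)


def merge_datasets(a: list, schema_a, b: list, schema_b) -> list:
    """Enlarge a dataset only if the schemas match exactly; how the merge is
    recorded for reproducibility is still the #TODO above."""
    if not same_schema(schema_a, schema_b):
        raise ValueError("schema mismatch: not automatically mergeable")
    return a + b
```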
If the testbed contains uniform data --> only "one" extraction algo, and the dataset schema = all relevant features. Alternatively, if the testbed data is heterogeneous --> the feature extraction defines the interoperability/mergeability of datasets.

- Training algo: flexible schema; the output is only usable on data with the same schema(?)
- Model eval: schema is fixed; the eval data must have the correct schema.

Say a project output is a model which retrieves privacy-relevant information from the network traffic of an IoT device. #TODO How to guarantee applicability to other devices? What are the needs in the aftermath? Apply the same model to other data? What if the raw data schemas match, but the labels are incompatible? #todo schema <-> applicable privacy metric matching
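To make "model eval: schema fixed" concrete, a small sketch of an applicability check, assuming model and dataset both carry their schema and label set as metadata (an assumption, not an existing testbed API). The three status strings are made up for illustration.

```python
from typing import Sequence


def check_applicability(model_schema: Sequence, model_labels: set,
                        data_schema: Sequence, data_labels: set) -> str:
    """Decide whether a trained model can be evaluated on a given dataset.

    Hypothetical statuses:
      - "applicable":      schema and label set both match
      - "label_mismatch":  raw schemas match, but the label sets differ
      - "schema_mismatch": the model cannot be applied at all
    """
    if tuple(model_schema) != tuple(data_schema):
        return "schema_mismatch"
    if set(model_labels) != set(data_labels):
        return "label_mismatch"
    return "applicable"
```

The last #todo (schema <-> applicable privacy metric) could then become one more lookup keyed by the schema version.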
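And, returning to the usage path "run a script with the device spec as input" from the workflows section: a driver for it could look roughly like the sketch below. `collect_traffic`, `run_analysis`, and `publish_results` are placeholder stubs for automation recipes that do not exist yet, and the `data/<device-id>/*.pcap` layout is an assumption.

```python
from pathlib import Path

DATA_ROOT = Path("data")  # assumed layout: data/<device-id>/*.pcap


# --- placeholder recipes, to be replaced by real testbed automation ---------
def collect_traffic(device_id: str, out_dir: Path) -> Path:
    """Would drive the capture setup and return the path of the new pcap."""
    raise NotImplementedError("data collection recipe not implemented")


def run_analysis(test_name: str, captures: list) -> dict:
    """Would run the selected analysis/benchmark over the captures."""
    raise NotImplementedError("analysis recipe not implemented")


def publish_results(device_id: str, test_name: str, result: dict, new_device: bool) -> None:
    """Would push the result to the device's results repo/branch."""
    raise NotImplementedError("publishing recipe not implemented")


def run_test(device_id: str, test_name: str) -> dict:
    """Driver sketch: reuse existing captures if available, otherwise collect
    first; then analyse and publish, opening a new results branch for a new device."""
    device_dir = DATA_ROOT / device_id
    captures = sorted(device_dir.glob("*.pcap"))
    new_device = not captures
    if new_device:
        device_dir.mkdir(parents=True, exist_ok=True)
        captures = [collect_traffic(device_id, device_dir)]
    result = run_analysis(test_name, captures)
    publish_results(device_id, test_name, result, new_device=new_device)
    return result
```

One way to read the static vs. dynamic question above: which of these steps are pinned inside the recipe, and which are parameters of an individual run.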