
Testbed

  • What is a testbed?
    • "[...] wissenschaftliche Plattform für Experimente" german Wikipedia
      • What is a "Platform"?
    • Example: the ORBIT testbed as a wireless network emulator (software, presumably) + computing resources. Essence of the offered service: a predictable environment. What is tested: applications and protocols.
    • APE: "APE testbed is short for Ad hoc Protocol Evaluation testbed." But also, under "What exactly is APE": "There is no clear definition of what a testbed is or what it comprises. APE however, can be seen as containing two things:
      • An encapsulated execution environment, or more specifically, a small Linux distribution.
      • Tools for post testrun data analysis."
    • DES-Testbed, Freie Universität Berlin. Random assortment of sometimes empty(?!) posts to a sort of bulletin board.

IoT Automation Testbed

From Abstract:

In this project, the student designs a testbed for the automated analysis of the privacy implications of IoT devices, paying particular attention to features that support reproducibility.

From Project description:

To study the privacy and security aspects of IoT devices systematically and reproducibly, we need an easy-to-use testbed that automates the process of experimenting with IoT devices.

Automation recipes: Automate important aspects of experiments, in particular:

  • Data Collection
  • Analysis (= Experiment in most places)

FAIR data storage: making data (see the sketch after this list)

  • Findable
  • Accessible
  • Interoperable
  • Reusable
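
As a rough illustration of what these four properties could mean for a single capture, here is a minimal sketch of a per-dataset metadata record; every field name and default value is an assumption for illustration, not a decided design.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DatasetRecord:
    """Hypothetical metadata record for one captured dataset in the testbed."""
    # Findable: a stable identifier plus searchable descriptors.
    dataset_id: str                       # e.g. a DOI or an internal UUID
    device: str                           # e.g. "vendor-x-smart-plug"
    keywords: List[str] = field(default_factory=list)
    # Accessible: how the data can be retrieved and under which terms.
    access_url: str = ""                  # repo path or download endpoint
    license: str = "CC-BY-4.0"            # placeholder license string
    # Interoperable: which measurement schema the records follow.
    schema_id: str = "pcap-flow-v1"       # hypothetical schema identifier
    # Reusable: provenance needed to rerun or extend the collection.
    collection_recipe: str = ""           # recipe/script that produced the data
    collected_at: str = ""                # ISO 8601 timestamp
```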

Implications/Open questions

FAIR Data Storage

  1. Who are the stakeholders? What is the scope of "FAIRness"?
    1. PersonalDB? --> [X], tiny scope, \lnot FAIR almost by definition. Would only be a tool / a suggestion on layout.
    2. ProjectDB? --> [X], no, probably a project uses a testbed
    3. Research Group --> Focuses on F, I, R; accessibility per se is not an issue. Findability --> by machine AND by human. Interoperability --> specs may rely on local/uni/group idiosyncrasies.
    4. AcademicDB --> (Strict) subset of 3. Consider field-specific standards. Must start discerning between public and non-public parts of the DB/testbed. One may unwittingly leak privacy-relevant information: location, OS of the capture host, usernames, absolute file paths, etc. See here and pcapng.com under "Metadata Block Types".
    5. Public DB --> (Strict) Subset of 4.
  2. Seems like something between 3. and 4. Some type of repository. A full-fledged DB? Probably unnecessary. A mix of text + something low-spec like SQLite? Could probably still be tracked by git (see the sketch after this list).
  3. Interoperability \cap Automation recipes --> Recipes are built from, and depend only on, widely available, platform-independent tools.
  4. Accessibility \cap Autorec --> Built from and depending only on tools which are widely available and permissively licensed (or have an equivalent with a permissive license). Human side: documentation.
  5. Reusable \cap Autorec --> Modular tools, and accessible (license, etc.) dependencies (e.g. experiment-specific scripts).
  6. Findable \cap Autorec --> Must assume that the recipe is found and selected manually by the researcher.
  7. Interoperable --> Collected data (measurements) across different devices/experiments must follow a schema which is meaningful for all of them.
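
Sketch for point 2 above: raw captures stay as plain files, with a small SQLite index next to them so that both can be versioned in one git repository. Paths, table layout, and fields are placeholders, not a decided design.

```python
import sqlite3
from pathlib import Path

# Hypothetical layout: raw captures live as plain files next to a small
# SQLite index ("index.db") that git can version alongside them.
REPO_ROOT = Path("testbed-data")          # assumed repository root
DB_PATH = REPO_ROOT / "index.db"

def init_index(db_path: Path = DB_PATH) -> sqlite3.Connection:
    """Create (or open) the lightweight metadata index."""
    db_path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS datasets (
               dataset_id TEXT PRIMARY KEY,
               device     TEXT NOT NULL,
               schema_id  TEXT NOT NULL,
               file_path  TEXT NOT NULL,   -- relative path keeps the repo portable
               sha256     TEXT NOT NULL    -- content hash for integrity/provenance
           )"""
    )
    return conn

def register(conn, dataset_id, device, schema_id, file_path, sha256):
    """Add one capture file to the index."""
    conn.execute(
        "INSERT OR REPLACE INTO datasets VALUES (?, ?, ?, ?, ?)",
        (dataset_id, device, schema_id, file_path, sha256),
    )
    conn.commit()

def find_by_device(conn, device):
    """Findability for machines: query the index instead of grepping files."""
    return conn.execute(
        "SELECT dataset_id, file_path FROM datasets WHERE device = ?", (device,)
    ).fetchall()
```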

Usage paths/ Workflows:

  • Data Collection --> deposit in FAIR repository.
  • Primary Experiment --> define spec; write script/code --> access FAIR repo for data; possibly access FAIR repo for predefined scripts --> where do results go? A results "repo".
  • Replication Experiment --> choose experiment/benchmark script from the testbed --> execute --> publish (produces a replication result, i.e. the same "schema" as the primary experiment).
  • Replication Experiment Variant --> choose experiment/benchmark; add additional processing and input --> run --> possibly publish.

How to define the static vs. dynamic aspects of an experiment? Haven't even thought about encryption/decryption specifics...
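
To make the "define spec" step tangible, a sketch of a declarative recipe object that distinguishes a primary experiment (with its own data collection) from a replication run (reusing archived data). All names, scripts, and dataset IDs are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ExperimentRecipe:
    """Hypothetical declarative spec covering the usage paths above."""
    name: str
    device_spec: str                     # which device (class) the recipe targets
    collection_script: Optional[str]     # None => reuse existing data from the FAIR repo
    analysis_script: str                 # the actual experiment/analysis code
    input_datasets: List[str] = field(default_factory=list)  # dataset_ids from the repo
    results_repo: str = "results/"       # where outputs are deposited

# Primary experiment: defines its own collection and analysis.
primary = ExperimentRecipe(
    name="dns-leak-study", device_spec="vendor-x-camera",
    collection_script="collect_dns.py", analysis_script="analyse_dns.py",
)

# Replication: no new collection, same analysis against archived data.
replication = ExperimentRecipe(
    name="dns-leak-study-replication", device_spec="vendor-x-camera",
    collection_script=None, analysis_script="analyse_dns.py",
    input_datasets=["ds-0042"],
)
```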

But it could also go like this: first design the analysis/experiment --> collect data --> data is cleaned according to testbed scripts --> #TODO

  • Get a new device and want to perform some predefined tests --> first need to collect data.
  • For some device (unknown whether data already exists), want to perform test T --> run a script with the device spec as input --> the script checks whether data is already available; if not, it performs data collection first --> run the analysis on the data --> publish the results to the results/benchmark repo of the device; if it was a new device, open a new results branch for that device and publish the initial results. (Primary experiment with data collection.)
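
A minimal sketch of the second path (check whether data exists, collect if not, analyse, publish), with placeholder helpers standing in for the real collection and analysis steps; the directory layout is assumed, not fixed.

```python
from pathlib import Path

DATA_DIR = Path("testbed-data")          # assumed layout, as in the index sketch
RESULTS_DIR = Path("results")

def data_available(device_spec: str) -> bool:
    """Placeholder lookup: does any capture for this device exist yet?"""
    return any(DATA_DIR.glob(f"{device_spec}/*.pcap"))

def collect_data(device_spec: str) -> Path:
    """Placeholder for the automated collection step (recipe-driven in practice)."""
    out = DATA_DIR / device_spec / "capture-000.pcap"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.touch()
    return out

def run_analysis(device_spec: str, test_name: str) -> dict:
    """Placeholder analysis returning a result record."""
    return {"device": device_spec, "test": test_name, "outcome": "todo"}

def publish(result: dict) -> None:
    """Deposit the result in the device's results 'branch' (here: a directory)."""
    branch = RESULTS_DIR / result["device"]
    first_result = not branch.exists()   # new device => open a new results branch
    branch.mkdir(parents=True, exist_ok=True)
    (branch / f"{result['test']}.json").write_text(str(result))
    if first_result:
        print(f"opened new results branch for {result['device']}")

def run_test(device_spec: str, test_name: str) -> None:
    """Sketch of the path above: check data, collect if missing, analyse, publish."""
    if not data_available(device_spec):
        collect_data(device_spec)
    publish(run_analysis(device_spec, test_name))
```

A call like run_test("vendor-x-camera", "dns-leak") would then trigger collection on first use and go straight to analysis on later runs.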

Types of Experiments:

  • "Full Stack": data collection + analysis.
  • "Model Test": data access (+ sampling) + model (or the complete workflow). Test subject: the model.
  • "Replication Experiment": secondary data collection + testbed model + quality criteria? Test subject: collection scheme + analysis model = result.
  • "Exploratory Collection + Analysis": aka unsupervised. #TODO

Notes:

  • #TODO What types of metadata are of interest? Are metadata simple, minimal-compute features, or complicated extracted/computed features? Where do we draw the line?
  • #TODO Say for the same devices: when is data merged, when not? I.e., under what conditions can datasets automatically be enlarged? How is this tracked so as not to tamper with reproducibility? (See the merge sketch below.)
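
One possible answer to the merge #TODO, sketched under the assumption that a dataset may only grow when device, schema, and collection recipe all match, and that every merge is recorded as a manifest of content hashes so results can cite exactly which inputs they were computed on.

```python
import hashlib
import json
from pathlib import Path
from typing import List

def file_digest(path: Path) -> str:
    """Content hash used to pin exactly which captures fed a merged dataset."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def can_merge(meta_a: dict, meta_b: dict) -> bool:
    """Hypothetical merge rule: same device, schema, and collection config only."""
    keys = ("device", "schema_id", "collection_recipe")
    return all(meta_a.get(k) == meta_b.get(k) for k in keys)

def merge_manifest(parts: List[Path], meta: dict, out: Path) -> None:
    """Record the merge instead of silently growing a dataset, so results stay
    reproducible: every analysis can cite the manifest (and thus the exact inputs)."""
    manifest = {
        "meta": meta,
        "parts": [{"file": str(p), "sha256": file_digest(p)} for p in parts],
    }
    out.write_text(json.dumps(manifest, indent=2))
```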

Reproducibility:

What are we trying to reproduce? What are possible results from experiments/tests?

Types of artifacts:

  • Static: raw data; labeled data.
  • Computational/instructive:
    • Supervised training. Input: labeled data + learning algo. Output: model.
    • Model applicability test. Input: unlabeled data + model. Output: prediction/label.
    • Feature extraction. Input: (raw, labeled?) data + extraction algo. Output: labeled dataset.
    • New feature test. Input: labeled data + feature extraction algo + learning algo. Output: model + model verification --> usability of the new features... (#TODO this case exemplifies why we need modularity: we want to apply/compose a new "feature extraction algo", e.g. to all those devices where it is applicable, train new models, and verify the "goodness" of the new features per device/dataset, etc. See the composition sketch below.)
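
To illustrate the modularity point in the last item, a sketch of composing an arbitrary feature-extraction algo with a learning algo and scoring the result per device/dataset. The type aliases and the in-sample scoring are placeholders; a real evaluation would use held-out data.

```python
from typing import Callable, Dict, List, Tuple

# Type aliases for the artifact kinds listed above (all hypothetical).
RawData = List[dict]                       # e.g. parsed packet records
Features = List[List[float]]
Labels = List[str]
Extractor = Callable[[RawData], Tuple[Features, Labels]]
Learner = Callable[[Features, Labels], Callable[[Features], Labels]]

def new_feature_test(extract: Extractor, learn: Learner,
                     datasets: Dict[str, RawData]) -> Dict[str, float]:
    """Compose a (new) feature-extraction algo with a learning algo and score it
    per device/dataset -- the modularity the note above asks for."""
    scores = {}
    for device, raw in datasets.items():
        features, labels = extract(raw)
        model = learn(features, labels)
        predictions = model(features)      # in-sample check only, for illustration
        hits = sum(p == y for p, y in zip(predictions, labels))
        scores[device] = hits / max(len(labels), 1)
    return scores
```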

Data collection and cleaning (and features):

How uniform is the schema of the data we want to collect across the IoT spectrum? Per device? Say two (possibly unrelated) datasets happen to share the same schema: can we just merge them, even if one set is from a VR headset and another from a roomba? Is the schema always the same, e.g. (timestamp, src ip, dst ip, (mac, ports? or unused features), payload?, protocols?)? If the testbed data is uniform --> only "one" extraction algo, and the dataset schema = all relevant features. Alternatively, the testbed data is heterogeneous --> feature extraction defines the interoperability/mergeability of datasets. (A schema-check sketch follows below.)
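
A sketch of the "same schema, different device" question: one guess at a uniform packet schema plus a merge check that treats schema equality as necessary but not sufficient. The extra device-class condition is an assumed policy, not a settled rule.

```python
from typing import Dict, Type

# One guess at a uniform capture schema (column -> type); purely illustrative.
PACKET_SCHEMA: Dict[str, Type] = {
    "timestamp": float,
    "src_ip": str,
    "dst_ip": str,
    "src_port": int,
    "dst_port": int,
    "protocol": str,
    "payload_len": int,
}

def same_schema(a: Dict[str, Type], b: Dict[str, Type]) -> bool:
    """Structural check only: identical columns and types."""
    return a == b

def mergeable(meta_a: dict, meta_b: dict) -> bool:
    """Schema equality alone does not justify a merge: the VR headset and the
    roomba may share columns yet not belong in one dataset, so the device class
    has to match too (assumed policy)."""
    return same_schema(meta_a["schema"], meta_b["schema"]) and \
           meta_a.get("device_class") == meta_b.get("device_class")
```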

Training algo: flexible schema; the output is only usable on data with the same schema(?). Model eval: schema fixed; the eval data must have the correct schema.

Say a project output is a model which retrieves privacy-relevant information from the network traffic of an IoT device. #TODO how to guarantee applicability to other devices? What are the needs in the aftermath? Apply the same model to other data? What if the raw data schemas match, but the labels are incompatible?

#todo schema <-> applicable privacy metric matching