Add pre meeting notes etc.
This commit is contained in:
parent
fa48b15fc7
commit
038f6e308b
20911
data/csv-16april24-iphone.csv
Normal file
File diff suppressed because it is too large
BIN
data/iphon-16-04-24-pcap-dump.pcap
Normal file
Binary file not shown.
BIN
data/iphone-seb-16-04-24-dump.pcapng
Normal file
Binary file not shown.
1013
data/mi-16april-filtered.csv
Normal file
File diff suppressed because it is too large
BIN
data/mi-26-april-24.pcap
Normal file
Binary file not shown.
BIN
data/mi-26-aprl-24.pcapng
Normal file
Binary file not shown.
7
notes/journal/09-04-2024-Tue.md
Normal file
@@ -0,0 +1,7 @@
New promising setup:
- Raspberry Pi 5
- Wired connection to the router for internet
- Can very easily create a wifi network and also connect to it (tested from an iPhone 13)
- Can capture on the wifi card while still providing internet access to the iPhone
- Sanity test: opening the YouTube app on the iPhone produces a large flow of QUIC packets, likely from the video that starts autoplaying.
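The QUIC sanity check above could be repeated offline against one of the CSV exports in this commit (e.g. `data/csv-16april24-iphone.csv`). A minimal sketch, assuming the export uses Wireshark's default column names `Protocol` and `Length` (an assumption, not verified against the actual file):

```python
import csv
import io

# Sketch: count QUIC rows in a Wireshark-style CSV export. The column names
# "Protocol" and "Length" are assumptions about the export layout.
def count_quic_rows(csv_text):
    """Return (quic_packets, quic_bytes) for rows whose Protocol column is QUIC."""
    packets, total_bytes = 0, 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("Protocol", "").strip().upper() == "QUIC":
            packets += 1
            total_bytes += int(row.get("Length", 0))
    return packets, total_bytes

sample = """No.,Time,Source,Destination,Protocol,Length
1,0.000,192.168.4.2,142.250.74.206,QUIC,1250
2,0.004,192.168.4.2,142.250.74.206,QUIC,1250
3,0.010,192.168.4.2,192.168.4.1,DNS,78
"""
print(count_quic_rows(sample))  # (2, 2500)
```

A sudden jump in the QUIC packet count after opening the YouTube app would confirm the capture path end to end.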
34
notes/meeting_18_april/IoTdb.md
Normal file
@@ -0,0 +1,34 @@
IoT.db/
├── Device1/
│   ├── Rawdata/
│   │   ├── measurement#D1#1/
│   │   │   ├── capfile
│   │   │   └── meta
│   │   └── measurement#D1#2/
│   │       └── ...
│   ├── Experiments/
│   │   ├── exp1#D1/
│   │   │   └── files etc
│   │   └── exp2#D1/
│   │       └── ...
│   └── Device 1 (Fixed) metadata
├── Device2/
│   ├── Rawdata/
│   │   ├── measurement#D2#1/
│   │   │   ├── capfile
│   │   │   └── meta
│   │   └── ...
│   ├── Experiments/
│   │   ├── exp1#D2/
│   │   │   └── ...
│   │   └── ...
│   └── Device 2 fixed metadata
└── .../
    ├── .../
    │   ├── ..
    │   └── ..
    ├── .../
    │   ├── .../
    │   │   └── ...
    │   └── ...
    └── ...
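One property of this layout is that measurement paths can be derived mechanically from a device name and a running number. A minimal sketch, with all names taken from the tree above (the `Device1`-to-`D1` abbreviation rule is an assumption):

```python
from pathlib import PurePosixPath

# Sketch: derive a raw-measurement directory under the IoT.db layout above.
# The "DeviceN" -> "DN" abbreviation is an assumed convention, not a spec.
def measurement_dir(root, device, seq):
    """Path of the seq-th raw measurement for a device, e.g. measurement#D1#1."""
    short = device.replace("Device", "D")
    return PurePosixPath(root) / device / "Rawdata" / f"measurement#{short}#{seq}"

p = measurement_dir("IoT.db", "Device1", 1)
print(p)  # IoT.db/Device1/Rawdata/measurement#D1#1
```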
34
notes/meeting_18_april/IoTdb.txt
Normal file
@@ -0,0 +1,34 @@
IoT.db/
├── Device1/
│   ├── Rawdata/
│   │   ├── measurement#D1#1/
│   │   │   ├── capfile
│   │   │   └── meta
│   │   └── measurement#D1#2/
│   │       └── ...
│   ├── Experiments/
│   │   ├── exp1#D1/
│   │   │   └── files etc
│   │   └── exp2#D1/
│   │       └── ...
│   └── Device 1 (Fixed) metadata
├── Device2/
│   ├── Rawdata/
│   │   ├── measurement#D2#1/
│   │   │   ├── capfile
│   │   │   └── meta
│   │   └── ...
│   ├── Experiments/
│   │   ├── exp1#D2/
│   │   │   └── ...
│   │   └── ...
│   └── Device 2 fixed metadata
└── .../
    ├── .../
    │   ├── ..
    │   └── ..
    ├── .../
    │   ├── .../
    │   │   └── ...
    │   └── ...
    └── ...
51
notes/meeting_18_april/IoTdb2_3.txt
Normal file
@@ -0,0 +1,51 @@
Reasoning is that experiments might want data from measurements of multiple
devices.
IoT.db2/
├── Devices/
│   ├── Dev1/
│   │   ├── devmeta
│   │   └── Measurements/
│   │       ├── m1/
│   │       │   ├── raw
│   │       │   ├── meta
│   │       │   └── spec
│   │       └── m2/
│   │           └── ...
│   ├── Dev2/
│   │   ├── devmeta
│   │   └── Measurements/
│   │       ├── m1/
│   │       │   ├── raw
│   │       │   ├── meta
│   │       │   └── spec
│   │       ├── m2/
│   │       │   └── ...
│   │       ├── m3/
│   │       │   └── ...
│   │       └── ...
│   └── Dev3/
│       └── ....
└── Experiments/ (Or projects? Or cleaned data)
    ├── E1/
    │   ├── involved measurements
    │   ├── filters/ feature extraction algo etc.
    │   └── etc etc...
    ├── E2/
    │   ├── .....
    │   ├── ..
    │   ├── ...
    │   └── ..
    └── ....
IoT.db3/
├── Measurements/
│   ├── m1/
│   │   ├── follows from above
│   │   └── ...
│   ├── m2
│   └── ....
└── Experiments/
    ├── e1/
    │   ├── follows from above
    │   └── ...
    ├── e2
    └── ...
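The stated reasoning behind IoT.db2/3, that one experiment may draw on measurements from several devices, suggests experiments that only *reference* measurements rather than contain them. A minimal sketch under that assumption; the manifest shape and all names are invented for illustration:

```python
# Sketch: an experiment under an IoT.db2-style layout references measurements
# from multiple devices instead of owning the raw data. The manifest structure
# is a hypothetical illustration, not a committed format.
def experiment_manifest(exp_id, measurements):
    """measurements: list of (device, measurement_id) pairs."""
    return {
        "experiment": exp_id,
        "inputs": [f"Devices/{dev}/Measurements/{m}/raw" for dev, m in measurements],
    }

m = experiment_manifest("E1", [("Dev1", "m1"), ("Dev2", "m3")])
print(m["inputs"])
# ['Devices/Dev1/Measurements/m1/raw', 'Devices/Dev2/Measurements/m3/raw']
```

Keeping experiments as lists of references also makes the "or cleaned data" question above more tractable: cleaned artifacts can live next to the manifest without duplicating raw captures.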
15
notes/meeting_18_april/IoTdb4.txt
Normal file
@@ -0,0 +1,15 @@
Like IoTdb but has no opinion on experiments.
IoT.db4/
├── Dev1/
│   ├── Measurements (basically raw data)/
│   │   ├── m1/
│   │   │   └── ....
│   │   └── m2/
│   │       └── ....
│   └── Cleaned?/Features extracted?/Merged?/
│       └── -- Where to put clean data?
├── Dev2/
│   └── Measurements/
│       └── ...
└── Algos/Scripts?/
    └── ..
92
notes/meeting_18_april/further considerations.md
Normal file
@@ -0,0 +1,92 @@
# Testbed
- What is a testbed?
	- "[...] wissenschaftliche Plattform für Experimente" ("scientific platform for experiments"), German [Wikipedia](https://de.wikipedia.org/wiki/Testbed)
	- What is a "Platform"?
- Example [ORBIT](https://www.orbit-lab.org/): testbed as a wireless network emulator (software, I guess) + computing resources. Essence of the offered service: a predictable environment. What is tested: applications and protocols.
- [APE](https://apetestbed.sourceforge.net/): "APE testbed is short for **Ad hoc Protocol Evaluation testbed**." But also ["What exactly is APE"](https://apetestbed.sourceforge.net/#What_exactly_is_APE): "There is no clear definition of what a testbed is or what it comprises. APE however, can be seen as containing two things:
	- An encapsulated execution environment, or more specifically, a small Linux distribution.
	- Tools for post testrun data analysis."
- [DES-Testbed](https://www.des-testbed.net), Freie Universität Berlin. Random assortment of sometimes empty(?!) posts to a sort of bulletin board.
## IoT Automation Testbed
#### From the abstract:
In this project, the student designs a testbed for the **automated analysis** of the **privacy implications** of IoT devices, paying particular attention to features that support reproducibility.
#### From the project description:
To study the privacy and security aspects of IoT devices **_systematically_** and **_reproducibly_**, we need an easy-to-use testbed that _automates_ the **_process of experimenting_** with **_IoT devices_**.

**Automation recipes**:
Automate important aspects of experiments, in particular:
- Data Collection
- Analysis (= Experiment in most places)

**FAIR data storage**:
Making data
- Findable
- Accessible
- Interoperable
- Reusable
### Implications/Open questions
#### FAIR Data Storage
1. Who are the stakeholders? What is the scope of "FAIRness"?
	1. PersonalDB? --> [X], tiny scope, $\lnot$ FAIR almost by definition. Would only be a tool/suggestion on layout.
	2. ProjectDB? --> [X], no, probably a project _uses_ a testbed.
	3. Research Group --> Focuses on **F a IR**. Accessibility _per se_ not an issue. Findability --> by machine AND human. Interoperable --> specs may rely on local/uni/group idiosyncrasies.
	4. AcademicDB --> (Strict) subset of 3. Consider field-specific standards. Must start discerning between public and non-public parts of the db/testbed. One may unwittingly leak private information: location, OS of the capture host, usernames, absolute file paths, etc. See [here](https://www.netresec.com/?page=Blog&month=2013-02&post=Forensics-of-Chinese-MITM-on-GitHub) and [pcapng.com](https://pcapng.com/) under "Metadata Block Types".
	5. PublicDB --> (Strict) subset of 4.
2. Seems like something between 3. and 4. Some type of repository. A full-fledged DB? Probably unnecessary. A mix of text + something low-spec like sqlite? Could probably still be tracked by git.
3. Interoperability $\cap$ Automation recipes --> Recipes are built from, and depend only on, widely available, platform-independent tools.
4. Accessibility $\cap$ Autorec --> Built from, and only depend on, tools which are 1. widely available and 2. have a permissive license (or an equivalent with a permissive license). Human side: documentation.
5. Reusable $\cap$ Autorec --> Modular tools, and accessible (license, etc.) dependencies (e.g. experiment-specific scripts).
6. Findable $\cap$ Autorec --> Must assume that a recipe is found and selected manually by the researcher.
7. Interoperable --> Collected data (measurements) across different devices must follow a schema which is meaningful for ...
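The "text + sqlite" idea in point 2 could look like a tiny index over measurements, where sqlite holds only findability metadata and the capture files themselves stay on disk (and in git). A minimal sketch; the table and column names are assumptions, not a design:

```python
import sqlite3

# Sketch of a low-spec measurement index (point 2 above): sqlite stores only
# findability metadata, the capture files stay on disk. Table/column names
# are assumptions, not a committed design.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE measurements (
    device TEXT, m_id TEXT, capfile TEXT, captured_on TEXT)""")
con.execute("INSERT INTO measurements VALUES "
            "('iphone', 'm1', 'data/iphon-16-04-24-pcap-dump.pcap', '2024-04-16')")
con.execute("INSERT INTO measurements VALUES "
            "('mi', 'm1', 'data/mi-26-april-24.pcap', '2024-04-26')")
rows = con.execute("SELECT capfile FROM measurements WHERE device = 'iphone'").fetchall()
print(rows)  # [('data/iphon-16-04-24-pcap-dump.pcap',)]
```

Since sqlite is a single file with a stable on-disk format, this stays compatible with the "tracked by git" constraint, though a plain-text index would diff more cleanly.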
#### Usage paths/Workflows:
Data Collection --> Deposit in FAIR repository.

Primary Experiment --> Define spec. Write script/code --> Access FAIR repo for data. Possibly access FAIR repo for predefined scripts --> Where do results go? A results "repo".

Replication Experiment --> Choose an experiment/benchmark script from the testbed --> Execute --> Publish (produces a replication result, i.e. same "schema" as the primary experiment).

Replication Experiment Variant --> Choose an experiment/benchmark, add additional processing and input --> Run --> Possibly publish.

How to define the static vs. dynamic aspects of an experiment?
Haven't even thought about encryption/decryption specifics....

But it could also go like this:
First design the analysis/experiment --> Collect data --> Data cleaned according to testbed scripts --> #TODO

Get a new device and want to perform some predefined tests --> first need to collect data.

For _some_ device (unknown if data already exists) we want to perform test _T_ --> run a script with the device spec as input --> the script checks if data is already available; if not, it performs data collection first --> run the analysis on the data --> publish results to the results/benchmark repo of the device; if it was a new device, open a new results branch for that device and publish the initial results. A _Primary Experiment_ with data collection.
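The last workflow (check for existing data, collect if missing, then analyze and publish) can be sketched with stub steps; every function here is a hypothetical placeholder standing in for a real testbed component:

```python
# Sketch of the "collect-if-missing, then analyze" workflow described above.
# All callables are hypothetical placeholders, not testbed interfaces.
def run_test(device_spec, store, collect, analyze, publish):
    """store: dict device -> data; collect/analyze/publish: testbed steps."""
    if device_spec not in store:                  # script checks if data exists
        store[device_spec] = collect(device_spec)  # ...if not, collect first
    result = analyze(store[device_spec])
    publish(device_spec, result)
    return result

log = []
result = run_test(
    "iphone13", {},
    collect=lambda dev: [f"{dev}-capture"],
    analyze=lambda data: f"report({data[0]})",
    publish=lambda dev, res: log.append((dev, res)),
)
print(result, log)
```

Making each step a swappable callable is one way to get the modularity argued for below: the same driver covers primary experiments, replications, and variants by substituting the analyze step.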

Types of Experiments:
- "Full Stack": data collection + analysis.
- "Model Test": data access (+ sampling) + model (or complete workflow). Test subject: the model.
- "Replication Experiment": _secondary_ data collection + testbed model + quality criteria? Test subject: collection scheme + analysis model = result.
- "Exploratory Collection + Analysis": aka unsupervised #TODO

**Note**:
#TODO What types of metadata are of interest? Are metadata simple, minimal-compute features, or complicated extracted/computed features? Where do we draw the line?
#TODO Say, for the same device: when is data merged, when not? I.e., under what conditions can datasets automatically be enlarged? How is this tracked so as not to tamper with reproducibility?

### Reproducibility:
What are we trying to reproduce?
What are the possible results from experiments/tests?

Types of artifacts:

Static:
- Raw data.
- Labeled data.

Computational/Instructive:
- Supervised Training. Input: labeled data + learning algo. Output: model.
- Model Applicability Test. Input: unlabeled data + model. Output: prediction/label.
- Feature Extraction. Input: (raw, labeled?) data + extraction algo. Output: labeled dataset.
- New Feature Test. Input: labeled data + feature extraction algo + learning algo. Output: model + model verification --> usability of the new features... ( #todo this case exemplifies why we need modularity: we want to apply/compose a new "feature extraction algo", e.g., to all those devices where applicable, train new models, and verify the "goodness" of the new features per device/dataset etc.... )

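The artifact types above can be phrased as typed inputs and outputs, which makes the composition constraints explicit. A minimal sketch; all class and function names are invented for illustration and nothing is actually trained:

```python
from dataclasses import dataclass

# Sketch of the artifact taxonomy above as typed steps. All names are
# invented for illustration; the "training" is a stub, not a learning algo.
@dataclass
class LabeledData:
    rows: list
    label_names: list

@dataclass
class Model:
    trained_on: list  # label names this model understands

def supervised_training(data: LabeledData) -> Model:
    """Supervised Training: labeled data (+ learning algo) -> model (stubbed)."""
    return Model(trained_on=data.label_names)

def model_applicability(model: Model, unlabeled_rows: list) -> list:
    """Model Applicability Test: unlabeled data + model -> predictions (stubbed)."""
    return [model.trained_on[0] for _ in unlabeled_rows]

m = supervised_training(LabeledData(rows=[[1], [2]], label_names=["iot", "non-iot"]))
print(model_applicability(m, [[3], [4]]))  # ['iot', 'iot']
```

Typing the artifacts this way is what makes the modularity in the New Feature Test case checkable: a new extraction algo is valid wherever its input type matches.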
### data collection and cleaning (and features):
How uniform is the schema of the data we want to collect across the IoT spectrum? Per device? Say two (possibly unrelated) datasets happen to share the same schema: can we just merge them, even if one set is from a VR headset and the other from a Roomba?
Is the schema always the same, e.g. (timestamp, src ip, dst ip, (mac, ports? or unused features), payload?, protocols?)?
If the testbed contains uniform data --> only "one" extraction algo, and the dataset schema = all relevant features.
Alternatively, the testbed data is heterogeneous --> feature extraction defines the interoperability/mergeability of datasets.

Training algo: flexible schema; output only usable on data with the same schema(?)
Model eval: schema fixed; eval data must have the correct schema.

Say a project output is a model which retrieves privacy-relevant information from the network traffic of an IoT device. #TODO How to guarantee applicability to other devices? What are the needs in the aftermath? Apply the same model to other data? What if the raw data schemas match, but the labels are incompatible?

#todo schema <-> applicable privacy metric matching

0
notes/testbed/data analysis/privacy metrics.md
Normal file
1
notes/testbed/scope.md
Normal file
@@ -0,0 +1 @@
What is the scope of the testbed as a system?
14
notes/testbed/testbed design and architecture.md
Normal file
@@ -0,0 +1,14 @@
FAIR data + privacy metric evaluation algos.
A dataset offers some schema.
A privacy metric requires some set of features.
**Case 1**
dataset schema = features required by the privacy metric
**Case 2**
dataset schema $\subset$ required features -->
_2.1:_ feature extraction algo(s) exist which can compute the missing features
_2.2:_ the missing features CANNOT be computed from the available schema/data
**Case 3**
dataset schema $\supset$ required features --> project the schema down into the relevant feature space / leave out unneeded data
**Case 4**
Unknown relationship --> further investigation needed.
Is this a realistic case?
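The four cases reduce to set comparisons between the dataset schema and the metric's required features. A minimal sketch of that idea (case 4 is mapped here to incomparable sets, and the extraction-algo lookup of case 2.1 is omitted):

```python
# Classify a dataset schema against the features a privacy metric requires
# (cases 1-4 above) via plain set comparison. A sketch of the idea only;
# not an interface from the testbed.
def classify(schema, required):
    schema, required = set(schema), set(required)
    if schema == required:
        return "case 1: exact match"
    if schema < required:
        return "case 2: missing features (try feature extraction)"
    if schema > required:
        return "case 3: project schema down to required features"
    return "case 4: unknown relationship, investigate"

print(classify({"ts", "src", "dst"}, {"ts", "src", "dst"}))  # case 1
print(classify({"ts", "src"}, {"ts", "src", "dst"}))         # case 2
print(classify({"ts", "src", "dst", "len"}, {"ts", "src"}))  # case 3
print(classify({"ts", "payload"}, {"src", "dst"}))           # case 4
```

On this reading, case 4 is realistic: two schemas that partially overlap are neither subset nor superset of each other, which is exactly the situation where per-feature investigation is needed.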