2024-07-17 12:31:54 +02:00


#import "/globals.typ": *
//#outline-slide()
= Introduction
== Why are we here?
#slide[
#set align(center)
#grid(align: auto, rows: (13fr, 1fr), gutter: 1pt, inset: 1pt,
[#image("resources/iot-diagram-1.jpg")
#set text(size: 13pt)
#link("https://tse3.mm.bing.net/th?id=OIP.o3AVQNkQCCG_2cmhQzD1zQHaEW&pid=Api")
#v(5pt)]
)
]
#slide[
#set align(left)
== Project Description
To study the privacy and security aspects of IoT devices
- _systematically_ and
- _reproducibly_,
we need an easy-to-use
- _testbed_
that
- _automates_
#text(size: 0.7em, [(some aspects of)]) the process of experimenting with IoT devices.
#v(5pt)
*In this presentation I describe an implementation of such a testbed:* `IOTTB`
#speaker-note[
- _systematically_: standardization
- _reproducibly_: a systematic approach promises more reproducible experiments, and thus more easily verifiable results
- _testbed_: an environment which fixes certain parameters
- _automates_: beyond reproducibility, the level of manual involvement influences how feasible a reproduction is
]
]
== Principal Objectives
#slide[
#v(5pt)
== Objectives
Key objectives:
+ _Automation recipes_ @fursinckorg2021 for repeated execution of experiments, including data collection and analysis.
+ _FAIR_ data storage (Findable, Accessible, Interoperable, Reusable) (see @faircsartefacts2022, @go-fair and @wilkinson_fair_2016).
]
= Motivation
== Problem(s)
#slide(composer: utils.side-by-side)[
1 Manual setup and configuration of tools
- e.g. `tcpdump`, `Wireshark`, `Frida`
- configurations not interoperable between tools
#pause
2 Ad-hoc decisions
- file/artefact naming
- measured/extracted data features
- metadata recorded
#pause
3 Tailored utilities
- lack interoperability
- require adaptation depending on project
][
#pause
4 Scattered data and lack of standardization
- Inconsistent data collection and storage
- Difficult to maintain compatibility across projects
#pause
5 Onboarding challenges
- New members create ad-hoc solutions
- Perpetuates inefficiency and inconsistency
]
== Challenges Faced
#slide[
- Problems with current approach:
+ Inconsistent data collection
+ Lack of standardized tools and methods
+ Issues with file naming and data structuring
- Resulting difficulties:
+ Compatibility across projects
+ Onboarding new members
+ Ad-hoc solutions perpetuating inefficiency
]
= Background
== IoT Devices
#slide[
#set text(size: 14pt)
#grid(
rows: (4fr, 7fr),
gutter: 3pt,
grid(columns: 4,
[#figure(image("resources/philips-hue.jpg"),caption: [Smart Lighting])<fig:philips-hue>],
[#figure(image("resources/echo-dot.jpeg"), caption: [Smart Speakers])<fig:echo-dot>],
[#figure(image("resources/mi-camera.png", height: 80%), caption: [Home Surveillance Camera])<fig:mi-camera>],
[#figure(image("resources/meta-quest-2.png"), caption: [VR Headset])<fig:meta-quest-2>]),
grid(columns: (2.5fr, 3fr,2.5fr, 3fr),
[#figure(image("resources/dall-e-home-topo-1.jpeg", height: 80%), caption: [Dall-E Diagram of a Smart Home Network])],
grid.cell(colspan: 1, align: top+left, inset: 0.5em, breakable: true, [
#set text(size: 15pt)
#h(12pt)
#v(12pt)
IoT devices offer #alert[benefits]:
- Home lighting control
- Remote video monitoring
- Automated cleaning
#v(-5pt)
and more! But because they are
+ Used in Homes
+ Connected
- LAN only
- Internet
- #text(size:0.8em, [May lead to information leakage])
]),
grid.cell(colspan: 1, align: top+left, inset: 1em, breakable: true, [
#set text(size: 15pt)
#h(12pt)
#v(12pt)
#math.arrow.r.double Security and privacy *risks*
- Surveillance potential
- Unauthorized data sharing
- Vulnerable to bugs and security failures]),
[#figure(image("resources/dall-e-home-topo-2.jpeg", height: 80%), caption: [Dall-E Schematic Smart Home Network])]
)
)
]
#slide[
#set align(left)
- *IoT Devices Overview:*
- Devices connected to the internet (voice assistants, smart watches, smart home gadgets)
- Embedded with microprocessors and software
- *Examples of IoT Devices:*
- Security cameras
- Home lighting systems
- Children's toys
- *Importance of IoT:*
- Physical dimension (sensors, controllers)
- Internet connectivity
]
== Testbeds
#slide[
#set align(left)
- *What is a Testbed?*
- Controlled environment for experiments
- Ensures reproducibility and standardization
- *Examples of Testbeds:*
- Industry and Engineering: Platforms for product development
- Natural Sciences: Laboratories (e.g., climate chambers, wind tunnels, see @vaughan2005use)
- Computing: Software testing environments (unit tests, IDEs)
- Interdisciplinary: Complex systems (e.g., smart electric grid testbeds, see @tbsmartgrid2013)
]
== FAIR Data Principles
#slide[
#set align(left)
- *FAIR Data Principles:* @wilkinson_fair_2016, @go-fair
- *Findability:* Data should be easy to find
- *Accessibility:* Data should be accessible under well-defined conditions
- *Interoperability:* Data should be integrated with other data
- *Reusability:* Data should be reusable for future research
- *Purpose:*
- Improve reusability of scientific data
- Guide for designing _data storage_ systems
#speaker-note[
#set text(size: 0.5em)
#grid(columns: 2,[
*Findability:*
- Ensuring data is easily locatable and identifiable.
- Use of persistent identifiers like DOIs.
- Metadata should be richly described to enable precise searching.
- *Positive Example:* A dataset with a DOI and comprehensive metadata that is indexed in major search engines.
- *Negative Example:* A dataset stored on a personal computer with no metadata and no persistent identifier.
*Accessibility:*
- Data should be retrievable by authorized users.
- Use of standardized protocols for data access.
- Clear access conditions and usage licenses.
- *Positive Example:* A dataset available through a well-documented API with clear access guidelines and permissions.
- *Negative Example:* A dataset stored in a proprietary format that requires special software to access, with unclear or restrictive access conditions.
],[
*Interoperability:*
- Data should integrate with other datasets.
- Use of standardized formats and vocabularies.
- Ensure compatibility with existing data and tools.
- *Positive Example:* A dataset in CSV format using standardized column headers that align with other datasets in the field.
- *Negative Example:* A dataset in a non-standard format with custom jargon that is difficult to merge with other data sources.
*Reusability:*
- Data should be well-documented to allow future use.
- Include clear licensing for reuse.
- Ensure data quality and provenance are maintained.
- *Positive Example:* A dataset with a clear Creative Commons license, detailed documentation, and a version history.
- *Negative Example:* A dataset with no documentation, unclear provenance, and no stated reuse policy.
])
]
]
== Network Traffic
#slide[
#set align(left)
- *Importance of Network Traffic in IoT:*
+ Captures communication patterns (device-to-server (internet), device-to-device (LAN, e.g., companion apps))
+ Essential for evaluating performance and identifying unauthorized communications
- *Protocol Analysis:*
+ Understand device operation and communication protocols
+ Identify compatibility, efficiency, and security issues
- *Flow Monitoring:*
+ Detect potential security threats (data breaches, unauthorized access, malware)
+ Monitor for anomalies indicating security incidents or vulnerabilities
- *Information Leakage:*
+ Adversaries can passively observe traffic and extract sensitive information
+ Even encrypted traffic can leak information about the smart environment and users
see @infoexpiot, @iothome2019, @friesssniffing2018 and @peekaboo2020
#speaker-note[
- Network traffic is important to us for various reasons
- because data is encrypted in many cases nowadays,
- most methods boil down to some type of network traffic analysis
]
]
== Findings from Key Studies
#slide[
#set align(left)
*Examples:*\
- *Leakage:* Personal data and device usage patterns. @infoexpiot
- *Details:* The study found that IoT devices often leak personal data and detailed usage patterns to third-party servers.
- *Leakage:* Home device interactions and usage. @iothome2019
- *Details:* This research revealed that interactions with home devices can be intercepted, providing insights into daily routines and activities.
- *Leakage:* Device/Network communication _patterns_. @friesssniffing2018
- *Details:* Sniffing tools can capture communications between IoT devices. WiFi packets expose usage patterns regardless of encryption @peekaboo2020. Those patterns contain features which can be extracted (i.e., leaked) and fed into machine learning models, which can then expose more meaningful information (e.g., identifying devices and their functionality) @alyamiwifi2022.
In the end, these are all aspects of the same issue: even encrypted traffic leaks information which can be valuable to adversaries.
#speaker-note[
Examples:
- how many people live in a household
- how many devices are in the household
- which devices are online, and when
- who is home, and when
]
]
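#slide[
#set align(left)
#set text(size: 11pt)
The kind of feature extraction mentioned above needs only packet sizes and timestamps, both of which remain visible when the payload is encrypted. The following is a minimal sketch on hand-written packet records. The function name and the chosen features are assumptions made for illustration, not the features used in the cited studies.

```python
from statistics import mean
from typing import Dict, List, Tuple

Packet = Tuple[float, int]  # (timestamp in seconds, size in bytes)


def traffic_features(packets: List[Packet]) -> Dict[str, float]:
    """Summarise a packet trace using only sizes and timing, never payload."""
    if not packets:
        raise ValueError("need at least one packet")
    times = sorted(t for t, _ in packets)
    sizes = [s for _, s in packets]
    # Gaps between consecutive packets (inter-arrival times).
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "packet_count": float(len(packets)),
        "total_bytes": float(sum(sizes)),
        "mean_size": mean(sizes),
        "mean_gap_s": mean(gaps) if gaps else 0.0,
    }
```

Feature vectors like this one, computed per device and time window, are the sort of input a classifier could use to guess which device is active.
]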
== Packet Capture
#slide[
#set align(left)
- *Network Packet Capture:*
+ Intercepting and storing data packets on a network
+ Principal technique for studying device behavior and communication patterns
- *Importance in IoT Security Research:*
+ Main data collection mechanism
+ Essential for analyzing network traffic
//#math.arrow.r.double Wireshark Example
#speaker-note[
- data collection for network traffic
]
]
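#slide[
#set align(left)
#set text(size: 11pt)
As a rough illustration of what a scripted capture involves, the sketch below only builds a `tcpdump` command line using the standard options `-i` (interface), `-w` (write to file) and `-c` (packet count). The helper name, the interface `wlan0` and the filter are placeholders chosen for this example. This is not how `iottb` is necessarily implemented.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def tcpdump_command(interface: str, out_file: Path,
                    packet_count: Optional[int] = None,
                    capture_filter: str = "") -> List[str]:
    """Build (but do not run) a tcpdump invocation that writes a pcap file."""
    cmd = ["tcpdump", "-i", interface, "-w", str(out_file)]
    if packet_count is not None:
        cmd += ["-c", str(packet_count)]  # stop after this many packets
    if capture_filter:
        # Filter expression, e.g. "host 192.168.1.23"
        cmd += shlex.split(capture_filter)
    return cmd
```

The resulting list can be passed to `subprocess.run`. Capturing usually requires elevated privileges and an installed `tcpdump`.
]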
== Automation Recipes
#slide[
#set align(left)
- *Automation Recipes:*
- Platform agnostic automation
- e.g., install tool y, retrieve dataset x
- Integrate with existing scripts/tools
- Examples in ML
- _Collective Mind Framework:_ @CommonLanguageFacilitate2023, @fursinckorg2021
- Provides reusable recipes for building, running, benchmarking, and optimizing applications
- Platform-independent or supplemented with user-specific scripts
#speaker-note[
- *Importance of Automation:*
- Automates workflows irrespective of underlying tools
- the agnostic part is just the goal
- these recipes must be able to integrate well with existing tools and personal scripts
- Enhances reproducibility and efficiency in experiments
- Underlying data has a standardized (w.r.t. tooling) format, if the tool is available
]
]
== Summary of Key Points
#slide[
#set align(left)
- *Key Issues Identified:*
+ Manual setup and configuration of tools
+ Ad-hoc decisions in file naming, data features, and metadata
+ Tailored utilities lacking interoperability
+ Scattered data and lack of standardization
+ Onboarding challenges for new members
- *Importance of Addressing These Issues:*
+ Improve reproducibility and reliability of experiments
+ Enhance data quality and interoperability
+ Facilitate easier onboarding and collaboration
]
== Return to ...
#slide[
#set align(left)
- *How IOTTB Addresses These Issues:*
+ *Automation Recipes:*
- Standardize the setup and configuration of tools
- Ensure consistent data collection and analysis processes
+ *FAIR Data Storage:*
- Enhance findability, accessibility, interoperability, and reusability of data
- Improve data management and sharing practices
+ *Testbed Design:*
- Provide a controlled environment for reproducible experiments
- Simplify onboarding and collaboration through standardized procedures
]
= #smallcaps[IoTTB]
== Model Environment
#slide(composer: (1fr, 1fr))[
#figure(
image("resources/network-setup1.png"),
caption: [Common capture setup. Separate AP, switch and capturing device.]
)<fig:setup1>
][
#figure(
image("resources/setup2.png"),
caption: [Setup with AP and "Capture Device" on same machine.]
)
]
== The testbed
#slide[
#align(top + center)[_[...] testbed for IoT devices which automates aspects of running experiments._]
#pause
How is this realized?\
#pause
*`iottb`*:
- Python Package
- Defines Data Storage (implicit in behaviour)
- Database is a directory hierarchy in a file system
- DB is a collection of "device"-folders
- Devices in turn hold some metadata and can have subfolders containing capture data #pause
- Defines a metadata schema for devices, as well as captures
- Automates collecting of metadata + data
]
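#slide[
#set align(left)
#set text(size: 11pt)
The storage layout described above can be sketched in a few lines of Python. This is a minimal illustration and not the actual `iottb` code: the function names, the file name `metadata.json` and the metadata fields are assumptions made for this example.

```python
import json
import uuid
from pathlib import Path

META = "metadata.json"  # assumed file name, for illustration only


def add_device(db_root: Path, name: str) -> Path:
    """Create (or reuse) a device folder with a metadata file."""
    device_dir = db_root / name
    device_dir.mkdir(parents=True, exist_ok=True)
    meta_file = device_dir / META
    if not meta_file.exists():  # keep the UUID stable across calls
        meta = {"device_id": str(uuid.uuid4()), "device_name": name}
        meta_file.write_text(json.dumps(meta, indent=2))
    return device_dir


def add_capture(device_dir: Path) -> Path:
    """Create a capture subfolder whose metadata points back to the device."""
    device_meta = json.loads((device_dir / META).read_text())
    capture_id = str(uuid.uuid4())
    capture_dir = device_dir / f"capture-{capture_id}"
    capture_dir.mkdir()
    meta = {"capture_id": capture_id, "device_id": device_meta["device_id"]}
    (capture_dir / META).write_text(json.dumps(meta, indent=2))
    return capture_dir
```

Folder names stay human readable, while the UUIDs in the metadata give every device and capture a unique identifier.
]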
#focus-slide[#align(center+horizon,[DEMO])]
= Outlook
== Evaluation
#slide[
*FAIR*-ness?\
#pause
_Findability_:\
- supported through use of UUIDs, while maintaining human readability.
#speaker-note[Findable
F1. (Meta)data are assigned a globally unique and persistent identifier
F2. Data are described with rich metadata (defined by R1 below)
F3. Metadata clearly and explicitly include the identifier of the data they describe
F4. (Meta)data are registered or indexed in a searchable resource]
]
#slide[
*FAIR*-ness?\
_Findability_:\
- supported through use of UUIDs, while maintaining human readability.
_Accessibility_:\
- to a degree up to user of testbed
- UUID precondition for data met
- metadata makes sense also without data
#speaker-note[
A1. (Meta)data are retrievable by their identifier using a standardised communications protocol
A1.1 The protocol is open, free, and universally implementable
A1.2 The protocol allows for an authentication and authorisation procedure, where necessary
A2. Metadata are accessible, even when the data are no longer available
]
]
#slide[
*FAIR*-ness?\
_Findability_:\
- supported through use of UUIDs, while maintaining human readability.
_Accessibility_:\
- to a degree up to user of testbed
- UUID precondition for data met
- metadata makes sense also without data
_Interoperability_:\
- Used data formats are common and well known (json, pcap)
- Metadata schema understandable given example
#speaker-note[
I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (Meta)data use vocabularies that follow FAIR principles
I3. (Meta)data include qualified references to other (meta)data
]
]
#slide[
*FAIR*-ness?\
_Findability_:\
- supported through use of UUIDs, while maintaining human readability.
_Accessibility_:\
- to a degree up to user of testbed
- UUID precondition for data met
- metadata makes sense also without data
_Interoperability_:\
- Used data formats are common and well known (json, pcap)
- Metadata schema understandable given example
_Reusability_:\
- Used formats support this.
- Data capture tool (`iottb`) can be made available
- + rerun with the same configuration
#speaker-note[
R1. (Meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (Meta)data are released with a clear and accessible data usage license
R1.2. (Meta)data are associated with detailed provenance
R1.3. (Meta)data meet domain-relevant community standards
]
]
#slide[
*Automation Recipes*?\
- `iottb` automates capture
- Metadata should allow repeating experiments
- want: configure capture based on metadata
]
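#slide[
#set align(left)
#set text(size: 11pt)
Configuring a capture from previously recorded metadata could look roughly like the sketch below: the capture parameters are stored as JSON next to the data and read back to repeat the run. The class, its fields and both helper functions are hypothetical and only illustrate the idea. They are not part of `iottb` as presented.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CaptureConfig:
    """Hypothetical capture parameters recorded alongside the data."""
    interface: str
    packet_count: int
    capture_filter: str = ""


def to_metadata(config: CaptureConfig) -> str:
    """Serialise the configuration so it can be stored with the capture."""
    return json.dumps(asdict(config), sort_keys=True)


def from_metadata(text: str) -> CaptureConfig:
    """Rebuild the configuration from stored metadata to repeat a capture."""
    return CaptureConfig(**json.loads(text))
```

A configuration that survives this round trip unchanged is what would make a capture repeatable from its metadata alone.
]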
= Questions
= Appendix
#bibliography("presentation-bsc.bib", style: "ieee")
== Images
#slide[
#set text(size: 13pt)
//#show link: underline
#show link: set text(stroke: blue)
*Introduction*#footnote([Images licensed for free sharing and use, to the best of my knowledge.])\
- IoT Network Diagram: #link("https://tse3.mm.bing.net/th?id=OIP.o3AVQNkQCCG_2cmhQzD1zQHaEW&pid=Api")
- @fig:echo-dot: #link("https://i0.wp.com/thegroyne.com/wp-content/uploads/2018/04/Amazon-Echo-Dot-Altavoces-inteligentes-04.jpeg")
- @fig:philips-hue: #link("https://www.multimediaplayer.it/wp-content/uploads/kit-philips-hue.jpg")
- @fig:mi-camera: #link("https://d.otto.de/files/bd42f6e9-ac45-5e1c-8d5f-ac3affcee9d6.pdf")#footnote("Unclear licence")
]