Identifying the true IP/Network identity of I2P service hosts

Adrian Crenshaw

The final paper is finished; this page was just the proposal. Read the final paper here:
Darknets and hidden servers:Identifying the true IP/network identity of I2P service hosts

Section 1, Introduction:

Abstract:

I2Pⁱ is a distributed Darknet using the mixnet model, in some ways similar to Tor, but specializing in providing internal services instead of out-proxying to the general Internet. The name I2P was originally short for “Invisible Internet Project”, although it is rarely referred to by this long form anymore. It is meant to act as an overlay network on top of the public Internet to add anonymity and security. The core aim of this paper will be to test the anonymity provided by I2P, focusing primarily on the application layer and mistakes that expose a service provider’s identity or reduce the anonymity setⁱⁱ they are part of.

A short introduction to I2P:

Since the academic community seems to be far more aware of Tor than I2P, it may be helpful to compare the two systems and cover some of the basics concerning how I2P works. Both Tor and I2P use layered cryptography so that intermediates cannot decipher the contents of connections beyond what they need to know to forward the connection on to the next hop in the chain. Rather than focusing on anonymous access to the public Internet, I2P’s core design goal is to allow the anonymous hosting of services (similar in concept to Tor Hidden Services). It does provide proxied access to the public Internet via what are referred to as “out proxies”, as well as various internal services to proxy out onto the Tor and Freenet systems, but that is not its core design goal.

Every I2P node is also generally a router (and you can use the terms somewhat interchangeably when it comes to I2P), so there is not a clear distinction between a server and a mere client like there is with the Tor network. Some I2P nodes do take on more responsibility than others, such as floodfill routers that participate in NetDB. Unlike Tor, I2P does not use centralized directory servers to connect nodes, but instead utilizes a DHT (Distributed Hash Table), based on Kademlia, referred to as NetDB. This distributed system helps to eliminate a single point of failure, and staves off blocking attempts similar to what happened to Tor when China blocked access to the core directory servers on September 25th, 2009ⁱⁱⁱ. I2P’s reliance on a peer to peer system for distributing routing information does open up more avenues for Sybil attacks and rogue peers, but steps have been taken to help mitigate this and are covered in the documentation^iv.
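To make the Kademlia reference above a little more concrete, here is a minimal sketch of the XOR distance metric that Kademlia-style DHTs use to decide which peers are "closest" to a key. It only illustrates the general idea; it does not model NetDB itself, and the peer names and the use of SHA256 to derive keys are assumptions made purely for the example.

```python
import hashlib


def xor_distance(key_a: bytes, key_b: bytes) -> int:
    """Kademlia-style distance: XOR of two equal-length keys, read as an integer."""
    if len(key_a) != len(key_b):
        raise ValueError("keys must have the same length")
    return int.from_bytes(key_a, "big") ^ int.from_bytes(key_b, "big")


def closest_peers(target: bytes, peers: list[bytes], count: int = 3) -> list[bytes]:
    """Return the `count` peer keys closest to `target` under the XOR metric."""
    return sorted(peers, key=lambda peer: xor_distance(target, peer))[:count]


if __name__ == "__main__":
    # Hypothetical peer names, hashed only to obtain 32-byte keys for the demo.
    names = ("peer-a", "peer-b", "peer-c", "peer-d")
    peers = [hashlib.sha256(name.encode()).digest() for name in names]
    target = hashlib.sha256(b"some-entry").digest()
    for key in closest_peers(target, peers, count=2):
        print(key.hex()[:16])
```

Because every node can compute the same distances, lookups can be routed toward the peers nearest a key without any central directory.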

Instead of referring to other routers and services by their IP, I2P uses cryptographic identifiers to specify both routers and end point services. For example, the identifier for “www.i2p2.i2p”, the project’s main website internal to the I2P network, is:

-KR6qyfPWXoN~F3UzzYSMIsaRy4udcRkHu2Dx9syXSz
UQXQdi2Af1TV2UMH3PpPuNu-GwrqihwmLSkPFg4fv4y
QQY3E10VeQVuI67dn5vlan3NGMsjqxoXTSHHt7C3nX3
szXK90JSoO~tRMDl1xyqtKm94-RpIyNcLXofd0H6b02
683CQIjb-7JiCpDD0zharm6SU54rhdisIUVXpi1xYgg
2pKVpssL~KCp7RAGzpt2rSgz~RHFsecqGBeFwJdiko-
6CYW~tcBcigM8ea57LK7JjCFVhOoYTqgk95AG04-hfe
hnmBtuAFHWklFyFh88x6mS9sbVPvi-am4La0G0jvUJw
9a3wQ67jMr6KWQ~w~bFe~FDqoZqVXl8t88qHPIvXelv
Ww2Y8EMSF5PJhWw~AZfoWOA5VQVYvcmGzZIEKtFGE7b
gQf3rFtJ2FAtig9XXBsoLisHbJgeVb29Ew5E7bkwxvE
e9NYkIqvrKvUAt1i55we0Nkt6xlEdhBqg6xXOyIAAAA

This is the base64 representation of the destination. Obviously having a user type in this 516 byte chunk of data as an identifier would be somewhat less than user friendly, and it would not be valid in some protocols anyway (HTTP for example). I2P provides some workarounds for naming identifiers; one is called “Base 32 Names”, similar in many ways to Tor’s .onion naming convention. Essentially, the 516 byte identifier is decoded (with some character replacements) into its raw value, the value is hashed with SHA256, then this hash is base 32 encoded and “.b32.i2p” is concatenated onto the end^v. The result for the “www.i2p2.i2p” identifier shown above would be:

rjxwbsw4zjhv4zsplma6jmf5nr24e4ymvvbycd3swgiinbvg7oga.b32.i2p
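The derivation described above can be sketched in a few lines of Python. This is a minimal sketch, not the reference implementation: it assumes the "character replacements" are I2P's use of "-" and "~" in place of the standard base64 "+" and "/", and that the base 32 output is lowercased with its "=" padding removed. The function name b32_address is hypothetical.

```python
import base64
import hashlib


def b32_address(destination_b64: str) -> str:
    """Derive a .b32.i2p name from a base64 destination string."""
    # Allow the destination to be pasted with line breaks, as it is shown above.
    compact = "".join(destination_b64.split())
    # Assumption: I2P's base64 alphabet uses '-' and '~' instead of '+' and '/'.
    raw = base64.b64decode(compact, altchars=b"-~")
    digest = hashlib.sha256(raw).digest()
    # A 32-byte hash encodes to 52 base 32 characters once padding is stripped.
    name = base64.b32encode(digest).decode("ascii").lower().rstrip("=")
    return name + ".b32.i2p"


if __name__ == "__main__":
    # Placeholder input; substitute a real destination string to try it.
    print(b32_address("AAAA"))
```

If the procedure is as described, feeding in the full destination shown earlier should yield the Base 32 name above, though that has not been verified here.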

This form is much easier to work with. For most eepSite users the most common naming solution is just to use the local I2P address book that maps a simple name like “www.i2p2.i2p” to its much longer Base 64 identifier. There is no official DNS-like service to do this lookup, as that would be a single point of failure that I2P wishes to avoid. Each I2P node has its own series of text files that contain the name mappings, in much the same way that the Internet used to use just HOSTS files to translate names to IPs before DNS. There are however naming subscription services inside of I2P that can be synced to if the user wishes, though this means the user is putting some level of trust in these services not to hijack the name mappings.
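As an illustration of how simple such a local mapping is, the following sketch parses address book text into a dictionary. It assumes one "name=destination" entry per line with "#" comment lines; that layout, the function name, and the sample entry are assumptions for the example rather than a statement of the exact file format.

```python
def parse_address_book(text: str) -> dict[str, str]:
    """Parse address-book text into a {hostname: base64 destination} mapping."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first '=' only, so any '=' inside the destination survives.
        name, separator, destination = line.partition("=")
        if separator and name.strip() and destination.strip():
            entries[name.strip().lower()] = destination.strip()
    return entries


if __name__ == "__main__":
    sample = "# hypothetical entry\nexample.i2p=AAAA\n"
    print(parse_address_book(sample))
```

Since the lookup is entirely local, whoever can edit or supply these entries (for instance a subscription service) controls where a name leads.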

A router’s ID is not the same as a service’s ID, so even if the service happens to be running on a particular router the two identifiers cannot be easily tied together. I2P also uses a few techniques to help mitigate traffic correlation attacks. While the Tor network uses a single changing path for communications, I2P uses the concept of “in” and “out” tunnels so requests and responses are not necessarily using the same paths for exchanging information. I2P also uses an Onion routing variant referred to as Garlic routing, where more than one message is bundled together into a “clove”. This mixing of messages using Garlic routing can lead to confusion for attackers attempting to correlate transmission sizes and timings, and if “cloves” are composed of messages from both high latency tolerant applications (e.g. email) and low latency applications (e.g. web traffic) correlation could become even harder. More comparisons between I2P, Tor and other anonymity networks can be found on I2P’s “I2P Compared to Other Anonymous Networks” page^vi.

Many services can be hosted inside of the I2P overlay network (IRC, BitTorrent, eDonkey, Email, etc.), and the I2P team has provided an API for creating new applications that ride on top of the I2P overlay network. As the developers note on their page, many standard Internet applications are not designed with anonymity in mind, so caution should be taken when adapting an existing application to run on top of I2P. While many applications exist and could be researched for application data leaks, this paper will be concentrating on eepSites, which are websites internal to I2P. Some measures are taken by the default I2P install to help filter revealing information at the application level, but service providers do make mistakes that can lead to too much information being revealed.

My primary motivation for this project is to help secure the identity of I2P eepSite service hosts by finding weaknesses in the implementation of these systems at higher levels that can lead to their real IP or administrator being revealed, or the anonymity set being greatly reduced. Exposing these weaknesses will allow the administrators of I2P eepSite services to avoid these pitfalls when they implement their I2P web applications. A secondary objective would be to allow the identification of certain groups that law enforcement might be interested in locating, specifically pedophiles. These goals are somewhat at odds, since law enforcement could use the knowledge to harass groups I do support, and pedophiles could use the knowledge to help hide themselves, neither of which are goals I would desire, but with privacy matters you sometimes have to take the bad with the good. A tertiary goal would be just to see if I can do it, and what I can learn skill wise along the way. I2P was chosen as my platform since less research has gone into it versus Tor, but many of the same ideas and techniques should be applicable to both systems as they offer similar functionality when it comes to hidden services that are HTTP based. Another feature that makes this research somewhat different is that more work has been done in the past trying to detect users, not providers, of services in a Darknet.

While there are many papers on attacking anonymizing networks, most seem to be pretty esoteric. A few previous papers that may be of use in my research are:

Locating Hidden Servers^vii
Lasse Overlier, Paul Syverson, 2006 IEEE Symposium on Security and Privacy (S&P'06), pp. 100-114, 2006

Low-resource routing attacks against anonymous systems^viii
Kevin Bauer, Damon McCoy, Dirk Grunwald, Tadayoshi Kohno, Douglas Sicker

The “Locating Hidden Servers” paper may not be directly applicable as it seems I2P goes to some effort to synchronize times and avoid clock skew problems^ix. A more directly I2P related analysis can be found on the I2P site’s “I2P’s Threat Model^x” and guides to make services more anonymous can be found on “Ugha’s I2P Wiki^xi”. The threat model page points to many more resources and papers on possible attack vectors. More background papers that will be of use during testing are listed in the approach section.

Section 2, Approach:

My main approach will be looking at the application layer and seeing what details about the host are given away. This has already been done in the past against cloaked clients with much success:

Metasploit Decloaking Engine^xii

EFF write-ups on client identification^xiii

Since I’m targeting the identity of servers instead of clients the exact vectors for attack will differ, but there will be some overlap. Many I2P services are hosted on nodes/routers that also act as the owner’s client node, so client based attacks may also be fruitful in revealing their identity. People regularly make mistakes in how they configure web servers and applications that cause too much information to be leaked out to an attacker, information that can make finding a vulnerability much easier. This sort of information leakage is regularly mentioned in the OWASP (Open Web Application Security Project) Top 10^xiv in one form or another. One of my mantras is “Specific exploits are temporary, bad configuration mistakes are forever”. A few of the techniques I plan to try to reveal identifying information about the host of an eepSite include:

Spidering the content of the eepSite for related sites. This should be made somewhat easier because I can restrict the spidering to just sites ending in .i2p, a pseudo top level domain name commonly used in the I2P network.
Using tools like Nikto to find directories and files that reveal server information. Just because a directory is not linked to does not mean it can’t be found by brute forcing common directory paths.
HTTP headers may be returned by the sites that reveal information about the type of web daemon that is running (IIS/Apache/etc.). By default the I2P install package comes with the Jetty webserver, but this can be changed by the user if they desire different functionality. I imagine this sort of attack won’t lead to outright identification, but may be useful for reducing the anonymity set, especially if the administrator makes the mistake of using the same server instance on an Internet facing site.
Putting bait in logs via the user agent string that may make the administrator of the site visit a tracking page without using an I2P outproxy. This could take the form of a simple XSS (Cross Site Scripting) redirect attack or web bugs^xv embedded in a page.
See if reverse DNS lookups done by the webserver when it generates logs give away its true IP. Some web servers are configured to automatically do a reverse DNS lookup on visiting IPs to find their host name. This may be outside of my ability as I do not control an authoritative DNS server for reverse lookups, but perhaps I can find someone to help with the research that does control such a resource.
I plan to also ask the security and privacy community at large for more ideas, and of course give credit to their contributions. Via my contacts in the community I imagine I can elicit quite a few responses.
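As a rough illustration of the first technique in the list above, restricting a spider to the .i2p pseudo top level domain can be done with a small link filter. This is only a sketch using the Python standard library; the class and function names are hypothetical, and it handles only plain anchor tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they were found on.
                self.links.add(urljoin(self.base_url, value))


def i2p_links(base_url: str, html: str) -> set[str]:
    """Return only the http:// links whose host ends in .i2p."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    kept = set()
    for url in collector.links:
        parts = urlsplit(url)
        if parts.scheme == "http" and (parts.hostname or "").endswith(".i2p"):
            kept.add(url)
    return kept


if __name__ == "__main__":
    page = '<a href="/about.html">About</a> <a href="https://example.com/">Out</a>'
    print(i2p_links("http://site.i2p/index.html", page))
```

Links that point outside the .i2p space are dropped here, though in practice they may be worth logging separately, since a link to an Internet facing site could itself help tie an eepSite to its operator.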

I sent an early draft of this proposal to ZZZ (the lead developer of I2P and as the development is done pseudonymously that is the only name I have for him) and he proposed a few additional tracks I should take:

Flesh out some of the attacks listed in the threat model page.

Review the server and client proxy code for flaws.

Look at the Tor change log and see if any bugs were fixed that may still exist in I2P.

Some of the techniques that I plan to test may not be appropriate to do against resources I do not own, so my plan is to put up my own eepSite to do many of the tests. For common web vulnerabilities that could lead to identity disclosure I plan to install the Mutillidae^xvi training package that implements the OWASP Top 10 as a test bed.

There will be a few challenges imposed because of the nature of the I2P darknet. I’m sure more challenges will become apparent as I get deeper into the research, but a few I am concerned about going into the project are:

Communications with the eepSites is normally done via an HTTP proxy. This is somewhat more limiting connection wise than using a SOCKS proxy, and way more limiting than having a direct TCP/IP connection. Also, the default HTTP proxy that comes with I2P does not support the “CONNECT” method. While this is stated in the documentation, I encountered it while trying to run an Nmap scan using proxychains, and saw the following message when I used Wireshark to try to diagnose why my attempts were failing:

<h3>Warning: Non-HTTP Protocol</h3>

The request uses a bad protocol.

The I2P HTTP Proxy supports http:// requests ONLY. Other protocols such as https:// and ftp:// are not allowed.

While this is challenging, I’m fairly confident I can work around the problem. ZZZ tells me that SOCKS and Connect should work if I set up the tunnels for them but so far I have not gotten those two proxy tunnel types to successfully connect to an eepSite.
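Until the SOCKS and CONNECT tunnels cooperate, one way to work within the limitation described above is to speak plain HTTP to the proxy directly, sending an ordinary GET with the full URL as the request target instead of asking for a tunnel. The sketch below assumes the proxy is listening on 127.0.0.1 port 4444 (adjust to the local setup); the function names and the user agent string are made up for the example.

```python
import socket
from urllib.parse import urlsplit

# Assumption: the local I2P HTTP proxy address; change it to match your router.
PROXY_ADDR = ("127.0.0.1", 4444)


def build_proxy_get(url: str, user_agent: str = "research-spider") -> bytes:
    """Build a GET request in the absolute-URL form that HTTP proxies expect."""
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError("only http:// URLs can be sent through this proxy")
    request = (
        f"GET {url} HTTP/1.1\r\n"
        f"Host: {parts.hostname}\r\n"
        f"User-Agent: {user_agent}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return request.encode("ascii")


def fetch_via_proxy(url: str, timeout: float = 60.0) -> bytes:
    """Send the request to the proxy and return the raw response bytes."""
    with socket.create_connection(PROXY_ADDR, timeout=timeout) as conn:
        conn.sendall(build_proxy_get(url))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch_via_proxy("http://www.i2p2.i2p/") with a router running would return the status line, headers and body as one byte string. A generous timeout is used because eepSite responses can be slow.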

Perhaps because of point one, many of the tools I have experimented with so far have a tendency to give false results or hang while working on spidering an eepSite. I may have to create some custom spider scripts that compensate for eepSite oddities.
While spidering I need to be careful not to download contraband onto my own system. There is a fair amount of child pornography out on I2P, and laws in the United States are pretty unforgiving on the issue, even if the files were obtained while doing legitimate research. As such I plan to mostly spider for text, which is unfortunate as EXIF data in images hosted on eepSites may be of value in identifying individuals.
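One way a custom spider could stay with text, as described above, is to skip URLs that look like media before requesting them and to check the Content-Type header before saving any body. The following is a minimal sketch; the lists of types and extensions are illustrative assumptions and by no means complete, and the function names are hypothetical.

```python
from urllib.parse import urlsplit

# Illustrative assumptions only; extend both tuples as needed.
TEXT_TYPES = ("text/html", "text/plain", "application/xhtml+xml")
MEDIA_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".bmp", ".avi", ".mpg", ".mp4", ".zip", ".rar")


def looks_like_media(url: str) -> bool:
    """Guess from the URL path alone whether a link points at non-text content."""
    return urlsplit(url).path.lower().endswith(MEDIA_EXTENSIONS)


def is_text_response(content_type) -> bool:
    """True only if the Content-Type header names a text type we accept."""
    if not content_type:
        return False  # no declared type: safer to skip than to save
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in TEXT_TYPES


if __name__ == "__main__":
    print(looks_like_media("http://site.i2p/gallery/photo.JPG"))
    print(is_text_response("text/html; charset=utf-8"))
```

Neither check is a guarantee, since a server can mislabel content, so the two are best used together and combined with a HEAD request before any GET.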

I’m hoping that this research will be an improvement over existing work in the following ways:

Clearer examples of how leaked information can be found. It is one thing to say “headers can leak information”, and it’s another to give exact ncat commands to reveal the header information.
A concentration on I2P instead of Tor. The academic world seems to write many papers on the Tor network, but I2P seems to get only a passing mention, if mentioned at all.
A concentration on the application layer instead of the network or transport layers. Since many of the same application layer protocols are used on different anonymity networks, the research will hopefully have a broader use scope.
Real world tests on systems that have been implemented for more than just academic purposes. Some of the papers I’ve read on privacy seem to cover systems that have not seen much real world deployment (Tor being a very notable exception).
Less reliance on esoteric attack vectors. For example, timing attacks are interesting, but I’m not convinced they would be easy to pull off under real world conditions and on in-use systems.
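In the spirit of the first point in the list above, here is a sketch of what a concrete header check might look like. It uses Python rather than ncat, assumes the I2P HTTP proxy is at 127.0.0.1 port 4444, and the set of headers treated as "revealing" is an illustrative choice, not an exhaustive one. The function names are hypothetical.

```python
import urllib.request

# Assumption: the local I2P HTTP proxy address; change it to match your router.
I2P_PROXY = "http://127.0.0.1:4444"

# Headers that commonly describe the server software or its clock.
REVEALING_HEADERS = ("Server", "X-Powered-By", "Via", "Date", "ETag", "Last-Modified")


def revealing_headers(headers: dict) -> dict:
    """Pick out the potentially identifying headers, ignoring name case."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        name: lowered[name.lower()]
        for name in REVEALING_HEADERS
        if name.lower() in lowered
    }


def fetch_headers(url: str, timeout: float = 60.0) -> dict:
    """Issue a HEAD request through the proxy and return the response headers."""
    proxy = urllib.request.ProxyHandler({"http": I2P_PROXY})
    opener = urllib.request.build_opener(proxy)
    request = urllib.request.Request(url, method="HEAD")
    with opener.open(request, timeout=timeout) as response:
        return dict(response.headers.items())


if __name__ == "__main__":
    # Offline demo with made-up header values.
    print(revealing_headers({"server": "Jetty", "Content-Type": "text/html"}))
```

Against a live eepSite, revealing_headers(fetch_headers("http://www.i2p2.i2p/")) would show which of these headers survive the trip; the proxy or the server tunnel may filter some of them along the way.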

Section 3, Deliverables:

Hopefully this research will lead to tools and techniques that can be used to locate the true identity, IP, or at least reduce the anonymity set of servers hosting I2P services (specifically eepSites). Attack vectors that worked will be outlined in such a way that others can follow them, and those that failed will be analyzed for why they failed and how they could be tweaked to succeed. After these potential problems are found and explained, site administrators can then take steps to mitigate the issues and avoid information and anonymity leaks. The ultimate test will come when others try the techniques I outline and hopefully reproduce my results for their own goals, either to stay more anonymous or to reveal weaknesses in others’ anonymity.

Section 4, Schedule:

Week of Oct 5:
Research deeper into I2P and how it works.
Evaluate web application fingerprinting tools.

Week of Oct 12:
Give project proposal presentation.
Continue work from week one.
Look into developing or modifying existing tools to work better with I2P.

Week of Oct 19:
Run extensive tests with tools to see what information can be found.

Week of Oct 26:
Continue testing tools and collecting data on eepSites. This will continue up until the final draft of the project paper.

Week of Nov 2:
Parse collected data into a format that can be explained to others.

Week of Nov 9:
Work on status report.

Week of Nov 16:
Turn in status report and consider new directions to go.

Week of Nov 23:
Implement changes based on status report feedback.

Week of Nov 30:
Polish draft of final project report so it can be turned in next week.

Week of Dec 7:
Turn in final project report and begin work on presentation.

Week of Dec 14:
Give final project presentation.

i Full details of how I2P is implemented can be found at:
http://www.i2p2.de

ii An anonymity set is the total number of possible candidates for the identity of an entity. Reducing the anonymity set means that you can narrow down the suspects.

iii More details on China’s blocking of the Tor directory servers can be found at:
https://blog.torproject.org/blog/tor-partially-blocked-china

iv More details on the inner workings of I2P, and its mitigation techniques against Sybil attacks and rogue peers can be found in the “Technical Introduction”:
http://www.i2p2.de/techintro.html

v Some things are better explained in source code, which you can find provided here in the Python scripting language:
http://forum.i2p2.de/viewtopic.php?t=4367

vi I2P Compared to Other Anonymous Networks
http://www.i2p2.de/how_networkcomparisons

vii “Locating Hidden Servers” paper available at:
http://www.onion-router.net/Publications/locating-hidden-servers.pdf

viii Low-resource routing attacks against anonymous systems:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.133.4562&rep=rep1&type=pdf

ix Clock skews are lightly mentioned here:
http://www.i2p2.de/techintro.html#op.netdb

x I2P’s Threat Model:
http://www.i2p2.de/how_threatmodel

xi Ugha’s Wiki (note that you have to use an I2P proxy to access the site):
http://ugha.i2p/HowTo

xii Metasploit Decloaking Engine code and details are available at:
http://www.decloak.net/

xiii EFF write-ups on client identification, three part article starting with:
https://www.eff.org/deeplinks/2009/09/new-cookie-technologies-harder-see-and-remove-wide

xiv OWASP Top 10
http://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project

xv I give background information and source code for some simple web bugs here:
http://www.irongeek.com/i.php?page=security/webbugs

xvi Mutillidae may be found at the following URL:
http://www.irongeek.com/i.php?page=security/mutillidae-deliberately-vulnerable-php-owasp-top-10