 
 
Proposal for “Out of Character: Use of Punycode and Homoglyph Attacks to Obfuscate URLs for Phishing”
Adrian Crenshaw
Full article now posted here:
http://www.irongeek.com/i.php?page=security/out-of-character-use-of-punycode-and-homoglyph-attacks-to-obfuscate-urls-for-phishing
Below is a project I'm doing for class. If you want to make suggestions or tell me about weird Unicode/homoglyph security issues, please email me. If you want to play with making homographs, look at my Homoglyph Attack Generator.
Introduction 
            
One of the key ways users tell whether a URL is part of a phishing attack is to compare the host and domain name against what they expect for the legitimate site. For example, an email asking users to submit bank information to a website with the domain name adrianshouseofpwnage.com is not as likely to receive submissions as one hosted under a more reasonable sounding name. Many techniques, both current and historical, have been used to make links look more legitimate. One is to have the link text say one thing while the anchor actually points elsewhere, for example:
<a href="http://irongeek.com">http://www.microsoft.com</a>
Many mail services mitigate this by printing the actual link next to the link text when the two differ. Another technique is to confuse users by putting a valid sounding name in the credentials part of the URL, with the actual host name in the trailing part:
http://www.microsoft.com@irongeek.com
Some modern browsers mitigate this by either popping up a warning (Firefox) or refusing to treat it as a valid URL (Internet Explorer). There are many more techniques that can be used to obfuscate a URL, however. The technique this paper will focus on is the use of Punycode and homoglyphs.
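Both tricks above come down to the gap between what the user reads and what the URL parser treats as the host. The following Python sketch, assuming only the standard library, extracts the real host in each case. `LinkCollector` and `mismatched_links` are hypothetical helpers written for illustration, not the check any particular mail service actually performs.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def mismatched_links(html):
    """Return (shown text, real host) for links whose text names another host."""
    parser = LinkCollector()
    parser.feed(html)
    flagged = []
    for href, text in parser.links:
        shown_host = urlsplit(text).hostname  # None if the text is not a URL
        real_host = urlsplit(href).hostname
        if shown_host and shown_host != real_host:
            flagged.append((text, real_host))
    return flagged


# Trick 1: the link text says microsoft.com, but the href points elsewhere.
print(mismatched_links('<a href="http://irongeek.com">http://www.microsoft.com</a>'))
# Trick 2: everything before "@" is userinfo (credentials), not the host.
print(urlsplit("http://www.microsoft.com@irongeek.com").hostname)  # irongeek.com
```

Link text that is not a URL (such as "click here") has no host and is not flagged, which mirrors the "print the real link only when they differ" behavior described above.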
            
Normally, DNS labels (the parts separated by dots) are restricted to an ASCII subset of just letters, digits and the hyphen (sometimes called the LDH rule). A label also cannot start or end with a hyphen, and labels are compared case-insensitively. This limited set of characters causes a problem if someone wants to use a character in a DNS label that is not part of the LDH set.
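The LDH rule is simple enough to express as one regular expression. This is a minimal Python sketch; `is_ldh_label` is a hypothetical helper, and the 1 to 63 character length limit is an assumption taken from RFC 1035, since the text above does not mention length.

```python
import re

# One LDH label: starts and ends with a letter or digit, hyphens allowed inside.
# The 1-63 character limit comes from RFC 1035, not from the rule described above.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """True if the label obeys the letters-digits-hyphen rule."""
    return LDH_LABEL.fullmatch(label) is not None


for label in ["irongeek", "-irongeek", "café", "xn--caf-dma"]:
    print(label, is_ldh_label(label))
```

The last case is worth noticing: an encoded IDNA label such as "xn--caf-dma" is itself plain LDH, which is exactly what lets it pass through existing DNS infrastructure unchanged.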
Punycode is the encoding used by the Internationalized Domain Names in Applications (IDNA) framework to map characters that would normally be invalid in DNS host names onto valid ones. In this way, domain and host names can be created using characters from a user’s native language but still be translated into something the DNS system can use (assuming the application supports IDNA). Examples range from names with accented characters such as “café.com” (which browsers that support the IDNA specification translate to “xn--caf-dma.com”) to more complex ones where even the top level domain is not in ASCII, such as “http://北京大学.中國” (which converts to http://xn--1lq90ic7fzpc.xn--fiqz9s). Explaining the IDNA algorithm and how it maps Unicode symbols is beyond the scope of this proposal paper, and all an attacker needs to do is use one of the many online generators to create a valid IDNA label. For more details on how the system works, see RFC 3492 [1].
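To make the mapping concrete, here is a minimal Python sketch of the per-label conversion, assuming Python 3.7+ and its built-in `punycode` codec (which implements RFC 3492 [1]). `to_ascii_hostname` and `to_unicode_hostname` are hypothetical helpers: they skip the Nameprep/UTS #46 mapping and validity checks that real browsers apply, so treat them as a way to experiment rather than as a complete IDNA implementation.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Simplified ToASCII: Punycode-encode each non-ASCII label, add "xn--"."""
    labels = []
    for label in hostname.split("."):
        if label.isascii():
            labels.append(label)
        else:
            # The "punycode" codec is raw RFC 3492; the "xn--" prefix is IDNA's.
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


def to_unicode_hostname(hostname: str) -> str:
    """Simplified ToUnicode: decode every "xn--" label back to Unicode."""
    return ".".join(
        label[4:].encode("ascii").decode("punycode")
        if label.lower().startswith("xn--") else label
        for label in hostname.split(".")
    )


print(to_ascii_hostname("café.com"))  # xn--caf-dma.com
print(to_ascii_hostname("北京大学.中國"))
print(to_unicode_hostname("xn--caf-dma.com"))  # café.com
```

Because Punycode is a reversible encoding, an application only needs to decode the "xn--" labels to show the user the Unicode form, which is precisely the display step an attacker relies on.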
The second facet of this attack is homoglyphs. A homoglyph is a symbol that appears identical or very similar to another symbol. An example most readers will be familiar with is the letter O and the number 0; depending on the font used, they may be hard to distinguish from each other. The letters l (lowercase L) and I (uppercase i) are another common example. It becomes even more interesting in the parts of Unicode where very similar characters exist from different languages. Accented letters, letter-like symbols and characters from other scripts that resemble the Latin alphabet turn up with great regularity, some seeming to be almost exact duplicates of the same symbol. Cyrillic script is a common example, possessing very close homoglyphs for a, c, e, o, p, x and y. Even the Latin alphabet appears twice in Unicode. The characters:
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
are represented in both the 0021-007E (Basic Latin) and the FF01-FF5E (Fullwidth Latin) ranges of Unicode. This means converting a given Latin character from one range to the other is as easy as adding the decimal value 65248 (hexadecimal FEE0) to its code point in the lower range. Depending on the font used, mixing character families this way may cause a “ransom note” visual effect.
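Both kinds of substitution described above can be sketched in a few lines of Python. This is not the author's Homoglyph Attack Generator; `cyrillic_homograph` and `to_fullwidth` are hypothetical helpers, and the Cyrillic table covers only the seven letters named in the paragraph.

```python
import unicodedata

# Cyrillic lowercase letters that resemble the Latin letters named above.
# How convincing the match looks depends on the font.
CYRILLIC_LOOKALIKES = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
    "y": "\u0443",  # CYRILLIC SMALL LETTER U
}

# Distance from Basic Latin (U+0021-U+007E) to Fullwidth forms (U+FF01-U+FF5E).
FULLWIDTH_OFFSET = 0xFF01 - 0x21  # 0xFEE0, i.e. 65248


def cyrillic_homograph(word: str) -> str:
    """Swap every Latin letter that has a Cyrillic lookalike."""
    return "".join(CYRILLIC_LOOKALIKES.get(ch, ch) for ch in word)


def to_fullwidth(text: str) -> str:
    """Shift printable ASCII (0x21-0x7E) into the Fullwidth block."""
    return "".join(
        chr(ord(ch) + FULLWIDTH_OFFSET) if 0x21 <= ord(ch) <= 0x7E else ch
        for ch in text
    )


spoof = cyrillic_homograph("apple")
print(spoof, spoof == "apple")     # looks like "apple", but compares False
print(unicodedata.name(spoof[0]))  # CYRILLIC SMALL LETTER A
wide = to_fullwidth("irongeek")
print(wide)
print(unicodedata.normalize("NFKC", wide))  # irongeek
```

The last line hints at a practical difference between the two families: IDNA's mapping step (Nameprep in IDNA2003, the UTS #46 mapping in modern browsers) applies compatibility folding, so fullwidth letters in a host name generally collapse back to plain ASCII, while Cyrillic letters have no such mapping and survive as a distinct Punycode label.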
While the intended purpose of IDNA is to allow for internationalized DNS labels, it can also be used to make a URL or host name appear more legitimate than it really is. Because the Unicode representation may cause visual confusion for a user, it could create trust where there should be none. For example:
http://www.microsoft.com⁄index.html.irongeek.com
may look like a legitimate Microsoft URL, but on closer inspection it actually points to a host that the author controls. This is because the third slash symbol is not really a slash at all, but the Unicode character U+2044 (FRACTION SLASH), so “com⁄index” is just one DNS label. The real DNS entry is:
www.microsoft.xn--comindex-g03d.html.irongeek.com
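One mitigation idea is simply to make such characters visible. The following Python sketch, assuming only the standard `unicodedata` module, lists the non-ASCII characters in the example host name and shows which labels it really contains; `reveal` is a hypothetical helper, not an existing browser feature. It splits on ASCII dots only, whereas real IDNA also treats a few other full stop characters as separators.

```python
import unicodedata

spoof = "www.microsoft.com\u2044index.html.irongeek.com"


def reveal(hostname: str):
    """List each non-ASCII character in a host name with its code point and name."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
            for ch in hostname if not ch.isascii()]


labels = spoof.split(".")  # only real U+002E dots separate labels here
print(reveal(spoof))           # [('⁄', 'U+2044', 'FRACTION SLASH')]
print(labels[2])               # com⁄index, one label rather than a path
print(".".join(labels[-2:]))   # irongeek.com
print("xn--" + labels[2].encode("punycode").decode("ascii"))  # xn--comindex-g03d
```

Working through the labels right to left is what exposes the trick: the registered domain is irongeek.com, and everything to its left is just subdomain text chosen by the attacker.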
More obfuscated DNS names could be created by choosing something less obvious than Irongeek.com, or by having the Punycode be in the domain name itself. How could an attacker leverage this? It should be noted that two resources in the bibliography helped greatly to inspire this project [2] [3].
Approach 
            
The approach we plan to take is fairly simple: generate many potential attack URLs and then test the following:
1. How different browsers show the Punycode in the URL bar.
2. How different mail systems show the URL when an email is displayed.
3. How social networks render the URL.
Some of these IDNA DNS names will be tested using a domain we control (irongeek.com), while others will be tested using the local hosts file in lieu of making real DNS entries. Buying many domain names could become expensive, and the local hosts file serves most of the proposed tests adequately (other than testing the policies of registrars). We intend to cover mitigations that are already in use to quell these sorts of attacks, as well as what mitigations might be possible. Tools may also be developed to help generate the attack URLs and released to the pen-test community. These features could be a useful addition to the Social-Engineering Toolkit (SET) and other projects [4].
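As a rough idea of what such a generator might look like, the Python sketch below produces every single-letter Cyrillic swap of a label and writes each one as a hosts-file line pointing at the loopback address. `single_swaps` and `hosts_lines` are hypothetical names, and the sketch assumes that browsers convert an IDN host to its ASCII "xn--" form before resolving it, so the hosts file must contain the Punycode form rather than the Unicode one.

```python
# Hypothetical test-URL generator: single-letter Cyrillic swaps of a label,
# emitted as hosts-file entries so they resolve locally without registration.
LOOKALIKES = {"a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e",
              "p": "\u0440", "x": "\u0445", "y": "\u0443"}


def single_swaps(label: str):
    """Yield each variant of the label with exactly one letter replaced."""
    for i, ch in enumerate(label):
        if ch in LOOKALIKES:
            yield label[:i] + LOOKALIKES[ch] + label[i + 1:]


def hosts_lines(label: str, tld: str = "com", ip: str = "127.0.0.1"):
    """Build hosts-file entries using the Punycode (ASCII) form of each variant."""
    return [f"{ip}\txn--{variant.encode('punycode').decode('ascii')}.{tld}"
            for variant in single_swaps(label)]


for line in hosts_lines("microsoft"):
    print(line)
```

Swapping one letter at a time keeps the candidates as close to the original as possible; swapping every eligible letter at once is the other extreme, and testing both would show whether a renderer's checks depend on how much of the label is mixed-script.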
Schedule 
Weeks 1-2: Work on understanding Unicode, IDNA and Punycode.
Weeks 3-4: Look at other research on the topic of using this attack vector.
Weeks 5-6: Hands-on tests of using IDNA to generate potential attack URLs.
Weeks 7-8: Test how different browsers, mail systems and social networks render the attack URLs.
Week 9: Finish paper and presentation.
  
References: 
[1] A. Costello. (2003, March) IETF RFC 3492. [Online]. http://www.ietf.org/rfc/rfc3492.txt
[2] J. Abolins. (2010, December) When Domain Names Look Like Spaghetti (or Whatever) Internationalized Domain Names & Investigations in the Networked World. [Online]. http://www.irongeek.com/i.php?page=videos/dojocon-2010-videos#Internationalized%20Domain%20Names%20&%20Investigations%20in%20the%20Networked%20World
[3] M. Zalewski, The Tangled Web: A Guide to Securing Modern Web Applications, 1st ed.: No Starch Press, 2011.
[4] D. Kennedy. Social-Engineering Toolkit. [Online]. http://www.secmaniac.com/download/
 
  
If you would like to republish one of the articles from this site on your
webpage or print journal please contact IronGeek.
Copyright 2020, IronGeek
Louisville / Kentuckiana Information Security Enthusiast