Thursday, June 12, 2014

Why Those Strings?

So I was looking back at some of the older malware on my backup drive and came across a bunch of malware I got from contagio a while back.  Like anything you find that you haven't looked at in a while, I stopped to see what I had, and noticed that I had started looking at some of them, but there weren't any notes.  So I decided to grab one of the smaller ones and just have a look.

I decided on the "SWORD" malware sample from the APT1 section.  This was described as a single-sourced remote command utility, so I figured it would be a good little one to start the blog off with.  Initially I just ran strings across it to see if there was anything there of interest.  As strings is basically the lowest-hanging fruit in malware analysis, I didn't expect to find anything that I could take to responders and say "You'll need to look for this."  If only.  What I was really looking for were any strings that weren't explicitly helpful (i.e., they appear random or gibberish) but could potentially be used by the binary to do something interesting.  Here are the strings of interest that I found in the binary.

Random Strings and Misspellings
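For anyone without the strings utility handy, here is a rough Python approximation of what it does.  This is only a sketch: the function name extract_strings and the minimum length of 4 are my own choices for illustration, and it only looks for printable ASCII runs plus simple UTF-16LE ("wide") runs.

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return printable ASCII runs and UTF-16LE ("wide") runs of at least min_len chars."""
    # Runs of printable ASCII bytes (space through tilde).
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # Runs of printable ASCII characters each followed by a NUL byte (UTF-16LE).
    wide_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)

    found = [m.group().decode("ascii") for m in ascii_pat.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_pat.finditer(data)]
    return found

if __name__ == "__main__":
    import sys
    # Usage: python extract_strings.py <path-to-sample>
    with open(sys.argv[1], "rb") as fh:
        for s in extract_strings(fh.read()):
            print(s)
```

Checking for the wide form as well matters later on, since a signature that only matches the ASCII form can miss the same string stored as UTF-16.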
I always like to pick out the strings with misspellings, misused words, or l33t speak; they can come in handy.  The other strings could be useful in creating a yara signature for this, if we can determine what their use is in the binary.  So, I fire up IDA to see what is actually going on with these strings.  The first call in the program that isn't to a Windows API points us to the following section:

Random Strings Seen Used
Okay, so at least two of them are being used in the code.  Looking at the pattern of the first string that we find in the code, we can see that it is the keys of the keyboard in the following order:

  1. Start at the numeric row, and press every key from right to left
  2. Retype the numeric row, this time holding the Shift key
  3. Repeat until the last row is typed, skipping the control keys
  4. Remove the characters &"\{}
We will call this the "keyboard" string.  The second is "the quick brown fox jumps over the lazy dog" with the vowels (except y) removed after the word 'brown'.  We will call this the "fox" string.  This is not necessarily important, but the eye is prone to picking out patterns.  So we know that these are being used in the program, and that there is some logic surrounding them.  Checking through the rest of the program doesn't show the last string being used.  However, running it in a debugger shows that string being passed to the same function as the other two.

There is Another String!
Right.  We'll call this one String 3.  So, we have three strings in this section, and some logic around them.  What is that logic doing?  Well, initially it is just reading the values into memory locations, but the interesting part of what these are doing comes around this code segment.

The Interesting Bit
We see three separate strchr calls, and they all sit within a loop.  This tells us two things.  One, the program was written in something that can use native C strings (such as C or C++).  Two, the loop is likely pulling something out of each string in a formulaic way to create the actual data that it needs.  In debugging, we see that the value in ecx pushed for the first two calls is 0x75, which is the 'u' character, the first character of String 3.

So, the first strchr call determines where in the "keyboard" string the first character of String 3 is located.  If the result is zero (strchr returns NULL when the character is not found), the program throws an error.  Otherwise, it looks the character up again (the second strchr call) and stores the result in esi.  This segment then goes on to grab the first character in the "fox" string, and determine where it is located in the "keyboard" string (the third strchr call).  This leaves us with the following values stored in memory:

  • Location of the first character of String 3 in the "keyboard" string
  • The first character of the "fox" string
  • Location of the first character of the "fox" string in the "keyboard" string
Just after the "jnz short loc_4010D4" instruction we see the logic that subtracts the strchr(keyboard, fox[0]) value from the strchr(keyboard, String3[0]) value.  Then the program grabs the value in the "keyboard" string at the position resulting from that subtraction.  So the ending location that the program is looking for can be written like this:

value = keyboard[strchr(keyboard, String3[0]) - strchr(keyboard, fox[0])]
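To make the expression above concrete, here is a minimal Python sketch of that lookup applied across a whole string.  Several things here are assumptions on my part rather than something taken from the binary: that the i-th character of String 3 is paired with the i-th character of the "fox" string, that the "fox" string repeats if it is shorter, and that a negative difference wraps around the "keyboard" string.  The function name decode and the toy strings in the usage lines are hypothetical; they are not the values from the sample.

```python
def decode(keyboard: str, fox: str, encoded: str) -> str:
    """Apply keyboard[pos(encoded[i]) - pos(fox[i])] to every character of encoded."""
    out = []
    for i, enc_char in enumerate(encoded):
        key_char = fox[i % len(fox)]          # assumption: the key repeats if shorter
        enc_pos = keyboard.find(enc_char)     # stands in for strchr(keyboard, String3[i])
        key_pos = keyboard.find(key_char)     # stands in for strchr(keyboard, fox[i])
        if enc_pos < 0 or key_pos < 0:
            # Mirrors the error path taken when strchr returns NULL.
            raise ValueError("character not present in keyboard string")
        # Assumption: a negative difference wraps around the keyboard string.
        out.append(keyboard[(enc_pos - key_pos) % len(keyboard)])
    return "".join(out)

# Toy values only, chosen so the arithmetic is easy to follow by hand.
print(decode("abcdef", "bc", "dfb"))  # positions 3-1, 5-2, 1-1 -> "cda"
```

Seen this way, the scheme behaves like a subtraction cipher over a custom alphabet, with the "keyboard" string as the alphabet and the "fox" string as the key.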

Knowing that, I decided to knock up a quick version of the logic (without the error checking) just to see what the value would be before going through debugging it.  I decided to do it in python, because I hadn't used it in a while and why not?  This is what I came up with:


So, we run this real quick to see what we come up with and...


Yep.  There is the IP address and port number used by this binary (remember it only talks to one location).  Checking in debugging and by detonating this binary confirms what we are seeing from running our script.


There are the callouts to the same IP address and port that we determined from our script!  Now, we could have gotten this IP and port by dynamic analysis, and we could have determined that two of the strings were used in the binary.  Some analysts would stop there, either because of time constraints or because they were satisfied that indicators had been found.  However, by actually looking into why and how these strings are used, we can be confident in using those strings in yara rules that could help us detect the presence of this malware in the future.  Ideally, we would want to determine how common these strings are across multiple samples, but I only had one sample of this malware.  However, we could create something like the following as a starting rule:

Yara Rule for Given Indicators

So, when you get asked by someone "Why did you use these strings for your signature?", you will have an answer. =)


[EDIT: 06.19.14 - Fixed yara rule to include the "wide" and "ascii" modifiers]
