After processing a large number of malicious binaries, I would like to share my findings on anomalies in PE binaries. This anomaly information should help security researchers validate suspicious samples and cluster them.
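1. Byte strings near the EP
The bytes at the entry point (EP) are widely used by AV companies to build malware signatures, so I put them first. The most frequent EP string in my malware set is 81ec8001000053555633db57895c2418c74424103091400033, which decodes to stack setup code: sub esp, 0x180 followed by register saves (push ebx, push ebp, push esi, xor ebx, ebx, push edi). The second most frequent is 60e803000000e9eb045d4555c3e801000000eb5dbbedffffff, which begins with pushad (60) followed by a short call (e8 03000000).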
This finding is broadly consistent with another study: http://www.hexacorn.com/blog/2012/07/04/random-stats-from-300k-malicious-samples-entry-points/
That article lists the top 10 EP strings as follows:
35498  55 8B EC 6A FF 68
22712  55 8B EC 83 C4 F0
14775  55 8B EC 53 8B 5D
7711   4D 5A 90 00 03 00
6959   55 8B EC 83 C4 C4
5775   4D 5A 50 00 02 00
3497   55 8B EC 83 C4 F4
3190   60 E8 00 00 00 00
3080   83 7C 24 08 01 75
2152   55 8B EC 83 C4 B4
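For readers who want to reproduce this kind of statistic, here is a minimal sketch of how EP byte strings could be extracted and counted using only the Python standard library. It is not the tooling used for the numbers above. The function names (parse_pe, ep_hex, top_ep_strings) are hypothetical, and the default length of 25 bytes is an assumption chosen to match the 50-hex-digit strings quoted earlier. When the EP RVA falls outside every section, the sketch assumes the RVA can be read as a file offset, which is how an EP of 0 ends up pointing at the MZ header (4D 5A ...).

```python
import struct
from collections import Counter


def parse_pe(data):
    """Return (entry_point_rva, sections) parsed from raw PE file bytes.

    Each section is a tuple (virtual_address, virtual_size, raw_offset, raw_size).
    """
    if data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)           # e_lfanew
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    (num_sections,) = struct.unpack_from("<H", data, pe_off + 6)
    (opt_size,) = struct.unpack_from("<H", data, pe_off + 20)  # SizeOfOptionalHeader
    opt_off = pe_off + 24
    (ep_rva,) = struct.unpack_from("<I", data, opt_off + 16)   # AddressOfEntryPoint
    table = opt_off + opt_size                                 # section table start
    sections = []
    for i in range(num_sections):
        # Skip the 8-byte name, then read VirtualSize, VirtualAddress,
        # SizeOfRawData and PointerToRawData.
        vsize, vaddr, raw_size, raw_off = struct.unpack_from(
            "<IIII", data, table + 40 * i + 8)
        sections.append((vaddr, vsize, raw_off, raw_size))
    return ep_rva, sections


def ep_hex(data, n=25):
    """Return the first n bytes at the entry point as a lowercase hex string."""
    ep_rva, sections = parse_pe(data)
    for vaddr, vsize, raw_off, raw_size in sections:
        if vaddr <= ep_rva < vaddr + max(vsize, raw_size):
            start = raw_off + (ep_rva - vaddr)                 # RVA -> file offset
            return data[start:start + n].hex()
    # Assumption: an EP outside all sections is read directly from the file.
    return data[ep_rva:ep_rva + n].hex()


def top_ep_strings(samples, n=25, k=10):
    """Count EP strings over an iterable of raw file contents."""
    return Counter(ep_hex(s, n) for s in samples).most_common(k)
```

Calling top_ep_strings on a list of file contents (each read with open(path, "rb").read()) gives (hex string, count) pairs in the same shape as the list above. Malformed headers will raise an exception, so a real pipeline would need to catch and log those samples.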
2. Section names
I concatenated the section names of each sample into one string. Here are the most common ones.
UPX is still the favorite packer among malware writers, followed by UPack.
The above figure shows the number of sections per sample, in descending order. Most malicious samples have 3 sections.
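The section-name statistics can be approximated with a short script. The following is a sketch under stated assumptions, not the original tooling: the helper names (section_names, name_signature, summarize) are hypothetical, names are joined with no separator, and the encoding parameter is my own addition. It defaults to latin-1 so every byte survives, while passing "gbk" is a guess at how Chinese names like the ones further down might be recovered.

```python
import struct
from collections import Counter


def section_names(data, encoding="latin-1"):
    """Return the section names of a PE file given its raw bytes."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)           # e_lfanew
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    (count,) = struct.unpack_from("<H", data, pe_off + 6)      # NumberOfSections
    (opt_size,) = struct.unpack_from("<H", data, pe_off + 20)  # SizeOfOptionalHeader
    table = pe_off + 24 + opt_size                             # section table start
    names = []
    for i in range(count):
        raw = data[table + 40 * i: table + 40 * i + 8]         # 8-byte Name field
        # Names shorter than 8 bytes are NUL-padded; cut at the first NUL.
        names.append(raw.split(b"\0", 1)[0].decode(encoding, errors="replace"))
    return names


def name_signature(data, encoding="latin-1"):
    """Concatenate all section names of one sample into a single string."""
    return "".join(section_names(data, encoding))


def summarize(samples):
    """Return (signature frequencies, section-count frequencies), most common first."""
    signatures = Counter(name_signature(s) for s in samples)
    counts = Counter(len(section_names(s)) for s in samples)
    return signatures.most_common(), counts.most_common()
```

A UPX-packed sample would typically produce a signature such as UPX0UPX1.rsrc, which is why packers stand out so clearly in this kind of ranking.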
I also picked some funny section names in Chinese:
天使免杀
天外来客
黑教基地
荒山一鱼
BY 小广
放荡不羁挖出
木马彩衣
牧民战天
傻傻
狂少爷
Here is the longest one:
国庆专版祖国大寿祖国繁荣人民安康祝福大家身体健康工作顺利合家欢乐心想事成万事如意笑脸敬上.
Google Translate result:
National Day special edition birthday of the motherland motherland's prosperity and people's well being I wish you all good health and success in your work all wishes come true and good luck. And a big smile!
3. Anomaly score
I defined about 10 anomaly features and used them to calculate a total anomaly score for each sample.
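The post does not list the features themselves, so the following is only an illustration of how a feature-based score could be put together. All four features, the Section record, the STANDARD_NAMES set, and the choice of an unweighted sum are assumptions of mine, picked because they relate to the EP and section observations above. They are not the roughly 10 features actually used.

```python
from dataclasses import dataclass

# Hypothetical allow-list of section names commonly emitted by compilers.
STANDARD_NAMES = {".text", ".data", ".rdata", ".rsrc", ".reloc",
                  ".idata", ".edata", ".bss", ".tls", ".pdata"}


@dataclass
class Section:
    name: str
    vaddr: int
    vsize: int
    raw_size: int
    writable: bool = False
    executable: bool = False


def ep_outside_sections(ep_rva, sections):
    """EP does not fall inside any section's address range."""
    return not any(s.vaddr <= ep_rva < s.vaddr + max(s.vsize, s.raw_size)
                   for s in sections)


def nonstandard_section_name(ep_rva, sections):
    """At least one section name is not in the allow-list."""
    return any(s.name not in STANDARD_NAMES for s in sections)


def writable_executable_section(ep_rva, sections):
    """At least one section is both writable and executable."""
    return any(s.writable and s.executable for s in sections)


def empty_raw_section(ep_rva, sections):
    """A section occupies memory but has no data on disk (e.g. UPX0).

    Heuristic only: a legitimate .bss section can look the same.
    """
    return any(s.raw_size == 0 and s.vsize > 0 for s in sections)


FEATURES = [ep_outside_sections, nonstandard_section_name,
            writable_executable_section, empty_raw_section]


def anomaly_score(ep_rva, sections):
    """Unweighted count of triggered features (an assumed scoring rule)."""
    return sum(1 for feature in FEATURES if feature(ep_rva, sections))
```

With this toy rule, a UPX-like layout (UPX0 with no raw data, writable and executable UPX1, plus .rsrc) scores 3, while a plain .text/.data layout with the EP inside .text scores 0. A real scoring scheme would likely weight features differently.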
The following is the score distribution.