Disassembler & DeCompiler for C, AGNSS & DCC 2.60

This section describes the checkpoints for producing complete disassembled source files with AGNSS.

Two main checkpoints tell you whether a disassembling operation has succeeded:

1. Is the distinction between code areas and data areas correct?

Two undesirable cases can occur here: an area that should be data is described as code, or an area that should be code is described as data.

1.1. An area that should be data is described as code.

When an area that should be data is described as code, you can correct it at once by adding a 'dat' command to pas 1 of the command file (e.g., www.agn) and restarting XSIM from pass 1. For example, add a 'dat' command as follows:

pas 1

and start XSIM with the -i start option, for example:

C>xsim -iv www

In the 'dat' command, you must specify the data area precisely.
(For the seg part of a seg:off address, it is usual to give a value relative to the first seg in the EXE by setting the 'rel' command to ON. Note that the 'rel' command defaults to OFF, which means an absolute seg value!)
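
The difference between an absolute and a relative seg value can be sketched in Python as below. This is only an illustration of the arithmetic, not XSIM code; the function names and the base segment 1000H are hypothetical.

```python
def to_relative(seg, off, base_seg):
    """Convert an absolute seg:off into one relative to base_seg.

    base_seg stands for the first seg of the loaded EXE (assumed value).
    """
    if seg < base_seg:
        raise ValueError("segment lies before the start of the image")
    return seg - base_seg, off


def fmt(seg, off):
    """Format an address in the usual seg:off hexadecimal notation."""
    return f"{seg:04X}:{off:04X}"


# Absolute 1234:0010 with the image starting at seg 1000H
rel_seg, rel_off = to_relative(0x1234, 0x0010, 0x1000)
print(fmt(rel_seg, rel_off))
```

With 'rel' ON you would write the relative form; with the default OFF, the absolute one.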

To find out whether an area is data, the 'u' command in BROW (browser mode) is useful; it performs the same simple disassembly as a debugger. Viewing a place where the code sequence breaks down with the 'u' command immediately reveals the data area lying between two code areas. In short, all you need to do is look through the source with the 'l' command (l+ or l++), add 'dat' commands at the places where the code sequence breaks down, restart XSIM to correct them, and check the corrections in BROW.

1.2. An area that should be code is described as data.

An area that should be code can be described as data in two ways. The first is when an area that should be data was described as code, as explained in section 1.1, and the incorrect code corrupted the correct code next to it. In this case, adding 'dat' commands as explained in section 1.1 corrects everything.

The second is when XSIM has not yet simulated the area. If you find an area that should be code described as data, view its DT-attributes with the 'y' command in BROW. If the DT-attributes in that area are all 00s, the area has not been simulated yet.
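
The "all 00s" test can be pictured with the following Python sketch. It assumes, purely for illustration, one attribute byte per address held in a bytes object; the actual DT-attribute layout is not described here, and the function name is hypothetical.

```python
def is_unsimulated(attrs, start, end):
    """Return True if every attribute byte in [start, end) is 00.

    attrs is a hypothetical per-address attribute map (one byte each).
    """
    if not (0 <= start < end <= len(attrs)):
        raise ValueError("range outside the attribute map")
    return not any(attrs[start:end])


# Addresses 0-3 have been visited, addresses 4-7 have not
attrs = bytes([0x01, 0x01, 0x02, 0x01, 0x00, 0x00, 0x00, 0x00])
print(is_unsimulated(attrs, 4, 8))
print(is_unsimulated(attrs, 2, 6))
```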

XSIM runs repeated loops to find dead code, but the default for the 'loo' command is 6 (times) when the input executable file is larger than 300KB. Hence, by setting 'loo' to ffff in the command file, as in

pas 1

you can make XSIM simulate the whole area. Even with ffff assigned, XSIM does not loop ffffH times: the loop stops as soon as no more code is found, so the simulation finishes in about 5 minutes even for an EXE larger than 1MB.
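
Why a limit of ffff does not mean ffffH passes can be seen in a generic "loop until nothing new is found" sketch in Python. This is not XSIM's actual algorithm; the successor table and all names are hypothetical.

```python
def discover_code(entry_points, successors, max_loops=0xFFFF):
    """Repeat discovery passes until no new address is found or the limit is hit.

    successors maps an address to the addresses reachable from it
    (a stand-in for whatever a real simulator would follow).
    Returns the set of known addresses and the number of passes made.
    """
    known = set(entry_points)
    loops = 0
    while loops < max_loops:
        loops += 1
        found = {t for a in known for t in successors.get(a, ())} - known
        if not found:
            break  # nothing new: stop long before the limit
        known |= found
    return known, loops


successors = {0: [1], 1: [2], 2: [3], 3: []}
print(discover_code([0], successors))               # large limit, early stop
print(discover_code([0], successors, max_loops=2))  # small limit cuts it short
```

A small limit can leave reachable code undiscovered, while a large one costs nothing once the search has converged.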

If all code and data are correctly distinguished, the 'neutral area' value, which is displayed when XSIM is started with the -v start option, becomes 0.

2. Is the distinction between offset operators and immediate values correct?

2.1. OBJ files and PE files

When the input executable file is an OBJ file or a PE file (the executable format of Windows 95/98/2000/NT/XP), the problem of distinguishing offset operators from immediate values does not occur, since all offset operators in code areas and all data offsets in data areas are always completely resolved. Thus, by routinely assigning data areas with the 'dat' command following 1.1 and 1.2, you obtain complete assembly sources that can be reassembled and relinked into an EXE with the same behavior as the original.

When the input executable file is a PE file, an LE file in flat model such as a VxD or 386 file (Windows device drivers), or one made by a compiler, you need to use the 'ren' command to rename all code segment names and their class names in the flat segment to _TEXT and CODE, and all data segment names and their class names to _DATA and DATA.

For an OBJ file, no renaming is needed, since all segment names, class names, and group names are automatically restored to their original names in the include file (.INC).

2.2. EXE/COM/SYS files in MSDOS, NE files, and MEM/EXP files

When the input executable file is an MSDOS one (EXE/COM/SYS), an NE file of 16-bit Windows such as Windows 3.0 and 3.1, a memory file (MEM), or a PharLap Dos-Extender one (EXP), XSIM simulates it so as to add as many offset operators as possible. This is because referenced data areas such as strings can then be found easily with the 'f' command in BROW and through the '; from' cross-reference comments in DASM.

To restore erroneous offset operators to immediate values, you can use the 'imd' command. To correct such offset operators and to assign data offsets in data areas, add 'ofs' commands in pas 2 and restart XSIM (without the -i start option)!
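
The ambiguity arises because, in these formats, a 16-bit constant in an instruction may be either a plain number or the offset of a data item. The Python sketch below shows one simple way to picture it: a constant that happens to fall inside a known data area is a candidate offset. This is an assumption made for illustration only and does not claim to be how XSIM decides; all names and values are hypothetical.

```python
def candidate_offsets(immediates, data_areas):
    """Return the immediates that fall inside any [start, end) data area."""
    return [v for v in immediates
            if any(start <= v < end for start, end in data_areas)]


# One data area at 0100H-01FFH; only 0120H lands inside it
print([f"{v:04X}" for v in candidate_offsets(
    [0x0005, 0x0120, 0x0200, 0x1FFF], [(0x0100, 0x0200)])])
```

A constant such as a loop count can land inside a data area by coincidence, which is why erroneous offset operators sometimes need to be turned back into immediate values.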

3. Screen output in BROW

Screen output in BROW, unlike the output of DASM (the source generator), is not complete ASM source intended to be reassembled and linked.
For example, the 'endp' corresponding to a 'proc' is not displayed if the procedure continues onto another screen page in BROW, and the include files (.INC), which are indispensable when the output file is divided into module files by the 'mod' command, are not output.

The module definition file (.DEF) and the resource file (.RC) are not output either, because they are all generated by DASM, the source generator.
Still, with XSIM (EXE simulator) and BROW (browser mode) you can gain experience in the fundamental disassembling work, such as evaluating how well they distinguish code areas from data areas and offset operators from immediate values.

It is also no problem to always generate source files and list files with DASM, without using BROW. In this case, if you divide the output file into module files with the 'mod' command beforehand, you can obtain just the module files you need through the ON/OFF option of the 'mod' command.

4. Disassembling at higher levels

Once you are experienced in the fundamental disassembling work, you can proceed to higher-level work: understanding the meaning of each procedure and data item in the input executable file.
There are three steps, described in 4.1 to 4.3 below.

An input executable file sometimes includes its debug information. In this case, the renaming work in 4.1 and the module division in 4.2 can be done exactly as in the original, by using the dump file output by the 'd' command in BROW or the automatic generation of map/ren/mod/dcc files from the TDS file in DASM or CGN2 (C source generator 2).

4.1. Make a rename file (.ren) for the 'ren' command!

Make a rename file (.ren) for the 'ren' command, and add function names (proc names) to it one by one as you identify them. Segment names should be described at the top of the rename file; refer to the segment names written to the .msg file when DASM is started with the -v start option.
When the input executable file is a PE file, an LE file in flat model, or one made by a compiler, you need to rename all code segment names and their class names in the flat segment to _TEXT and CODE, and all data segment names and their class names to _DATA and DATA.

4.2. Make a module file (.mod) for the 'mod' command!

Make a module file (.mod) for the 'mod' command, and add module names to it one by one as you identify them. At first you may simply give each module the same name as its segment. You can keep DASM from outputting unnecessary modules with the ON/OFF option in the 'mod' file.

4.3. Make a comment file (.cmt) for the 'cmt' command!

Make a comment file (.cmt) for the 'cmt' command, and add a comment for each procedure and data item one by one as you work out its meaning.

5. Toward decompiling for C

When the input executable file is a Windows one, looking at the export/import functions in the module definition file and the menus and dialogs in the resource file easily shows how it behaves as a whole. It is difficult to go further from the source list alone, however, so you can proceed by actually tracing the EXE, by generating a log file during execution, or by other experiments.

For example, by generating symbol/source information with CGN2 (C source generator 2), you can trace the original EXE directly in source-level debugging. You can also go further by renaming functions and data as you recognize their meaning or the makeup of structures, pointers, and variables; by describing these in the DCC command file with the 'cst' and 'cgl/cfu' commands; by tracing and displaying DLL access information in real time with the sister product PROXYAN; by generating a log file during execution; and by dumping data from the reproduced EXE linked with the dump routine.

When the input executable file is a Windows one, tracing from win_start with a debugger gets you nowhere, since each procedure is started by a message. By setting a breakpoint near the call address displayed in the PROXYAN log file, however, you can easily find the place in the source list for each procedure. To work top-down from message dispatch, the most important key is the ID value of each procedure displayed in the menus or dialogs of the resource file. A procedure usually calls many other procedures, so to understand it as a whole it is very helpful to generate call flow diagrams with the ctr/cfr command, using a modest 'nest' option of about 2, as an automatically generated flow chart.
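
The effect of a small nesting limit on a call flow diagram can be sketched in Python as below. The sketch assumes that 'nest' means the maximum call depth shown; the call table, procedure names, and output layout are hypothetical and are not the format produced by ctr/cfr.

```python
def call_tree(calls, root, nest=2):
    """Return an indented call tree for root, cut off below depth nest."""
    lines = []

    def walk(name, depth, path):
        lines.append("  " * depth + name)
        if depth >= nest:
            return  # deeper callees are left out of the diagram
        for callee in calls.get(name, ()):
            if callee in path:
                # Avoid looping forever on recursive calls
                lines.append("  " * (depth + 1) + callee + " (recursive)")
            else:
                walk(callee, depth + 1, path | {callee})

    walk(root, 0, {root})
    return lines


calls = {
    "WndProc": ["OnCommand", "OnPaint"],
    "OnCommand": ["OpenFile"],
    "OpenFile": ["ReadHeader"],
}
print("\n".join(call_tree(calls, "WndProc", nest=2)))
```

With a depth of 2 the diagram stays readable; a larger value quickly pulls in every helper routine.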