Getting Started with NASM

NASM Syntax

See the official NASM website

See a more basic tutorial that suits a beginner like me

See the manual "Introduction to NASM: A Study Material for CS2093 - Hardware Laboratory"

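0. Preface

How can you tell whether a NASM program is 32-bit or 64-bit?

For backward compatibility, the lower part of each register on the wider architecture serves as the register used by narrower programs.

For example:

The 8 general-purpose registers of the i386

In fact, today's x86-64 has 16 general-purpose registers:

rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp, r8, r9, r10, r11, r12, r13, r14, r15.

So if a program uses eax and never a 64-bit register, it was most likely written for a 32-bit operating system (thanks to backward compatibility, it can also run on a 64-bit one). If rax appears, the program can only run on a 64-bit operating system. Note that 64-bit code may also use eax, so the presence of rax is the more reliable indicator.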

To assemble a file, you issue a command of the form

nasm -f <format> <filename> [-o <output>]

0.1. How a program starts

When we start a program, it is copied into main memory. EIP is the pointer to the next instruction to execute: it starts at the beginning of the program in memory, and the instructions are then executed sequentially.

0.3. Big Endian & Little Endian

Big-endian storage matches the order in which people normally read numbers, but x86 CPUs store data in little-endian order.
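To make the byte order concrete, here is a minimal sketch. It assumes 32-bit Linux and a build along the lines of `nasm -f elf32 endian.asm -o endian.o` followed by `ld -m elf_i386 endian.o -o endian`; the file name and the label `val` are made up for illustration.

```nasm
; endian.asm (hypothetical file name): inspect the first byte of a dword
section .data
val: dd 0x12345678          ; laid out in memory as 78 56 34 12

section .text
global _start
_start:
    movzx ebx, byte [val]   ; lowest address holds the least significant byte, 0x78
    mov eax, 1              ; exit system call
    int 80h                 ; exit status is 0x78 (decimal 120)
```

Running the program and then `echo $?` in the shell should print 120, because the byte at the lowest address is 78h rather than 12h.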


1. section

A typical NASM program contains three kinds of sections:

section .text
section .bss
section .data

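(1) section .text holds the program's instructions. The entry point, roughly the counterpart of main in a C program, lives here (on Linux it is conventionally marked by the _start label).

(2) section .bss is where variables are declared without being initialized, for example: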

section .bss 
var3: resb 1
var4: resq 1

Here res<$x$> reserves memory of the size corresponding to <$x$> without initializing it, where $x\in \{b, w, d, q, t\}$.

<VAR_NAME>: res<x> <NUM>

(3) section .data is where variables are declared and given an initial value, for example:

section .data 
var1: db 10
str1: db "Hello World!.."

Here d<$x$> reserves memory of the size corresponding to <$x$> and also initializes it, where $x\in \{b, w, d, q, t\}$.

<VAR_NAME>: d<x> <VALUE>

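Possible values of x: b (byte, 1 byte), w (word, 2 bytes), d (doubleword, 4 bytes), q (quadword, 8 bytes), t (ten bytes)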

Sections are not mandatory, however. What each section does is give the programmer the feel of a contiguous "virtual memory space".

“Even the most structured assembler does not care if there is no text, data, or bss.”

  • Representing numbers

    • A number ending in b is binary

    • A number ending in o is octal

    • A number ending in h is hexadecimal

  • Representing strings

    • "Hello" and "H", "e", "l", "l", "o" mean exactly the same thing
  • Representing arrays

    • <VAR_NAME>: d<x> <VALUE1>, <VALUE2>, …
  • Bulk initialization: times

    times is used to create and initialize large arrays with a common initial value for all its elements.

    • <VAR_NAME>: times <NUM> d<x> <VALUE>
  • Dereferencing in NASM

    • [] applied to a variable dereferences it, like * in C
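The notations above can be combined in one small program. This is only a sketch under assumptions: 32-bit Linux, built with `nasm -f elf32` and `ld -m elf_i386`, and every label (`nums`, `zeros`, `mask`, `hexv`, `buf`) is invented for the example.

```nasm
section .data
nums:  dd 10, 20, 30        ; array of three doublewords
zeros: times 8 db 0         ; eight bytes, all initialized to 0
mask:  db 1010b             ; binary literal (decimal 10)
hexv:  db 0Ah               ; hex literal; it must start with a digit

section .bss
buf:   resb 16              ; 16 uninitialized bytes

section .text
global _start
_start:
    mov eax, nums           ; no brackets: eax = address of nums
    mov ebx, [nums + 4]     ; brackets dereference: ebx = second element, 20
    mov eax, 1              ; exit system call
    int 80h                 ; exit status is 20
```

Without brackets a label stands for an address; with brackets it stands for the value stored at that address, which is the distinction between `p` and `*p` in C.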

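2. The x86 instruction set

A simplified x86 instruction set

MOV: move/copy

Syntax: $mov\ \ dest, src$

Effect: dest = src

  • src can be a register, a memory operand, or an immediate value
  • src and dest cannot both be memory operands.
    • Note the partial negation stressed in high-school English: either one may be a memory operand, just not both.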

MOVZX: move with zero extension

Syntax: $movzx\ \ dest,src$

Effect: dest = src, zero-extended to the size of dest

  • dest must be a register, and it must be larger than src
  • src should be a register / memory operand
  • Works only with unsigned numbers; use MOVSX to sign-extend signed numbers.

ADD: addition

Syntax: $add\ \ dest,src$

Effect: dest = dest + src

  • src can be a register, a memory operand, or an immediate value
  • src and dest cannot both be memory operands.
  • Both operands should have the same size.

SUB: subtraction

Syntax: $sub\ \ dest,src$

Effect: dest = dest - src

  • src can be a register, a memory operand, or an immediate value
  • src and dest cannot both be memory operands.
  • Both operands should have the same size.
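A short sketch that exercises the four instructions covered so far, assuming 32-bit Linux and the usual `nasm -f elf32` plus `ld -m elf_i386` build; the label `small` is hypothetical.

```nasm
section .data
small: db 200               ; one unsigned byte

section .text
global _start
_start:
    movzx eax, byte [small] ; eax = 200, upper 24 bits cleared
    mov ebx, 100            ; immediate to register
    add eax, ebx            ; eax = 200 + 100 = 300
    sub eax, 58             ; eax = 300 - 58 = 242
    mov ebx, eax            ; pass the result as the exit status
    mov eax, 1              ; exit system call
    int 80h                 ; exit status is 242
```

The size keyword `byte` is needed on the memory operand of movzx because a label by itself carries no size information in NASM.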

The rest will be filled in when I have time.

My advice: RTFM.

INC: increment

DEC: decrement

MUL: multiplication

IMUL: multiplication of signed numbers

DIV: division

NEG: negation of signed numbers

CLC:

ADC:

2.13. SBB

2.14. JMP

2.15. CMP

2.16. LOOP

2.17. AND

2.18. OR

2.19. XOR

2.20. NOT

2.21. TEST

2.22. SHL

2.23. SHR

2.24. ROL

2.25. ROR

2.26. RCL

2.27. RCR

2.28. PUSH

POP: Pop off a value from the system stack

PUSHA: Pushes the value of all general purpose registers

PUSHA is used to save the value of general purpose registers especially when calling some subprograms which will modify their values.

POPA: Pops off the values of all general purpose registers that were pushed earlier with the PUSHA instruction
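A minimal sketch of the PUSHA/POPA pairing, assuming 32-bit code (these two instructions are not available in 64-bit mode) on Linux; the values 7 and 99 are arbitrary.

```nasm
section .text
global _start
_start:
    mov ebx, 7              ; value we want to survive
    pusha                   ; save all general purpose registers on the stack
    mov ebx, 99             ; stand-in for a subprogram that clobbers ebx
    popa                    ; restore the saved registers: ebx is 7 again
    mov eax, 1              ; exit system call
    int 80h                 ; exit status is 7
```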

PUSHF: Pushes all the CPU FLAGS

2.33. POPF

2.34. Preprocessing

3. NASM I/O

Input from the standard input device (keyboard) and output to the standard output device (monitor) in a NASM program are implemented using the operating system's read and write system calls. On Linux, interrupt number 80h is reserved for software-generated interrupts, and applications make system calls through it. When an application triggers int 80h, the OS recognizes a system call request and inspects the general purpose registers to find and execute the matching Interrupt Service Routine (here, the system call).

By convention, we put the system call number in the eax register and the other parameters the call needs in the remaining general purpose registers. We then trigger the interrupt with the instruction 'INT 80h', and the OS carries out the system call.

3.1. exit system call

  • The system call number for exit is 1, so 1 is copied to eax.
  • A program that exits successfully returns 0. This value is passed as the parameter to the exit() system call, so we copy 0 to ebx.
  • Then we trigger INT 80h.
mov eax, 1 ;System Call Number
mov ebx, 0 ;Parameter
int 80h ;Triggering OS Interrupt

3.2. read system call

read reads data from an input device.

  • With this call we can read only strings / characters.
  • The system call number for read is 3. It is copied to eax.
  • The standard input device (keyboard) has reference number 0, which must be copied to ebx.
  • We copy to ecx the pointer to the memory location where the input string should be stored.
  • We copy to edx the maximum number of characters to read.
  • Then we trigger INT 80h.
  • The string is stored at the location whose address we copied to ecx.
mov eax, 3 ;Sys_call number for read 
mov ebx, 0 ;Source Keyboard
mov ecx, var ;Pointer to memory location
mov edx, dword[size] ;Size of the string
int 80h ; Triggering OS Interrupt
  • This method is also used for reading integers, which is a bit tricky. To read a single digit, we read it as a single character and then subtract 30h from it (the ASCII code of '0' is 30h). The variable then holds the actual numeric value.
mov eax, 3 
mov ebx, 0
mov ecx, digit1
mov edx, 1
int 80h
sub byte[digit1], 30h ;Now we have the actual number in [digit1]
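The digit-reading snippet can be turned into a complete program roughly as follows. This is a sketch assuming 32-bit Linux (`nasm -f elf32`, `ld -m elf_i386`); returning the digit as the exit status is just a convenient way to observe it and is not part of the original material.

```nasm
section .bss
digit1: resb 1              ; one byte for the typed character

section .text
global _start
_start:
    mov eax, 3              ; read system call
    mov ebx, 0              ; standard input
    mov ecx, digit1         ; where to store the character
    mov edx, 1              ; read a single character
    int 80h

    sub byte [digit1], 30h  ; ASCII '0'..'9' becomes the value 0..9

    movzx ebx, byte [digit1] ; use the digit as the exit status
    mov eax, 1              ; exit system call
    int 80h
```

For instance, piping `echo 7` into the program and then running `echo $?` should print 7.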

3.3. write system call

write writes data to an output device.

  • With this call we can write only strings / characters.
  • The system call number for write is 4. It is copied to eax.
  • The standard output device (monitor) has reference number 1, which must be copied to ebx.
  • We copy to ecx the pointer to the memory location where the output string resides.
  • We copy the number of characters in the string to edx.
  • Then we trigger INT 80h.
mov eax, 4 ;Sys_call number 
mov ebx, 1 ;Standard Output device
mov ecx, msg1 ;Pointer to output string
mov edx, size1 ;Number of characters
int 80h ;Triggering interrupt.
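Putting write and exit together gives a complete program. The sketch below assumes 32-bit Linux (`nasm -f elf32`, `ld -m elf_i386`) and assumes that size1 is defined with `equ` as an assemble-time constant, which is one common way to obtain the string length; the text above does not say how size1 is defined.

```nasm
section .data
msg1:  db "Hello World!", 0Ah ; the string followed by a newline
size1 equ $ - msg1            ; assumption: length computed by the assembler

section .text
global _start
_start:
    mov eax, 4              ; write system call
    mov ebx, 1              ; standard output
    mov ecx, msg1           ; pointer to the output string
    mov edx, size1          ; number of characters (a constant, so no brackets)
    int 80h

    mov eax, 1              ; exit system call
    mov ebx, 0              ; exit status 0
    int 80h
```

If size1 were instead a variable declared with dd, the length would have to be loaded with `mov edx, dword [size1]`, as in the read example.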

4. Function calls in NASM

func:
push ebx