---
title: using sys_write to print a string (x86-64, NASM)
author: 0x3bb
date: 07-27-2024
---

## context

I'm reading a book on x86-64 assembly with NASM.

One of the exercises prints a string to stdout using a Linux syscall called `sys_write`. The book doesn't explain why values are `mov`ed into those particular registers, or how the registers are used once the syscall is made.

An approachable reference for beginners is the [Linux System Call Table](https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/). For each syscall it documents the call number and every parameter, along with the register in which the kernel expects to find it.

From that table, `sys_write` needs the following registers populated:

- `rax` → the syscall number. For `sys_write` it's `1`, from the first column of that table
- `rdi` → the file descriptor; `1` is `stdout`
- `rsi` → the address of the buffer (i.e. a pointer to the string, not the string itself)
- `rdx` → the number of bytes to write from the buffer

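To make the mapping concrete, here is a minimal sketch in C using GCC/Clang inline assembly. It assumes x86-64 Linux, and it is written in C rather than NASM only so it can be compiled and checked on its own. The helper name `raw_write` is hypothetical. The constraints `"a"`, `"D"`, `"S"` and `"d"` pin each argument to `rax`, `rdi`, `rsi` and `rdx`, matching the list above.

```c
#include <stddef.h>     /* size_t */
#include <sys/types.h>  /* ssize_t */

/* Hypothetical helper: call sys_write directly with the `syscall`
 * instruction. Assumes x86-64 Linux and GCC/Clang inline assembly. */
static ssize_t raw_write(int fd, const void *buf, size_t len)
{
    ssize_t ret;
    __asm__ volatile(
        "syscall"
        : "=a"(ret)               /* rax on return: bytes written, or -errno */
        : "a"(1L),                /* rax: syscall number, 1 = sys_write */
          "D"((long)fd),          /* rdi: file descriptor */
          "S"(buf),               /* rsi: address of the buffer */
          "d"(len)                /* rdx: number of bytes to write */
        : "rcx", "r11", "memory"  /* syscall overwrites rcx and r11; kernel reads buf */
    );
    return ret;
}
```

Calling `raw_write(1, "s1\n", 3)` should print `s1` and a newline and return 3. A raw syscall reports failure as a negative error number (for example `-EBADF` for a bad descriptor) rather than setting `errno`. Converting that result into `errno` is one of the jobs the libc wrappers normally take care of.
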
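## .data

The first step is to define the `.data` section and initialize the memory whose address will be placed in `rsi`.

The syntax looks like this:

`<variable name> <type> <value>`

In NASM terms the "variable name" is a label and the "type" is a data directive. Here it's `db` (define byte), which emits each operand as bytes, one byte per character for a quoted string. `0x0a` is the ASCII code for a newline (`\n`), and the trailing `0` is the null terminator.
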
```
; sys_write_string.asm
section .data
s1 db "s1", 0x0a, 0; terminated string with NL
```

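For comparison with C, the `db` line lays out the same bytes as the array below: one byte per character of `"s1"`, then `0x0a`, then `0`. This is only a sketch for comparison, and `s1_bytes` is a hypothetical name. The assembler listing later in this post shows the same four bytes, `73 31 0A 00`.

```c
#include <stddef.h>

/* The same four bytes as:  s1 db "s1", 0x0a, 0 */
static const unsigned char s1_bytes[] = { 's', '1', 0x0a, 0 };

/* sizeof counts every byte, terminator included: 4 */
static const size_t s1_size = sizeof s1_bytes;
```

The C string literal `"s1\n"` produces the same four bytes, because the compiler appends the terminator automatically.
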
Then, for `rdx`, the requirement is the length of the string, minus the terminator.

Many standard libraries rely on a null (`NUL`, `0x00`) terminator at the end of strings; C string literals, for example, get one automatically. That way, when the string is passed around, its length can be found by scanning for the terminator, without the caller having to pass an additional parameter (the length of the string).

However, `sys_write` does not look for a terminator. It writes exactly as many bytes as `rdx` says, and a `0` byte is just another byte to output. So the terminator has to be excluded from the length: take the number of bytes `s1` occupies and subtract the terminator (1 byte/8 bits):

```
s1Len equ $-s1-1; bytes since s1, minus the terminator
```

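In NASM, `$` evaluates to the assembly position at the start of the current line. On the line right after the `db`, `$ - s1` is therefore the number of bytes `s1` occupies (4), and subtracting 1 drops the terminator (3). A rough C analog is `sizeof s1 - 1`, which is also computed at compile time. For contrast, the sketch also includes a hypothetical `scan_length` helper that finds the length at run time by walking to the terminator. That is the idea behind C's `strlen`, and it is why C strings carry a terminator in the first place.

```c
#include <stddef.h>

static const char s1[] = "s1\n";   /* 's', '1', '\n', '\0': 4 bytes */

/* Compile-time length without the terminator, like:  s1Len equ $-s1-1 */
#define S1_LEN (sizeof s1 - 1)

/* Hypothetical helper: run-time length found by scanning for the 0 byte,
 * the same idea strlen() uses. */
static size_t scan_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')   /* stop at the terminator */
        n++;
    return n;
}
```
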
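## .bss

- `.bss` stands for _Block Started by Symbol_ and holds uninitialized (zero-filled) variables. The section takes no space in the executable file; the memory is reserved and zeroed when the program is loaded. This example doesn't need it, because the string has an initial value, so it makes sense to define it in the `.data` block above instead.
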
```
section .bss
```

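C has the same split. A file-scope variable with an initial value normally ends up in `.data`. One without an initializer normally ends up in `.bss`, which is zero-filled at load time. The sketch below assumes GCC or Clang targeting ELF, and the names are hypothetical. Section placement is the compiler's choice, so it's worth checking with `nm` on the object file (`d`/`D` for `.data`, `b`/`B` for `.bss`) rather than taking it on faith.

```c
#include <stddef.h>

/* Has an initial value: typically placed in .data */
static char greeting[] = "s1\n";

/* No initializer: typically placed in .bss, zero-filled at load time */
static char scratch[64];

/* Returns 1 if every byte of scratch is still zero, else 0. */
static int scratch_is_zeroed(void)
{
    for (size_t i = 0; i < sizeof scratch; i++)
        if (scratch[i] != 0)
            return 0;
    return 1;
}
```
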
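## .text

The final section required holds the code, including the entry point of the program.

- `.text` refers to the code segment: the part of the program's virtual address space that contains executable instructions.
- `main` is our program's entry point. Strictly speaking, the ELF entry point is `_start` in the C runtime's `crt1.o` (linked in by `gcc`), which eventually calls `main`.
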
```
section .text
global main
```

`main` must be declared with the `global` directive.

If the `global` directive is omitted, `main` is scoped to the object file. The linker (`ld`) then cannot see it, and the reference to `main` from `crt1.o` is left undefined:

```
[0x3bb@heimat 2]$ make
nasm -f elf64 -g -F dwarf sys_write_string.asm -l sys_write_string.lst
sys_write_string.asm:9: warning: label alone on a line without a colon might be in error [-w+label-orphan]
gcc -o sys_write_string sys_write_string.o -no-pie
/usr/sbin/ld: /usr/lib/gcc/x86_64-pc-linux-gnu/14.1.1/../../../../lib/crt1.o: in function `_start':
(.text+0x1b): undefined reference to `main'
collect2: error: ld returned 1 exit status
```

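This can also be seen using GNU `nm`, a tool that dumps the symbols in binaries and object files. Without the `global` directive, `main` has the symbol type `t`. The letter `t` means the symbol lives in the text (code) section, and lowercase means it is local. `s1` shows up the same way as `d`, a local symbol in the data section.
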
```
[0x3bb@heimat 2]$ nm sys_write_string.o
0000000000000000 t main
0000000000000000 d s1
```

With the `global` directive in place, the [symbol type identifier](https://sourceware.org/binutils/docs-2.39/binutils/nm.html) becomes uppercase (`T main`). That marks the symbol as global, so the linker can resolve it.

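C controls the same property with `static`. In the sketch below (function names hypothetical), compiling with `gcc -O0 -c` and running `nm` on the object file should list `local_helper` as `t` and `exported_fn` as `T`. This mirrors `main` without and with NASM's `global`. An optimizing build may inline `local_helper` and drop its symbol entirely.

```c
/* File-local, like a NASM label without `global`: nm should show 't' */
static int local_helper(void)
{
    return 3;
}

/* Externally visible, like `global main`: nm should show 'T' */
int exported_fn(void)
{
    return local_helper();
}
```
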
## label

The `main` label is a name for the address of the instruction that follows it, i.e. the start of the block of instructions below.

```
main:
push rbp; push base address of stack frame to restore later
mov rbp, rsp; copy address of current stack frame
mov rax, 1; 1 = sys_write call
mov rdi, 1; 1 = file descriptor (stdout)
mov rsi, s1; buf
mov rdx, s1Len; length of buf
syscall ; invoke sys_write to stdout
mov rsp, rbp; restore stack pointer, discarding the current frame
pop rbp; restore caller stack frame
mov rax, 60; 60 = sys_exit
mov rdi, 0; 0 = exit code
syscall ; invoke sys_exit
```

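For comparison, glibc exposes the same mechanism through its `syscall(2)` wrapper. The wrapper places the call number and arguments in the same registers before executing the `syscall` instruction. The sketch below assumes x86-64 Linux with glibc and performs the `sys_write` half of `main`. The helper name `write_s1` is hypothetical.

```c
#define _DEFAULT_SOURCE      /* glibc: declare syscall() in <unistd.h> */
#include <stddef.h>
#include <sys/syscall.h>     /* SYS_write, SYS_exit */
#include <unistd.h>          /* syscall() */

static const char s1[] = "s1\n";
#define S1_LEN (sizeof s1 - 1)   /* like: s1Len equ $-s1-1 */

/* Hypothetical helper: the sys_write half of main, via glibc's wrapper.
 * Returns the number of bytes written, or -1 with errno set on failure. */
static long write_s1(void)
{
    return syscall(SYS_write, 1, s1, S1_LEN);
}
```

The exit half would be `syscall(SYS_exit, 0)`, where `SYS_exit` is 60 on x86-64. It's left out of the sketch because it ends the process immediately, without any C cleanup such as flushing `stdio` buffers. The assembly version has no such buffers, so calling `sys_exit` directly is fine there.
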
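### assembler output

The listing file (produced by `-l sys_write_string.lst` in the `make` output above) shows what the assembler did. Each line carries the source line number, an offset relative to the start of its section, and the encoded bytes. `s1` sits at offset 0 of `.data` as `73 31 0A 00`, and the instructions under `main` are placed from offset 0 of `.text`. Final addresses are assigned later by the linker.
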
```
[0x3bb@heimat 2]$ cat sys_write_string.lst
1 ; sys_write_string.asm
2 section .data
3 00000000 73310A00 s1 db "s1", 0x0a, 0; terminated string with NL
4 s1Len equ $-s1-1; address of s1 - terminator
5
6 section .bss
7
8 section .text
9 global main
10
11 main:
12 00000000 55 push rbp; push base address of stack frame to restore later
13 00000001 4889E5 mov rbp, rsp; copy address of current stack frame
14 00000004 B801000000 mov rax, 1; 1 = sys_write call
```

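### gdb

Disassembling the program in gdb shows that the assembler encoded `mov rax, 1`, `mov rdi, 1`, `mov rdx, s1Len` and the two `sys_exit` moves with 32-bit registers (`eax`, `edi`, `edx`). The values fit in 32 bits. On x86-64, writing a 32-bit register also clears the upper 32 bits of the full 64-bit register, so the result is identical and the encoding is shorter: `B8 01 00 00 00` (5 bytes, as in the listing) instead of at least 7 bytes for a 64-bit form. The address of `s1`, which the linker fills in, is still loaded with a full 64-bit `movabs`.
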
```
[0x3bb@heimat 2]$ gdb sys_write_string
GNU gdb (GDB) 15.1
Copyright (C) 2024 Free Software Foundation, Inc.

(gdb) disassemble main
Dump of assembler code for function main:
0x0000000000401110 <+0>: push %rbp
0x0000000000401111 <+1>: mov %rsp,%rbp
0x0000000000401114 <+4>: mov $0x1,%eax
0x0000000000401119 <+9>: mov $0x1,%edi
0x000000000040111e <+14>: movabs $0x404010,%rsi
0x0000000000401128 <+24>: mov $0x3,%edx
0x000000000040112d <+29>: syscall
0x000000000040112f <+31>: mov %rbp,%rsp
0x0000000000401132 <+34>: pop %rbp
0x0000000000401133 <+35>: mov $0x3c,%eax
0x0000000000401138 <+40>: mov $0x0,%edi
0x000000000040113d <+45>: syscall
End of assembler dump.

(gdb) break main
Breakpoint 1 at 0x401110: file sys_write_string.asm, line 12.

(gdb) run
Starting program: /mnt/h/src/asm/2/sys_write_string

Breakpoint 1, main () at sys_write_string.asm:12
12 push rbp; function prologue
(gdb) step
13 mov rbp, rsp; function prologue
(gdb) step
14 mov rax, 1; 1 = sys_write call
(gdb) step
15 mov rdi, 1; 1 = file descriptor (stdout)
(gdb) step
16 mov rsi, s1; buf
(gdb) step
17 mov rdx, s1Len; length of buf
(gdb) step
18 syscall ; invoke sys_write to stdout

(gdb) info registers
rax 0x1 1
rbx 0x7fffffffe368 140737488348008
rcx 0x403e30 4210224
rdx 0x3 3
rsi 0x404010 4210704
rdi 0x1 1
rbp 0x7fffffffe240 0x7fffffffe240
rsp 0x7fffffffe240 0x7fffffffe240
r8 0x0 0
r9 0x7ffff7fcb200 140737353921024
r10 0x7fffffffdf70 140737488346992
r11 0x203 515
r12 0x1 1
r13 0x0 0
r14 0x7ffff7ffd000 140737354125312
r15 0x403e30 4210224
rip 0x40112d 0x40112d <main+29>
eflags 0x246 [ PF ZF IF ]
```

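The `eax`-for-`rax` substitution in the disassembly is safe because of an x86-64 rule. Writing a 32-bit register zero-extends the result into the whole 64-bit register, whereas writing a 16- or 8-bit register leaves the upper bits untouched. The sketch below assumes x86-64 and GCC/Clang inline assembly (AT&T syntax, like the gdb output). It fills `rax` with ones and then performs each kind of write. The function names are hypothetical.

```c
#include <stdint.h>

/* Fill rax with ones, then write 1 through eax (32-bit register). */
static uint64_t after_32bit_write(void)
{
    uint64_t r;
    __asm__("movq $-1, %%rax\n\t"   /* rax = 0xffffffffffffffff */
            "movl $1, %%eax"        /* upper 32 bits are cleared */
            : "=a"(r));
    return r;
}

/* Same, but write 1 through ax (16-bit register). */
static uint64_t after_16bit_write(void)
{
    uint64_t r;
    __asm__("movq $-1, %%rax\n\t"   /* rax = 0xffffffffffffffff */
            "movw $1, %%ax"         /* upper 48 bits keep their ones */
            : "=a"(r));
    return r;
}
```

The first function returns 1. The second returns `0xffffffffffff0001`, which is why an assembler can only swap in the 32-bit form.
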
The length in `rdx` is 3, so `sys_write` writes `s`, `1` and `\n` but not the null terminator, which is still in memory as the fourth byte:

```
(gdb) info registers rdx

rdx 0x3 3

(gdb) x/3 $rsi
0x404010 <s1>: 115 's' 49 '1' 10 '\n'

(gdb) x/4 $rsi
0x404010 <s1>: 115 's' 49 '1' 10 '\n' 0 '\000'
```

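Because `sys_write` copies exactly `rdx` bytes, passing the full size of `s1` (4) instead of `s1Len` (3) would send the terminator as well. The following is a minimal sketch of that effect, assuming Linux or another POSIX system. It writes into a pipe with `write(2)`, glibc's thin wrapper around `sys_write`, and reads the bytes back. The helper name `bytes_through_pipe` is hypothetical.

```c
#include <stddef.h>
#include <unistd.h>   /* pipe, write, read, close, ssize_t */

static const char s1[] = "s1\n";   /* 4 bytes, terminator included */

/* Hypothetical helper: write the first len bytes of s1 into a pipe, read
 * them back into out, and return the byte count read (-1 on error).
 * len must not exceed sizeof s1. */
static long bytes_through_pipe(size_t len, char *out, size_t out_size)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    ssize_t w = write(fds[1], s1, len);   /* copies exactly len bytes */
    close(fds[1]);                        /* lets read() see end-of-file */
    ssize_t r = (w < 0) ? -1 : read(fds[0], out, out_size);
    close(fds[0]);
    return (long)r;
}
```

With `len` set to 3, only `s`, `1` and `\n` come out of the pipe. With 4, a `0` byte follows them, which is what the program would have sent to the terminal had `rdx` included the terminator.
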