title | author | date |
---|---|---|
x86-64 NASM using syswrite to print a string | 0x3bb | M07-27-2024 |
I'm reading a book on x86-64 NASM.
One of the exercises involves printing a string to stdout by leveraging a Linux syscall called sys_write, but doesn't mention the reasoning behind the mov instructions into the registers or how they're used once you make the syscall.
An approachable way for beginners to understand how syscalls are made in Linux is to refer to the Linux System Call Table. It documents each parameter and the register in which the kernel expects to find it.
Looking at that table, the following registers need to be populated:
rax → the call identifier. For sys_write, it's 1, from reading the first column of that table
rdi → file descriptor, 1 being for stdout
rsi → address of the buffer (i.e. a pointer to the string)
rdx → length of the buffer, i.e. the number of bytes to write
.data
The first step is to define the .data section and initialize the memory that rsi will point to.
The syntax looks like this (0x0a is just the ASCII hex representation of a new line):
; sys_write_string.asm
section .data
s1 db "s1", 0x0a, 0; terminated string with NL
Then, for rdx, the requirement is the length of the string, minus the terminator.
Many languages and standard libraries, C in particular, add a NUL terminator (a 0 byte) to strings. This is so that when the string is passed around, its length can be determined implicitly without the caller having to pass an additional parameter (the length of the string).
sys_write, however, takes an explicit length and does not expect a terminated string, so the terminator should be left out of the count. In NASM, $ is the current assembly position, so $-s1 is the number of bytes emitted since the s1 label (4 here). Subtracting 1 drops the terminator (1 byte/8 bits), leaving 3.
s1Len equ $-s1-1; bytes since s1, minus the terminator
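If the terminator is not needed elsewhere, the subtraction can be avoided by leaving it out of the definition. The fragment below is a minimal sketch of both variants side by side; the names msg and msgLen are hypothetical and only there for comparison. In both cases the equ line has to come directly after the db line, because $ keeps moving as more data is defined.

```nasm
section .data
    ; Variant used in this post: terminated string, terminator excluded from the length
    s1      db "s1", 0x0a, 0    ; 4 bytes: 's', '1', newline, NUL
    s1Len   equ $ - s1 - 1      ; 4 - 1 = 3

    ; Hypothetical alternative: no terminator at all
    msg     db "s1", 0x0a       ; 3 bytes: 's', '1', newline
    msgLen  equ $ - msg         ; 3
```

Either length works for sys_write; the terminated form only matters if the same string is later handed to code that expects a NUL-terminated string.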
.bss
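.bss stands for Block Started by Symbol. It holds uninitialized variables: only their size is recorded in the executable, and the memory is reserved and zero-filled when the program is loaded. For this example the string is known up front, so it makes sense to define it in the .data section above and leave .bss empty.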
section .bss
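For comparison, this is roughly what a populated .bss section could look like. It is a sketch only: buf and counter are hypothetical names and the program in this post does not use them.

```nasm
section .bss
    buf     resb 64     ; reserve 64 bytes; no initial value is stored in the file
    counter resq 1      ; reserve one 8-byte quadword
```

resb and resq only reserve space, which is why they belong in .bss, while db (which stores actual bytes) belongs in .data.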
.text
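The final section required defines the entrypoint of the program.
.text refers to the code segment: the part of the program's virtual address space that contains executable instructions. main is where our code starts: because the program is linked with gcc, the C runtime's _start (in crt1.o) runs first and then calls main.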
section .text
global main
main must be declared with the global directive.
If global is omitted, the linker (ld) will not see main, since the symbol is then local to the object file:
[0x3bb@heimat 2]$ make
nasm -f elf64 -g -F dwarf sys_write_string.asm -l sys_write_string.lst
sys_write_string.asm:9: warning: label alone on a line without a colon might be in error [-w+label-orphan]
gcc -o sys_write_string sys_write_string.o -no-pie
/usr/sbin/ld: /usr/lib/gcc/x86_64-pc-linux-gnu/14.1.1/../../../../lib/crt1.o: in function `_start':
(.text+0x1b): undefined reference to `main'
collect2: error: ld returned 1 exit status
This can also be seen using GNU nm, a tool that can dump symbols from binaries and object files. Without the global directive, the symbol type is a lowercase t, which marks main as a local symbol in the text section.
[0x3bb@heimat 2]$ nm sys_write_string.o
0000000000000000 t main
0000000000000000 d s1
With the global directive set correctly, the symbol type identifier is uppercase (T), marking main as a global symbol that the linker can resolve.
label
The main label names the address of the first instruction in the block below:
main:
    push rbp            ; push base address of stack frame to restore later
    mov rbp, rsp        ; copy address of current stack frame
    mov rax, 1          ; 1 = sys_write call
    mov rdi, 1          ; 1 = file descriptor (stdout)
    mov rsi, s1         ; buf
    mov rdx, s1Len      ; length of buf
    syscall             ; invoke sys_write to stdout
    mov rsp, rbp        ; copy original base address of stack frame
    pop rbp             ; restore caller stack frame
    mov rax, 60         ; sys_exit
    mov rdi, 0          ; = exit code
    syscall             ; invoke sys_exit
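As for what happens to the registers once the syscall has been made: the kernel puts its result in rax (for sys_write, the number of bytes written, or a negative errno value on failure), and the syscall instruction itself overwrites rcx and r11. The sketch below is a variation on the program above, not the book's version, that passes the sys_write result on as the exit code so it can be inspected from the shell. The file name is hypothetical, and the stack frame setup is left out because nothing here uses the stack.

```nasm
; write_status.asm (hypothetical file name)
; Sketch: exit with the value sys_write returned in rax.
section .data
    s1      db "s1", 0x0a, 0    ; same string as above
    s1Len   equ $ - s1 - 1      ; 3 bytes, terminator excluded

section .text
    global main

main:
    mov rax, 1          ; sys_write
    mov rdi, 1          ; stdout
    mov rsi, s1         ; address of the buffer
    mov rdx, s1Len      ; number of bytes to write
    syscall             ; rax now holds the result; rcx and r11 are overwritten

    mov rdi, rax        ; use the result of sys_write as the exit code
    mov rax, 60         ; sys_exit
    syscall
```

Built with the same nasm and gcc commands as before, running it and then checking `echo $?` should report 3 when all three bytes were written.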
assembler output
The listing file shows what the assembler did: each instruction under the label has been assigned an offset relative to the start of its section, followed by its encoded bytes.
[0x3bb@heimat 2]$ cat sys_write_string.lst
1 ; sys_write_string.asm
2 section .data
3 00000000 73310A00 s1 db "s1", 0x0a, 0; terminated string with NL
4 s1Len equ $-s1-1; address of s1 - terminator
5
6 section .bss
7
8 section .text
9 global main
10
11 main:
12 00000000 55 push rbp; push base address of stack frame to restore later
13 00000001 4889E5 mov rbp, rsp; copy address of current stack frame
14 00000004 B801000000 mov rax, 1; 1 = sys_write call
gdb
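Disassembling the program in gdb shows that the assembler encoded the small constants as moves into the 32-bit registers (eax, edi, edx). Writing to a 32-bit register zero-extends into the full 64-bit register, so the result is identical, and it would be wasteful to spend a longer 64-bit encoding on these values for sys_write. The address of s1 is the exception: it is kept as a full 64-bit immediate (movabs).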
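To compare the encodings directly, the fragment below is a small sketch that can be assembled with `nasm -f elf64 -l` and read from the listing file. The first two byte sequences match what the listing above shows for mov rax, 1. The strict forms rely on NASM's strict keyword to switch off the size optimization; the byte sequences in those comments are what I would expect from the instruction encodings rather than something taken from the book. The file name is hypothetical.

```nasm
; encodings.asm (hypothetical file name)
bits 64
section .text
    mov eax, 1                  ; B8 01 00 00 00 (5 bytes), zero-extends into rax
    mov rax, 1                  ; NASM emits the same 5 bytes by default
    mov rax, strict dword 1     ; 48 C7 C0 01 00 00 00 (7 bytes), sign-extended 32-bit immediate
    mov rax, strict qword 1     ; 48 B8 01 00 00 00 00 00 00 00 (10 bytes), full 64-bit immediate
```

The 10-byte form is the one gdb prints as movabs, which matches the jump from offset +14 to +24 in the disassembly of mov rsi, s1.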
[0x3bb@heimat 2]$ gdb sys_write_string
GNU gdb (GDB) 15.1
Copyright (C) 2024 Free Software Foundation, Inc.
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000000401110 <+0>: push %rbp
0x0000000000401111 <+1>: mov %rsp,%rbp
0x0000000000401114 <+4>: mov $0x1,%eax
0x0000000000401119 <+9>: mov $0x1,%edi
0x000000000040111e <+14>: movabs $0x404010,%rsi
0x0000000000401128 <+24>: mov $0x3,%edx
0x000000000040112d <+29>: syscall
0x000000000040112f <+31>: mov %rbp,%rsp
0x0000000000401132 <+34>: pop %rbp
0x0000000000401133 <+35>: mov $0x3c,%eax
0x0000000000401138 <+40>: mov $0x0,%edi
0x000000000040113d <+45>: syscall
End of assembler dump.
(gdb) break main
Breakpoint 1 at 0x401110: file sys_write_string.asm, line 12.
(gdb) run
Starting program: /mnt/h/src/asm/2/sys_write_string
Breakpoint 1, main () at sys_write_string.asm:12
12 push rbp; function prologue
(gdb) step
13 mov rbp, rsp; function prologue
(gdb) step
14 mov rax, 1; 1 = sys_write call
(gdb) step
15 mov rdi, 1; 1 = file descriptor (stdout)
(gdb) step
16 mov rsi, s1; buf
(gdb) step
17 mov rdx, s1Len; length of buf
(gdb) step
18 syscall ; invoke sys_write to stdout
(gdb) info registers
rax 0x1 1
rbx 0x7fffffffe368 140737488348008
rcx 0x403e30 4210224
rdx 0x3 3
rsi 0x404010 4210704
rdi 0x1 1
rbp 0x7fffffffe240 0x7fffffffe240
rsp 0x7fffffffe240 0x7fffffffe240
r8 0x0 0
r9 0x7ffff7fcb200 140737353921024
r10 0x7fffffffdf70 140737488346992
r11 0x203 515
r12 0x1 1
r13 0x0 0
r14 0x7ffff7ffd000 140737354125312
r15 0x403e30 4210224
rip 0x40112d 0x40112d <main+29>
eflags 0x246 [ PF ZF IF ]
The length in rdx passed to sys_write does not include the NUL terminator: the first 3 bytes at rsi are the string and the newline, and the terminator is the 4th byte.
(gdb) info registers rdx
rdx 0x3 3
(gdb) x/3 $rsi
0x404010 <s1>: 115 's' 49 '1' 10 '\n'
(gdb) x/4 $rsi
0x404010 <s1>: 115 's' 49 '1' 10 '\n' 0 '\000'