title: using sys_write to print a string (x86-64, NASM)
author: 0x3bb
date: M07-27-2024
---
## context
I'm reading a book on x86-64 NASM.
One of the exercises involves printing a string to stdout using a Linux syscall called `sys_write`, but it doesn't explain the reasoning behind the `mov` instructions into the registers or how those registers are used once the syscall is made.
An approachable way for beginners to understand how syscalls are made in Linux is to refer to the [Linux System Call Table](https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/). It documents each parameter and the specific register in which the kernel expects to find it.
Looking at that table, the following registers need to be populated:
- `rax` → the call identifier. For `sys_write`, it's `1`, taken from the first column of that table
- `rdi` → file descriptor, `1` being `stdout`
- `rsi` → address of the buffer (i.e. the string)
- `rdx` → number of bytes to write from the buffer
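
The register assignments above can be sketched as a complete program. This is a minimal sketch rather than the book's listing: the label `msg`, the placeholder text `"hi"`, and the build commands in the comment are assumptions, and the length is hard-coded to keep the focus on the registers.

```nasm
; Assumed build (non-PIE, linked through gcc so that `main` is the entry point):
;   nasm -f elf64 write.asm -o write.o && gcc -no-pie write.o -o write
section .data
    msg db "hi", 0xa        ; hypothetical string: 'h', 'i', newline

section .text
    global main
main:
    mov rax, 1              ; call identifier: sys_write
    mov rdi, 1              ; file descriptor: stdout
    mov rsi, msg            ; address of the buffer
    mov rdx, 3              ; number of bytes to write
    syscall

    mov rax, 60             ; call identifier: sys_exit
    mov rdi, 0              ; exit status
    syscall
```

Each `mov` fills exactly one column of the table row for `sys_write`; the `syscall` instruction then hands control to the kernel, which reads those registers.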
## .data
The first step is to define the `.data` section and initialize the string whose address will go into `rsi`.
The syntax looks like this:
`<variable name> <type> <value>`
`0xa` is just the ASCII hex representation of a newline (line feed) character.
Then, for `rdx`, the requirement is the length of the string, excluding the terminator.
By convention, C strings end with a `NUL` terminator (a zero byte). This is so that when the string is passed around, its length can be determined implicitly without the caller having to pass an additional parameter (the length of the string).
However, since `sys_write` takes an explicit byte count and does not expect a terminated string, the terminator should be left out of the count. The length is computed as the distance from the address of `s1` to the end of the string, minus the terminating character (1 byte/8 bits).
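
As a sketch of that calculation, NASM's `$` token (the current assembly position) can measure the string at assembly time. Only the label `s1` comes from this example; the placeholder text and the constant name `s1_len` are assumptions for illustration.

```nasm
; Assumed build: nasm -f elf64 len.asm -o len.o && gcc -no-pie len.o -o len
section .data
    s1      db  "hi", 0xa, 0    ; placeholder text, newline, NUL terminator
    s1_len  equ $ - s1 - 1      ; 4 bytes defined so far, minus the terminator

section .text
    global main
main:
    mov rax, 1                  ; sys_write
    mov rdi, 1                  ; stdout
    mov rsi, s1                 ; address of the string
    mov rdx, s1_len             ; byte count without the terminator
    syscall

    mov rax, 60                 ; sys_exit
    mov rdi, 0
    syscall
```

Because `equ` is evaluated by the assembler, `s1_len` becomes an immediate value in the instruction and occupies no memory in `.data`. The `equ` line has to come directly after the string so that `$` points just past its last byte.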
- `.bss` stands for _Block Started by Symbol_ and contains uninitialized variables. Their memory is reserved when the program is loaded rather than stored in the binary. For this example, it makes sense to define the value in the `.data` block above instead.
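
For contrast, here is a rough sketch of how the same output could be produced with `.bss`. The buffer name `buf` and its contents are hypothetical, and the bytes have to be stored at runtime because `.bss` only reserves space.

```nasm
; Assumed build: nasm -f elf64 bss.asm -o bss.o && gcc -no-pie bss.o -o bss
section .bss
    buf resb 3                  ; reserve 3 bytes, no initial contents in the file

section .text
    global main
main:
    mov byte [buf], 'h'         ; fill the buffer at runtime
    mov byte [buf + 1], 'i'
    mov byte [buf + 2], 0xa     ; newline

    mov rax, 1                  ; sys_write
    mov rdi, 1                  ; stdout
    mov rsi, buf                ; address of the reserved buffer
    mov rdx, 3                  ; number of bytes to write
    syscall

    mov rax, 60                 ; sys_exit
    mov rdi, 0
    syscall
```

The extra stores are the reason `.data` is the simpler choice when the string is already known at assembly time.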
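If `main` is not exported with the `global` directive, linking fails because the C runtime's `_start` cannot find it:
```
/usr/sbin/ld: /usr/lib/gcc/x86_64-pc-linux-gnu/14.1.1/../../../../lib/crt1.o: in function `_start':
(.text+0x1b): undefined reference to `main'
collect2: error: ld returned 1 exit status
```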
This can also be seen using GNU `nm`, a tool that can dump symbols from binaries and object files. Without the `global` directive, the symbol type of `main` is a lowercase `t`, meaning the symbol is local to the object file.
```
[0x3bb@heimat 2]$ nm sys_write_string.o
0000000000000000 t main
0000000000000000 d s1
```
With the `global` directive set correctly, the [symbol type identifier](https://sourceware.org/binutils/docs-2.39/binutils/nm.html) will be uppercase (`T`) and the program will link correctly.
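
A minimal sketch of the directive in place, assuming the same `gcc`-based link step as above. The program only exits, so that the symbol's visibility is the one thing being shown.

```nasm
; Assumed build: nasm -f elf64 global.asm -o global.o && gcc -no-pie global.o -o global
section .text
    global main             ; export `main` so the linker can resolve it from crt1.o
main:
    mov rax, 60             ; sys_exit
    mov rdi, 0              ; exit status
    syscall
```

Running `nm` on the resulting object file should now list `main` with an uppercase `T`. Removing the `global` line should bring back the lowercase `t` and the linker error.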
## label
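The `main` label serves as an alias for the address of the block of instructions defined below.
Debugging the program shows that the assembler chose 32-bit registers (`eax`, `edi`, `edx`) for the small constants. Writing to a 32-bit register zero-extends into the full 64-bit register, so encoding these `sys_write` values with their 64-bit counterparts would be wasteful.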
```
[0x3bb@heimat 2]$ gdb sys_write_string
GNU gdb (GDB) 15.1
Copyright (C) 2024 Free Software Foundation, Inc.
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000000401110 <+0>: push %rbp
0x0000000000401111 <+1>: mov %rsp,%rbp
0x0000000000401114 <+4>: mov $0x1,%eax
0x0000000000401119 <+9>: mov $0x1,%edi
0x000000000040111e <+14>: movabs $0x404010,%rsi
0x0000000000401128 <+24>: mov $0x3,%edx
0x000000000040112d <+29>: syscall
0x000000000040112f <+31>: mov %rbp,%rsp
0x0000000000401132 <+34>: pop %rbp
0x0000000000401133 <+35>: mov $0x3c,%eax
0x0000000000401138 <+40>: mov $0x0,%edi
0x000000000040113d <+45>: syscall
End of assembler dump.
(gdb) break main
Breakpoint 1 at 0x401110: file sys_write_string.asm, line 12.