title: using sys_write to print a string (x86-64, NASM)
author: 0x3bb
date: 07-27-2024

context

I'm reading a book on x86-64 NASM.

One of the exercises involves printing a string to stdout by leveraging a Linux syscall called sys_write, but doesn't mention the reasoning behind the mov instructions into the registers or how they're used once you make the syscall.

An approachable way for beginners to understand how syscalls are made in Linux is to refer to the Linux System Call Table. It documents each parameter and the specific register in which the kernel expects to find it.

Looking at that table, the following registers need to be populated:

  • rax → the syscall number. For sys_write it's 1, taken from the first column of that table
  • rdi → file descriptor, 1 being stdout
  • rsi → address of the buffer (i.e. the string)
  • rdx → number of bytes to write from the buffer
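
To make the mapping concrete, here is a minimal sketch of the same convention as a standalone program. The file name, the msg/msgLen names, and the use of _start (so it can be linked with ld directly instead of through gcc) are my assumptions for illustration, not part of the book's exercise.

```nasm
; syscall_convention.asm (hypothetical file name)
; x86-64 Linux syscall convention:
;   rax                        = syscall number (and the return value afterwards)
;   rdi, rsi, rdx, r10, r8, r9 = arguments 1 to 6
;   rcx and r11 are overwritten by the syscall instruction
section .data
msg     db    "hi", 0x0a        ; 3 bytes, no terminator stored
msgLen  equ   $-msg             ; current address minus start of msg

section .text
global  _start                  ; default entry symbol when linking with ld

_start:
  mov     rax, 1                ; syscall number: 1 = sys_write
  mov     rdi, 1                ; arg 1: file descriptor, 1 = stdout
  mov     rsi, msg              ; arg 2: address of the buffer
  mov     rdx, msgLen           ; arg 3: number of bytes to write
  syscall
  mov     rax, 60               ; syscall number: 60 = sys_exit
  xor     edi, edi              ; arg 1: exit code 0
  syscall
```

It should build with `nasm -f elf64 syscall_convention.asm` followed by `ld -o syscall_convention syscall_convention.o`.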

.data

The first step is to define the .data section and initialize the string that rsi will point to.

The syntax looks like this (0x0a is the ASCII code for a newline, written in hex):

;       sys_write_string.asm
section .data
s1      db    "s1", 0x0a, 0; terminated string with NL

Then, for rdx, the requirement is the length of the string, minus the terminator.

Many languages and libraries, C in particular, end strings with a NUL terminator (a zero byte). This way, when the string is passed around, its length can be determined implicitly without the caller having to pass an additional parameter (the length of the string).

However, sys_write does not expect a terminated string: it writes exactly the number of bytes given in rdx. The terminator therefore has to be excluded from the length. $ is the current address, just past the last byte of s1, so $-s1 is the total size in bytes, and subtracting 1 drops the terminator (1 byte/8 bits).

s1Len   equ   $-s1-1; size of s1 minus the terminator
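
As a worked example of that arithmetic, the sketch below spells out the byte counts. s2 and s2Len are hypothetical names I added to show the alternative of not storing a terminator at all.

```nasm
section .data
s1      db    "s1", 0x0a, 0     ; 4 bytes: 's' '1' 0x0a 0x00
s1Len   equ   $-s1-1            ; (s1 + 4) - s1 - 1 = 3

s2      db    "s2", 0x0a        ; hypothetical: same shape, no terminator, 3 bytes
s2Len   equ   $-s2              ; (s2 + 3) - s2 = 3
```

Both constants evaluate to 3. Note that equ must come directly after the data it measures, because $ is evaluated at the line where it appears.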

.bss

  • .bss stands for Block Started by Symbol, and holds uninitialized variables: the assembler only records how much space they need, and the memory is allocated when the program is loaded. Nothing in this example needs it, since s1 is initialized data and belongs in the .data section above.
section .bss
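
For comparison, this is roughly what the section would look like if the program did need uninitialized storage. The names buf and count are hypothetical and are not used anywhere else in this post.

```nasm
section .bss
buf     resb  64                ; reserve 64 bytes, nothing is stored in the file
count   resq  1                 ; reserve one quadword (8 bytes)
```

resb and resq only reserve space, which is why they take a count rather than a value. On Linux this memory is zero-filled when the program is loaded.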

.text

The final section required is .text, which defines the entrypoint of the program.

  • .text refers to the code segment: the part of the program's virtual address space that contains executable instructions.
  • main is our program's entrypoint.
section .text
global  main

main must be declared with the global directive.

If the global directive is omitted, the linker (ld) cannot resolve main, because the symbol stays local to the object file:

  [0x3bb@heimat 2]$ make
  nasm -f elf64 -g -F dwarf sys_write_string.asm -l sys_write_string.lst
  sys_write_string.asm:9: warning: label alone on a line without a colon might be in error [-w+label-orphan]
  gcc -o sys_write_string sys_write_string.o -no-pie
  /usr/sbin/ld: /usr/lib/gcc/x86_64-pc-linux-gnu/14.1.1/../../../../lib/crt1.o: in function `_start':
  (.text+0x1b): undefined reference to `main'
  collect2: error: ld returned 1 exit status

This can also be seen using GNU nm, a tool that lists symbols from binaries and object files. Without the global directive, the symbol type of main is a lowercase t, meaning a local symbol in the text section.

  [0x3bb@heimat 2]$ nm sys_write_string.o
  0000000000000000 t main
  0000000000000000 d s1

With the global directive in place, the symbol type is an uppercase T (a global symbol) and the program links correctly.
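
A small sketch that puts both cases in one file: main is exported, while print_msg is a hypothetical helper label I added that is deliberately left local. The file name and the msg/msgLen names are assumptions too.

```nasm
; symbols.asm (hypothetical file name)
section .data
msg     db    "hi", 0x0a
msgLen  equ   $-msg

section .text
global  main                    ; exported, visible to the linker

main:
  call    print_msg
  xor     eax, eax              ; return 0 from main
  ret

print_msg:                      ; not declared global, stays local to the object file
  mov     eax, 1                ; 1 = sys_write
  mov     edi, 1                ; 1 = stdout
  mov     rsi, msg              ; address of the buffer
  mov     edx, msgLen           ; number of bytes to write
  syscall
  ret
```

After assembling with `nasm -f elf64 symbols.asm`, running nm on symbols.o should list main with an uppercase T and print_msg with a lowercase t. The linker only needs main to be visible, so the helper can stay local.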

label

The main label names the address of the first instruction in the block below:

main:
  push    rbp; push base address of stack frame to restore later
  mov     rbp, rsp; copy address of current stack frame
  mov     rax, 1; 1 = sys_write call
  mov     rdi, 1; 1 = file descriptor (stdout)
  mov     rsi, s1; buf
  mov     rdx, s1Len; length of buf
  syscall ; invoke sys_write to stdout
  mov     rsp, rbp; restore the stack pointer from the saved frame address
  pop     rbp; restore caller stack frame
  mov     rax, 60; 60 = sys_exit call
  mov     rdi, 0; 0 = exit code
  syscall ; invoke sys_exit
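
The program above ignores what sys_write returns. A raw syscall leaves its result in rax: the number of bytes written, or a negative error code on failure. Here is a minimal sketch of checking it, assuming we simply want a non-zero exit code when the write is incomplete. The msg/msgLen names and the .failed/.exit labels are mine.

```nasm
section .data
msg     db    "hi", 0x0a
msgLen  equ   $-msg

section .text
global  main

main:
  mov     eax, 1                ; 1 = sys_write
  mov     edi, 1                ; 1 = stdout
  mov     rsi, msg              ; address of the buffer
  mov     edx, msgLen           ; number of bytes to write
  syscall
  cmp     rax, msgLen           ; did the kernel write every byte?
  jne     .failed               ; short write or negative error code
  xor     edi, edi              ; exit code 0
  jmp     .exit
.failed:
  mov     edi, 1                ; exit code 1
.exit:
  mov     eax, 60               ; 60 = sys_exit
  syscall
```

The exit code is set before rax is overwritten with the sys_exit number, because the comparison needs the return value of sys_write while it is still in rax.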

assembler output

In the listing file, the assembler has assigned each instruction under the label an address relative to the start of its section:

[0x3bb@heimat 2]$ cat sys_write_string.lst
   1                                          ;       sys_write_string.asm
   2                                          section .data
   3 00000000 73310A00                        s1      db    "s1", 0x0a, 0; terminated string with NL
   4                                          s1Len   equ   $-s1-1; address of s1 - terminator
   5
   6                                          section .bss
   7
   8                                          section .text
   9                                          global  main
  10
  11 main:
  12 00000000 55                              push    rbp; push base address of stack frame to restore later
  13 00000001 4889E5                          mov     rbp, rsp; copy address of current stack frame
  14 00000004 B801000000                      mov     rax, 1; 1 = sys_write call

gdb

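Stepping through the program in gdb shows that the assembler encoded the small constants as moves into 32-bit registers (eax, edi, edx) instead of their 64-bit counterparts. On x86-64, writing a 32-bit register clears the upper 32 bits of the full register, so the result is identical and the instruction is shorter. Only the address of s1 is loaded with a full 64-bit immediate (movabs).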

  [0x3bb@heimat 2]$ gdb sys_write_string
  GNU gdb (GDB) 15.1
  Copyright (C) 2024 Free Software Foundation, Inc.

  (gdb) disassemble main
  Dump of assembler code for function main:
  0x0000000000401110 <+0>:     push   %rbp
  0x0000000000401111 <+1>:     mov    %rsp,%rbp
  0x0000000000401114 <+4>:     mov    $0x1,%eax
  0x0000000000401119 <+9>:     mov    $0x1,%edi
  0x000000000040111e <+14>:    movabs $0x404010,%rsi
  0x0000000000401128 <+24>:    mov    $0x3,%edx
  0x000000000040112d <+29>:    syscall
  0x000000000040112f <+31>:    mov    %rbp,%rsp
  0x0000000000401132 <+34>:    pop    %rbp
  0x0000000000401133 <+35>:    mov    $0x3c,%eax
  0x0000000000401138 <+40>:    mov    $0x0,%edi
  0x000000000040113d <+45>:    syscall
  End of assembler dump.

  (gdb) break main
  Breakpoint 1 at 0x401110: file sys_write_string.asm, line 12.

  (gdb) run
  Starting program: /mnt/h/src/asm/2/sys_write_string

  Breakpoint 1, main () at sys_write_string.asm:12
  12              push    rbp; function prologue
  (gdb) step
  13              mov     rbp, rsp; function prologue
  (gdb) step
  14              mov     rax, 1; 1 = sys_write call
  (gdb) step
  15              mov     rdi, 1; 1 = file descriptor (stdout)
  (gdb) step
  16              mov     rsi, s1; buf
  (gdb) step
  17              mov     rdx, s1Len; length of buf
  (gdb) step
  18              syscall ; invoke sys_write to stdout

  (gdb) info registers
  rax            0x1                 1
  rbx            0x7fffffffe368      140737488348008
  rcx            0x403e30            4210224
  rdx            0x3                 3
  rsi            0x404010            4210704
  rdi            0x1                 1
  rbp            0x7fffffffe240      0x7fffffffe240
  rsp            0x7fffffffe240      0x7fffffffe240
  r8             0x0                 0
  r9             0x7ffff7fcb200      140737353921024
  r10            0x7fffffffdf70      140737488346992
  r11            0x203               515
  r12            0x1                 1
  r13            0x0                 0
  r14            0x7ffff7ffd000      140737354125312
  r15            0x403e30            4210224
  rip            0x40112d            0x40112d <main+29>
  eflags         0x246               [ PF ZF IF ]
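
The 32-bit forms seen in the disassembly can also be written out explicitly in the source. The sketch below does that, and additionally swaps the 64-bit address load for a RIP-relative lea. That swap is my own variation, not something the book's exercise uses.

```nasm
section .data
s1      db    "s1", 0x0a, 0
s1Len   equ   $-s1-1

section .text
global  main

main:
  mov     eax, 1                ; B8 01 00 00 00, upper half of rax is zeroed
  mov     edi, 1                ; 1 = stdout
  lea     rsi, [rel s1]         ; address of s1 relative to rip (my variation)
  mov     edx, s1Len            ; 3 bytes
  syscall
  mov     eax, 60               ; 60 = sys_exit
  xor     edi, edi              ; exit code 0
  syscall
```

The first instruction assembles to the same bytes as line 14 of the listing, where the source says mov rax, 1.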

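The length passed to sys_write in rdx is 3, so the NUL terminator is not written: the first three bytes at rsi are the string and its newline, and the terminator is the fourth byte.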

(gdb) info registers rdx

rdx            0x3                 3

(gdb) x/3 $rsi
0x404010 <s1>:  115 's' 49 '1'  10 '\n'

(gdb) x/4 $rsi
0x404010 <s1>:  115 's' 49 '1'  10 '\n' 0 '\000'