xv6 System Call

Part One: System call tracing

Introduction:

Your first task is to modify the xv6 kernel to print out a line for each system call invocation. It is enough to print the name of the system call and the return value; you don’t need to print the system call arguments.

When you’re done, you should see output like this when booting xv6:

...
fork -> 2
exec -> 0
open -> 3
close -> 0
write -> 1
write -> 1

That’s init forking and execing sh, sh making sure only two file descriptors are open, and sh writing the $ prompt. (Note: the output of the shell and the system call trace are intermixed, because the shell uses the write syscall to print its output.)

Optional challenge: print the system call arguments.

Solution:

Part one asks us to print the name and return value of every system call. The assignment's hint points to syscall.c, so we start by reading the syscall() function there:

int num;
struct proc *curproc = myproc();

num = curproc->tf->eax;

These lines show that a user program enters the kernel through a trap. xv6 saves the user-mode state in a struct trapframe, which holds the register contents at the moment of the trap. Its layout is:

struct trapframe {
  // registers as pushed by pusha
  uint edi;
  uint esi;
  uint ebp;
  uint oesp;      // useless & ignored
  uint ebx;
  uint edx;
  uint ecx;
  uint eax;

  // rest of trap frame
  ushort gs;
  ushort padding1;
  ushort fs;
  ushort padding2;
  ushort es;
  ushort padding3;
  ushort ds;
  ushort padding4;
  uint trapno;

  // below here defined by x86 hardware
  uint err;
  uint eip;
  ushort cs;
  ushort padding5;
  uint eflags;

  // below here only when crossing rings, such as from user to kernel
  uint esp;
  ushort ss;
  ushort padding6;
};

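Reading the source also shows that the system call number is passed in the eax register, so to print the name we need an array that maps each number to its name: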

static char* syscallNames[] = {
  [SYS_fork]  "fork",
  [SYS_exit]  "exit",
  [SYS_wait]  "wait",
  [SYS_pipe]  "pipe",
  [SYS_read]  "read",
  [SYS_kill]  "kill",
  [SYS_exec]  "exec",
  [SYS_fstat] "fstat",
  [SYS_chdir] "chdir",
  [SYS_dup]   "dup",
  [SYS_getpid]"getpid",
  [SYS_sbrk]  "sbrk",
  [SYS_sleep] "sleep",
  [SYS_uptime]"uptime",
  [SYS_open]  "open",
  [SYS_write] "write",
  [SYS_mknod] "mknod",
  [SYS_unlink]"unlink",
  [SYS_link]  "link",
  [SYS_mkdir] "mkdir",
  [SYS_close] "close",
  [SYS_date]  "date",
  [SYS_alarm] "alarm",
  [SYS_dup2]  "dup2"
};

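The return value is written back into eax as well:

curproc->tf->eax = syscalls[num]();

So the solution follows directly. Right after that assignment, print the name and the value in eax:

cprintf("SYSCALL: name: %s --> return value: %d\n", syscallNames[num], curproc->tf->eax);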
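The dispatch-and-trace path described above can be modelled in user space. The following is only a sketch under stated assumptions: the stub handlers and their fixed return values are hypothetical, traced_syscall is an illustrative name rather than an xv6 function, and the trace string uses the "name -> value" form from the assignment's sample output.

```c
#include <stdio.h>

#define NELEM(x) (sizeof(x) / sizeof((x)[0]))

/* A few syscall numbers, same values as xv6's syscall.h. */
enum { SYS_fork = 1, SYS_getpid = 11, SYS_write = 16 };

/* Stand-in for the single trapframe field this sketch uses. */
struct trapframe { unsigned int eax; };

/* Hypothetical stub handlers with fixed return values. */
static int sys_fork(void)   { return 2; }
static int sys_getpid(void) { return 1; }
static int sys_write(void)  { return 1; }

/* Number -> handler, indexed the same way as the name table. */
static int (*syscalls[])(void) = {
  [SYS_fork]   = sys_fork,
  [SYS_getpid] = sys_getpid,
  [SYS_write]  = sys_write,
};

/* Number -> printable name. */
static const char *syscall_names[] = {
  [SYS_fork]   = "fork",
  [SYS_getpid] = "getpid",
  [SYS_write]  = "write",
};

/* Dispatch the call whose number is in tf->eax, store the result back
   in tf->eax, and write the trace line into buf. Returns the result. */
int
traced_syscall(struct trapframe *tf, char *buf, size_t len)
{
  unsigned int num = tf->eax;

  if (num > 0 && num < NELEM(syscalls) && syscalls[num]) {
    tf->eax = (unsigned int)syscalls[num]();
    snprintf(buf, len, "%s -> %d", syscall_names[num], (int)tf->eax);
  } else {
    tf->eax = (unsigned int)-1;
    snprintf(buf, len, "unknown sys call %u", num);
  }
  return (int)tf->eax;
}
```

The range check before indexing matters for the name table as much as for the handler table: an out-of-range number must never be used as an index into either array.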

The optional challenge is to print the system call arguments as well.

In the source we find argint and its description:

// Fetch the nth 32-bit system call argument.
int
argint(int n, int *ip)
{
  return fetchint((myproc()->tf->esp) + 4 + 4*n, ip);
}

argint in turn calls fetchint:

// User code makes a system call with INT T_SYSCALL.
// System call number in %eax.
// Arguments on the stack, from the user call to the C
// library system call function. The saved user %esp points
// to a saved program counter, and then the first argument.

// Fetch the int at addr from the current process.
int
fetchint(uint addr, int *ip)
{
  struct proc *curproc = myproc();

  if(addr >= curproc->sz || addr+4 > curproc->sz)
    return -1;
  *ip = *(int*)(addr);
  return 0;
}

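Taken together, fetchint and the comment above it tell us that the system call arguments start at (tf->esp)+4, which is the address of the first argument. The arguments are pushed onto the user stack by user code before it calls the system call stub, and the trap saves the user stack pointer in tf->esp. Nothing on the stack records how many arguments a given call takes, so the kernel cannot work that out on its own. We therefore define a second array that records the argument count of each system call: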

static int syscallArgs[] = {
  [SYS_fork]    0,
  [SYS_exit]    0,
  [SYS_wait]    0,
  [SYS_pipe]    1,
  [SYS_read]    3,
  [SYS_kill]    1,
  [SYS_exec]    2,
  [SYS_fstat]   2,
  [SYS_chdir]   1,
  [SYS_dup]     1,
  [SYS_getpid]  0,
  [SYS_sbrk]    1,
  [SYS_sleep]   1,
  [SYS_uptime]  0,
  [SYS_open]    2,
  [SYS_write]   3,
  [SYS_mknod]   3,
  [SYS_unlink]  1,
  [SYS_link]    2,
  [SYS_mkdir]   1,
  [SYS_close]   1,
  [SYS_date]    1,
  [SYS_alarm]   2,
  [SYS_dup2]    2
};

Using the argument counts recorded in this array, we read the arguments off the user stack and print them:

// Run this before the dispatch: a successful exec replaces tf->esp.
int nargs = syscallArgs[num];
int i, arg;

cprintf("args: ");
if(nargs == 0)
  cprintf("No Arguments");
for(i = 0; i < nargs; i++){
  if(argint(i, &arg) < 0)   // bounds-checked read of the user stack
    break;
  cprintf("0x%x ", arg);
}
cprintf("\n");
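The stack layout that argint relies on can be checked in a user-space model. This is a sketch under assumptions, not xv6 code: the byte array user_mem stands in for the process address space, USER_SZ plays the role of curproc->sz, push_frame is a hypothetical helper that lays out a call frame, and argint takes the saved esp as an explicit parameter instead of reading it from the trapframe.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define USER_SZ 64u            /* hypothetical size of the process image */

unsigned char user_mem[USER_SZ];   /* simulated user address space */

/* Copy the 32-bit int at user address addr into *ip; -1 if out of bounds. */
int
fetchint(uint32_t addr, int32_t *ip)
{
  if (addr >= USER_SZ || addr + 4 > USER_SZ)
    return -1;
  memcpy(ip, &user_mem[addr], sizeof *ip);
  return 0;
}

/* Fetch the nth argument: esp points at the saved return address,
   so argument n lives at esp + 4 + 4*n. */
int
argint(uint32_t esp, int n, int32_t *ip)
{
  return fetchint(esp + 4 + 4 * (uint32_t)n, ip);
}

/* Lay out a fake call frame at esp: return address, then nargs arguments. */
void
push_frame(uint32_t esp, const int32_t *args, int nargs)
{
  int32_t ret_pc = 0;          /* placeholder for the saved program counter */

  assert(esp + 4 + 4 * (uint32_t)nargs <= USER_SZ);
  memcpy(&user_mem[esp], &ret_pc, 4);
  memcpy(&user_mem[esp + 4], args, 4 * (size_t)nargs);
}
```

Reading arguments through a checked helper like this, rather than dereferencing esp directly, keeps a bad user stack pointer from crashing the code that prints the trace.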

Part Two: Date system call

Introduction:

Your second task is to add a new system call to xv6. The main point of the exercise is for you to see some of the different pieces of the system call machinery. Your new system call will get the current UTC time and return it to the user program. You may want to use the helper function, cmostime() (defined in lapic.c), to read the real time clock. date.h contains the definition of the struct rtcdate struct, which you will provide as an argument to cmostime() as a pointer.

You should create a user-level program that calls your new date system call; here’s some source you should put in date.c:

#include "types.h"
#include "user.h"
#include "date.h"

int
main(int argc, char *argv[])
{
  struct rtcdate r;

  if (date(&r)) {
    printf(2, "date failed\n");
    exit();
  }

  // your code to print the time in any format you like...

  exit();
}

In order to make your new date program available to run from the xv6 shell, add _date to the UPROGS definition in Makefile.

Your strategy for making a date system call should be to clone all of the pieces of code that are specific to some existing system call, for example the “uptime” system call. You should grep for uptime in all the source files, using grep -n uptime *.[chS].

When you’re done, typing date to an xv6 shell prompt should print the current UTC time.

Write down a few words of explanation for each of the files you had to modify in the process of creating your date system call.

Optional challenge: add a dup2() system call and modify the shell to use it.

Solution:

Part two asks us to add a date system call that reports the current time. The user-level program is the date.c listing already given in the introduction above, so what remains is the kernel side of the system call.

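From the assignment we know that the date system call takes a pointer to a struct rtcdate and fills it in using cmostime(). The kernel side therefore has to fetch that pointer from the user stack, validate it, and pass it to cmostime():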

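int
sys_date(void)
{
  struct rtcdate *r;

  // Fetch argument 0 as a user pointer to sizeof(*r) bytes.
  if(argptr(0, (void*)&r, sizeof(*r)) < 0)
    return -1;
  cmostime(r);  // read the current time from the CMOS real-time clock
  return 0;
}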

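Note that the argument is a pointer to a structure, so we fetch it with argptr rather than argint. argptr also checks that the whole structure lies inside the process's address space: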

// Fetch the nth word-sized system call argument as a pointer
// to a block of memory of size bytes.  Check that the pointer
// lies within the process address space.
int
argptr(int n, char **pp, int size)
{
  int i;
  struct proc *curproc = myproc();
 
  if(argint(n, &i) < 0)
    return -1;
  if(size < 0 || (uint)i >= curproc->sz || (uint)i+size > curproc->sz)
    return -1;
  *pp = (char*)i;
  return 0;
}
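The "print the time in any format you like" step in date.c is left open by the assignment. One possible format is sketched below as host-side C so that it can be compiled outside xv6. The struct repeats the fields of xv6's date.h with unsigned int in place of uint, and format_rtcdate is a hypothetical helper name. Inside xv6 the user-level printf takes a file descriptor as its first argument and has no width specifiers, so the equivalent call there would be closer to printf(1, "%d-%d-%d %d:%d:%d\n", r.year, r.month, r.day, r.hour, r.minute, r.second).

```c
#include <stdio.h>

/* Same fields as xv6's date.h (which declares them as uint). */
struct rtcdate {
  unsigned int second;
  unsigned int minute;
  unsigned int hour;
  unsigned int day;
  unsigned int month;
  unsigned int year;
};

/* Format r as "YYYY-MM-DD hh:mm:ss UTC" into buf.
   Returns the length reported by snprintf. */
int
format_rtcdate(const struct rtcdate *r, char *buf, size_t len)
{
  return snprintf(buf, len, "%04u-%02u-%02u %02u:%02u:%02u UTC",
                  r->year, r->month, r->day,
                  r->hour, r->minute, r->second);
}
```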

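Next comes the optional challenge: implementing a dup2 system call. It is somewhat harder and requires being comfortable with file descriptors, so before implementing dup2() we first walk through dup: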

int
sys_dup(void)
{
  struct file *f;
  int fd;

  if(argfd(0, 0, &f) < 0)
    return -1;
  if((fd=fdalloc(f)) < 0)
    return -1;
  filedup(f);
  return fd;
}

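dup duplicates a file descriptor. Given an existing descriptor, it returns a new descriptor that is a copy of it, which means both descriptors refer to the same underlying struct file. For example, on a Unix system that provides lseek, moving the file offset through one descriptor changes the offset seen through the other.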
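The shared-offset behaviour can be observed on a POSIX host (not inside xv6) with a few standard calls. In this sketch dup_shares_offset is a hypothetical helper name and the scratch file location under /tmp is an assumption about the host.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

/* Returns 1 if a seek made through fd is visible through its duplicate,
   0 if it is not, and -1 if the scratch file could not be set up. */
int
dup_shares_offset(void)
{
  char path[] = "/tmp/dupdemoXXXXXX";
  int fd, copy, shared;

  fd = mkstemp(path);            /* create and open a scratch file */
  if (fd < 0)
    return -1;
  unlink(path);                  /* file disappears once both fds close */

  if (write(fd, "hello", 5) != 5 || (copy = dup(fd)) < 0) {
    close(fd);
    return -1;
  }

  lseek(fd, 2, SEEK_SET);                     /* move the offset via fd */
  shared = (lseek(copy, 0, SEEK_CUR) == 2);   /* read it back via copy */

  close(copy);
  close(fd);
  return shared;
}
```

Both descriptors report the same position because they share one open-file entry, which is the same relationship that two ofile slots pointing at one struct file have in xv6.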

Notice that sys_dup calls argfd:

static int
argfd(int n, int *pfd, struct file **pf)
{
  int fd;
  struct file *f;

  if(argint(n, &fd) < 0)
    return -1;
  if(fd < 0 || fd >= NOFILE || (f=myproc()->ofile[fd]) == 0)
    return -1;
  if(pfd)
    *pfd = fd;
  if(pf)
    *pf = f;
  return 0;
}

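argfd takes the argument position n and two output pointers. It uses argint to read the file descriptor at that position, then checks that the descriptor is in range and refers to an open file before handing back the descriptor and its struct file.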

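fdalloc looks for a free slot in ofile (the process's table of open files). If it finds one, it stores the file there and returns the slot index as the new file descriptor: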

static int
fdalloc(struct file *f)
{
  int fd;
  struct proc *curproc = myproc();

  for(fd = 0; fd < NOFILE; fd++){
    if(curproc->ofile[fd] == 0){
      curproc->ofile[fd] = f;
      return fd;
    }
  }
  return -1;
}

sys_dup then calls filedup, which increments the reference count of the file being duplicated:

// Increment ref count for file f.
struct file*
filedup(struct file *f)
{
  acquire(&ftable.lock);
  if(f->ref < 1)
    panic("filedup");
  f->ref++;
  release(&ftable.lock);
  return f;
}

That is the whole implementation of dup.

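dup2 is similar to dup, but it lets the caller specify both the source descriptor and the number of the target descriptor. When dup2 returns successfully, the target descriptor (its second argument) has become a copy of the source descriptor (its first argument). In other words, both descriptors now refer to the same file, namely the one the first argument refers to.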

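To implement dup2 we therefore make the second descriptor refer to the file behind the first descriptor, closing whatever file the second descriptor referred to before. That means tracking two files, the old one and the new one:

struct file *oldfile, *newfile;   // source file, and the file currently open at the target
int newfd;                        // target descriptor number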

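The old descriptor is argument 0 and the new descriptor is argument 1. Under the C calling convention the arguments are pushed right to left, so the new descriptor is pushed first and the old one last, which leaves the old descriptor closest to the top of the user stack. We start by validating both arguments and extracting the descriptors: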

if(argfd(0, 0, &oldfile) < 0){
  return -1;
}

if(argint(1, &newfd) < 0){
  return -1;
}

Here argfd gives us the old file, which must already be open, and argint gives us the number of the new file descriptor.

Next we check that the new descriptor is within range:

if(newfd < 0 || newfd >= NOFILE){
  return -1;
}

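Finally we make the new descriptor refer to the old file. A few corner cases need handling here as well (currproc is the current process, as returned by myproc()):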

if(currproc->ofile[newfd] == 0){
  // Target slot is free: install the old file there.
  currproc->ofile[newfd] = oldfile;
  filedup(oldfile);
  return newfd;
} else if(argfd(1, &newfd, &newfile) < 0){
  return -1;
}

// Both descriptors already refer to the same open file.
if(oldfile == newfile){
  return newfd;
}

// Release the file the target referred to, not the source.
fileclose(newfile);

currproc->ofile[newfd] = oldfile;
filedup(oldfile);
return newfd;
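Assembled into one function, the dup2 logic can be exercised in a user-space model. This is a sketch under assumptions rather than kernel code: struct file carries only a reference count, the global ofile array stands in for the current process's table, model_dup2 is a hypothetical name, and filedup and fileclose are reduced to counter updates. The model takes the new reference before releasing the file that newfd previously referred to, which is one safe ordering among several.

```c
#include <assert.h>
#include <stddef.h>

#define NOFILE 16                 /* open files per process, as in xv6's param.h */

struct file { int ref; };         /* only the reference count is modelled */

struct file *ofile[NOFILE];       /* stands in for the process's ofile table */

/* Model of filedup: take one more reference. */
static void
filedup(struct file *f)
{
  assert(f->ref >= 1);            /* xv6 panics in this situation */
  f->ref++;
}

/* Model of fileclose: drop one reference (the real one frees at zero). */
static void
fileclose(struct file *f)
{
  assert(f->ref >= 1);
  f->ref--;
}

/* Make newfd refer to the file open at oldfd; returns newfd or -1. */
int
model_dup2(int oldfd, int newfd)
{
  struct file *oldfile, *newfile;

  if (oldfd < 0 || oldfd >= NOFILE || (oldfile = ofile[oldfd]) == NULL)
    return -1;
  if (newfd < 0 || newfd >= NOFILE)
    return -1;

  newfile = ofile[newfd];
  if (newfile == oldfile)
    return newfd;                 /* already refers to the same file */

  filedup(oldfile);               /* take the new reference first */
  if (newfile != NULL)
    fileclose(newfile);           /* then release the file newfd had */
  ofile[newfd] = oldfile;
  return newfd;
}
```

After a successful call the source file has gained exactly one reference and the file previously open at newfd, if any, has lost exactly one.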