0%

MIT-6.5840 Assignment Instruction

Introduction 介绍

In this lab you'll build a MapReduce system. You'll implement a worker process that calls application Map and Reduce functions and handles reading and writing files, and a master process that hands out tasks to workers and copes with failed workers. You'll be building something similar to the MapReduce paper. 在本实验中,您将构建一个 MapReduce 系统。您将实施一个调用应用程序 Map 和 Reduce 函数并处理文件读写的工作进程,以及一个将任务分配给工作程序并处理失败的工作程序的主进程。您将构建类似于 MapReduce 论文的内容。

Collaboration Policy 合作政策

You must write all the code you hand in for 6.824, except for code that we give you as part of assignments. You are not allowed to look at anyone else's solution, and you are not allowed to look at solutions from previous years. You may discuss the assignments with other students, but you may not look at or copy each others' code. The reason for this rule is that we believe you will learn the most by designing and implementing your lab solution yourself. 您必须编写为 6.824 提交的所有代码,但我们作为作业的一部分提供给您的代码除外。你不能看别人的解决方案,也不能看前几年的解决方案。您可以与其他学生讨论作业,但不得查看或复制彼此的代码。制定此规则的原因是,我们相信您自己设计和实施实验室解决方案会学到最多。

Please do not publish your code or make it available to current or future 6.824 students. github.com repositories are public by default, so please don't put your code there unless you make the repository private. You may find it convenient to use MIT's GitHub, but be sure to create a private repository. 请不要发布您的代码或将其提供给当前或未来的 6.824 学生。 github.com 仓库默认为公有,因此除非您将仓库设为私有,否则请不要将代码放在那里。您可能会发现使用 MIT 的 GitHub 很方便,但请务必创建一个私有存储库。

Software 软件

You'll implement this lab (and all the labs) in Go. The Go web site contains lots of tutorial information. We will grade your labs using Go version 1.13; you should use 1.13 too. You can check your Go version by running go version. 您将在 Go 中实现此实验室(以及所有实验室)。Go Web 站点包含许多教程信息。我们将使用 Go 1.13 版对您的实验室进行评分;你也应该使用 1.13。您可以通过运行 go version 来检查您的 Go 版本。

We recommend that you work on the labs on your own machine, so you can use the tools, text editors, etc. that you are already familiar with. Alternatively, you can work on the labs on Athena. 我们建议您在自己的计算机上进行实验,以便您可以使用您已经熟悉的工具、文本编辑器等。或者,您可以在 Athena 上进行实验。

macOS macOS操作系统

You can use Homebrew to install Go. After installing Homebrew, run brew install go. 您可以使用 Homebrew 安装 Go。安装 Homebrew 后,运行 brew install go .

Linux Linux的

Depending on your Linux distribution, you might be able to get an up-to-date version of Go from the package repository, e.g. by running apt install golang. Otherwise, you can manually install a binary from Go's website. First, make sure that you're running a 64-bit kernel (uname -a should mention "x86_64 GNU/Linux"), and then run: 根据您的 Linux 发行版,您或许能够从包存储库获取最新版本的 Go,例如,通过运行 apt install golang .否则,您可以从 Go 的网站手动安装二进制文件。首先,确保您运行的是 64 位内核( uname -a 应该提到 “x86_64 GNU/Linux”),然后运行:

1
$ wget -qO- https://dl.google.com/go/go1.13.6.linux-amd64.tar.gz | sudo tar xz -C /usr/local

You'll need to make sure /usr/local/bin is on your PATH. 您需要确保 /usr/local/bin 在您的 PATH .

Windows 窗户

The labs probably won't work directly on Windows. If you're feeling adventurous, you can try to get them running inside Windows Subsystem for Linux and following the Linux instructions above. Otherwise, you can fall back to Athena. 实验室可能无法直接在 Windows 上运行。如果您喜欢冒险,可以尝试让它们在适用于 Linux 的 Windows 子系统中运行,并按照上面的 Linux 说明进行操作。否则,您可以回退到 Athena。

Athena 雅典娜

You can log into a public Athena host with ssh {your kerberos}@athena.dialup.mit.edu. Once you're logged in, to get Go 1.13, run: 您可以使用 ssh {your kerberos}@athena.dialup.mit.edu .登录后,要获取 Go 1.13,请运行:

1
$ setup ggo

Getting started 开始

You'll fetch the initial lab software with git (a version control system). To learn more about git, look at the Pro Git book or the git user's manual. To fetch the 6.824 lab software: 您将使用 git(一个版本控制系统)获取初始实验室软件。要了解有关 git 的更多信息,请查看 Pro Git 书籍git 用户手册。要获取 6.824 lab 软件,请执行以下操作:

1
2
3
4
5
$ git clone git://g.csail.mit.edu/6.824-golabs-2020 6.824
$ cd 6.824
$ ls
Makefile src
$

We supply you with a simple sequential mapreduce implementation in src/main/mrsequential.go. It runs the maps and reduces one at a time, in a single process. We also provide you with a couple of MapReduce applications: word-count in mrapps/wc.go, and a text indexer in mrapps/indexer.go. You can run word count sequentially as follows: 我们为您提供了一个简单的顺序 mapreduce 实现 src/main/mrsequential.go 。它在单个进程中运行映射并逐个缩减。我们还为您提供了几个 MapReduce 应用程序:中的 mrapps/wc.go word-count 和 中的 mrapps/indexer.go 文本索引器。您可以按如下方式按顺序运行字数统计:

1
2
3
4
5
6
7
8
9
10
$ cd ~/6.824
$ cd src/main
$ go build -buildmode=plugin ../mrapps/wc.go
$ rm mr-out*
$ go run mrsequential.go wc.so pg*.txt
$ more mr-out-0
A 509
ABOUT 2
ACT 8
...

mrsequential.go leaves its output in the file mr-out-0. The input is from the text files named pg-xxx.txt. mrsequential.go 将其输出保留在 file mr-out-0 .输入来自名为 pg-xxx.txt 的文本文件。

Feel free to borrow code from mrsequential.go. You should also have a look at mrapps/wc.go to see what MapReduce application code looks like. 随意从 mrsequential.go 中借用代码。您还应该查看 mrapps/wc.go MapReduce 应用程序代码的外观。

Your Job 您的工作

Your job is to implement a distributed MapReduce, consisting of two programs, the master and the worker. There will be just one master process, and one or more worker processes executing in parallel. In a real system the workers would run on a bunch of different machines, but for this lab you'll run them all on a single machine. The workers will talk to the master via RPC. Each worker process will ask the master for a task, read the task's input from one or more files, execute the task, and write the task's output to one or more files. The master should notice if a worker hasn't completed its task in a reasonable amount of time (for this lab, use ten seconds), and give the same task to a different worker. 您的工作是实现一个分布式 MapReduce,它由 两个程序,主程序和工作程序。将会有 只有一个主进程,以及一个或多个在 平行。在实际系统中,worker 将在一堆 不同的计算机,但对于本练习,您将在一台计算机上运行它们。 worker 将通过 RPC 与 master 对话。每个 worker 进程都会询问 任务的主节点,从一个或多个文件中读取任务的输入, 执行任务,并将任务的输出写入一个 或更多文件。master 应该注意到 worker 是否尚未完成 其任务在合理的时间内完成(对于本实验,请使用 10 个 秒),并将相同的任务分配给不同的 worker。

We have given you a little code to start you off. The "main" routines for the master and worker are in main/mrmaster.go and main/mrworker.go; don't change these files. You should put your implementation in mr/master.go, mr/worker.go, and mr/rpc.go. 我们为您提供了一些代码,以便您开始。master 和 worker 的 “main” 例程位于 main/mrmaster.gomain/mrworker.go ;请勿更改这些文件。您应该将实现放在 mr/master.gomr/worker.gomr/rpc.go 中。

Here's how to run your code on the word-count MapReduce application. First, make sure the word-count plugin is freshly built: 下面介绍如何在字数统计 MapReduce 应用程序上运行代码。首先,确保 word-count 插件是新构建的:

1
$ go build -buildmode=plugin ../mrapps/wc.go

In the main directory, run the master. 在该 main 目录中,运行 master.

1
2
$ rm mr-out*
$ go run mrmaster.go pg-*.txt

The pg-*.txt arguments to mrmaster.go are the input files; each file corresponds to one "split", and is the input to one Map task. 的 pg-*.txt mrmaster.go 参数是 输入文件;每个文件对应一个 “split”,并且是 input 添加到一个 Map 任务中。

In one or more other windows, run some workers: 在一个或多个其他窗口中,运行一些 worker:

1
$ go run mrworker.go wc.so

When the workers and master have finished, look at the output in mr-out-*. When you've completed the lab, the sorted union of the output files should match the sequential output, like this: 当 worker 和 master 完成后,查看输出 在 mr-out-* .完成实验后, 输出文件的排序并集应与 Sequential 匹配 output 中,如下所示:

1
2
3
4
5
$ cat mr-out-* | sort | more
A 509
ABOUT 2
ACT 8
...

We supply you with a test script in main/test-mr.sh. The tests check that the wc and indexer MapReduce applications produce the correct output when given the pg-xxx.txt files as input. The tests also check that your implementation runs the Map and Reduce tasks in parallel, and that your implementation recovers from workers that crash while running tasks. 我们在 中 main/test-mr.sh 为您提供了一个测试脚本。测试将检查 wcindexer MapReduce 应用程序在将 pg-xxx.txt 文件作为输入时是否生成正确的输出。这些测试还会检查您的实施是否并行运行 Map 和 Reduce 任务,以及您的实施是否从运行任务时崩溃的工作程序中恢复。

If you run the test script now, it will hang because the master never finishes: 如果您现在运行测试脚本,它将挂起,因为 master 从未完成:

1
2
3
$ cd ~/6.824/src/main
$ sh test-mr.sh
*** Starting wc test.

You can change ret := false to true in the Done function in mr/master.go so that the master exits immediately. Then: 您可以在 Done 函数中更改为 ret := false true, mr/master.go 以便主服务器立即退出。然后:

1
2
3
4
5
6
7
$ sh ./test-mr.sh
*** Starting wc test.
sort: No such file or directory
cmp: EOF on mr-wc-all
--- wc output is not the same as mr-correct-wc.txt
--- wc test: FAIL
$

The test script expects to see output in files named mr-out-X, one for each reduce task. The empty implementations of mr/master.go and mr/worker.go don't produce those files (or do much of anything else), so the test fails. 测试脚本希望在名为 mr-out-X 的文件中看到输出,每个 reduce 任务一个。 mr/master.go 和 的 mr/worker.go 空实现不会生成这些文件(或执行其他大部分操作),因此测试失败。

When you've finished, the test script output should look like this: 完成后,测试脚本输出应如下所示:

1
2
3
4
5
6
7
8
9
10
11
12
13
$ sh ./test-mr.sh
*** Starting wc test.
--- wc test: PASS
*** Starting indexer test.
--- indexer test: PASS
*** Starting map parallelism test.
--- map parallelism test: PASS
*** Starting reduce parallelism test.
--- reduce parallelism test: PASS
*** Starting crash test.
--- crash test: PASS
*** PASSED ALL TESTS
$

You'll also see some errors from the Go RPC package that look like 你还会看到 Go RPC 包中的一些错误,如下所示

1
2019/12/16 13:27:09 rpc.Register: method "Done" has 1 input parameters; needs exactly three

Ignore these messages. 忽略这些消息。

A few rules: 一些规则:

  • The map phase should divide the intermediate keys into buckets for nReduce reduce tasks, where nReduce is the argument that main/mrmaster.go passes to MakeMaster(). map 阶段应将中间键划分为 reduce 任务的 nReduce 存储桶,其中 nReducemain/mrmaster.go 传递给 的 MakeMaster() 参数。
  • The worker implementation should put the output of the X'th reduce task in the file mr-out-X. worker 实现应将第 X 个 reduce 任务的输出放在 file mr-out-X 中。
  • A mr-out-X file should contain one line per Reduce function output. The line should be generated with the Go "%v %v" format, called with the key and value. Have a look in main/mrsequential.go for the line commented "this is the correct format". The test script will fail if your implementation deviates too much from this format. mr-out-X 文件的每个 Reduce 函数输出应包含一行。该行应使用 Go "%v %v" 格式生成,并使用 key 和 value 调用。看看 main/mrsequential.go 评论“This is the correct format”的那一行。如果您的 implementation 与此格式有太多偏差,则测试脚本将失败。
  • You can modify mr/worker.go, mr/master.go, and mr/rpc.go. You can temporarily modify other files for testing, but make sure your code works with the original versions; we'll test with the original versions. 您可以修改 mr/worker.gomr/master.gomr/rpc.go 。您可以临时修改其他文件以进行测试,但请确保您的代码与原始版本兼容;我们将使用原始版本进行测试。
  • The worker should put intermediate Map output in files in the current directory, where your worker can later read them as input to Reduce tasks. worker 应该将中间的 Map 输出放在当前目录的文件中,以便 worker 稍后可以将它们作为 Reduce 任务的输入读取。
  • main/mrmaster.go expects mr/master.go to implement a Done() method that returns true when the MapReduce job is completely finished; at that point, mrmaster.go will exit. main/mrmaster.go mr/master.go 期望实现一个 Done() 在 MapReduce 作业完全完成时返回 true 的方法;此时, mrmaster.go 将退出。
  • When the job is completely finished, the worker processes should exit. A simple way to implement this is to use the return value from call(): if the worker fails to contact the master, it can assume that the master has exited because the job is done, and so the worker can terminate too. Depending on your design, you might also find it helpful to have a "please exit" pseudo-task that the master can give to workers. 当作业完全完成后,工作进程应退出。实现这一点的一种简单方法是使用 return value from call() : 如果 worker 没有联系 master,它可以假设 master 已经退出,因为 job 已经完成,因此 worker 也可以终止。根据您的设计,您可能还会发现 master 可以将其分配给 worker 的 “please exit” 伪任务会很有帮助。

Hints 提示

  • One way to get started is to modify mr/worker.go's Worker() to send an RPC to the master asking for a task. Then modify the master to respond with the file name of an as-yet-unstarted map task. Then modify the worker to read that file and call the application Map function, as in mrsequential.go. 一种开始的方法是修改 mr/worker.goWorker() 将 RPC 发送到 master 请求任务。然后修改主服务器以使用尚未启动的映射任务的文件名进行响应。然后修改 worker 以读取该文件并调用应用程序 Map 函数,如 mrsequential.go .

  • The application Map and Reduce functions are loaded at run-time using the Go plugin package, from files whose names end in .so. 应用程序 Map 和 Reduce 函数在运行时使用 Go 插件包从名称以 . .so

  • If you change anything in the mr/ directory, you will probably have to re-build any MapReduce plugins you use, with something like go build -buildmode=plugin ../mrapps/wc.go 如果你更改 mr/ 了目录中的任何内容,你可能必须重新构建你使用的任何 MapReduce 插件,如下所示 go build -buildmode=plugin ../mrapps/wc.go

  • This lab relies on the workers sharing a file system. That's straightforward when all workers run on the same machine, but would require a global filesystem like GFS if the workers ran on different machines. 此实验室依赖于共享文件系统的工作程序。当所有 worker 都在同一台机器上运行时,这很简单,但如果 worker 在不同机器上运行,则需要像 GFS 这样的全局文件系统。

  • A reasonable naming convention for intermediate files is mr-X-Y, where X is the Map task number, and Y is the reduce task number. 中间文件的合理命名约定是 mr-X-Y ,其中 X 是 Map 任务编号,Y 是 reduce 任务编号。

  • The worker's map task code will need a way to store intermediate key/value pairs in files in a way that can be correctly read back during reduce tasks. One possibility is to use Go's

encoding/json

 package. To write key/value pairs to a JSON file:


worker 的 map 任务代码将需要一种方法来存储中间 键/值对,以便正确读回 在 减少 任务 期间。一种可能性是使用 Go 的包 `encoding/json` 。自 将键/值对写入 JSON 文件:

1
2
3
enc := json.NewEncoder(file)
for _, kv := ... {
err := enc.Encode(&kv)
and to read such a file back: 并要读回这样的文件:
1
2
3
4
5
6
7
8
dec := json.NewDecoder(file)
for {
var kv KeyValue
if err := dec.Decode(&kv); err != nil {
break
}
kva = append(kva, kv)
}
  • The map part of your worker can use the ihash(key) function (in worker.go) to pick the reduce task for a given key. worker 的 map 部分可以使用 ihash(key) 函数 (in worker.go ) 为给定的 key 选择 reduce 任务。

  • You can steal some code from mrsequential.go for reading Map input files, for sorting intermedate key/value pairs between the Map and Reduce, and for storing Reduce output in files. 您可以从中窃取一些代码 mrsequential.go ,用于读取 Map 输入文件,在 Map 和 Reduce 之间对中间键/值对进行排序,以及将 Reduce 输出存储在文件中。

  • The master, as an RPC server, will be concurrent; don't forget to lock shared data. master 作为 RPC 服务器,将是并发的;不要忘记锁定共享数据。

  • Use Go's race detector, with go build -race and go run -race. test-mr.sh has a comment that shows you how to enable the race detector for the tests. 使用 Go 的 race 检测器,其中有 go build -racego run -racetest-mr.sh 有一个注释,向您展示如何为测试启用争用检测器。

  • Workers will sometimes need to wait, e.g. reduces can't start until the last map has finished. One possibility is for workers to periodically ask the master for work, sleeping with time.Sleep() between each request. Another possibility is for the relevant RPC handler in the master to have a loop that waits, either with time.Sleep() or sync.Cond. Go runs the handler for each RPC in its own thread, so the fact that one handler is waiting won't prevent the master from processing other RPCs. Worker 有时需要等待,例如,在最后一个 map 完成之前,减少无法启动。一种可能性是 worker 定期向 master 请求工作,在每次请求之间休 time.Sleep() 眠。另一种可能性是主服务器中的相关 RPC 处理程序有一个等待的循环,要么是 或 time.Sleep() sync.Cond 。Go 在自己的线程中运行每个 RPC 的处理程序,因此一个处理程序正在等待的事实不会阻止 master 处理其他 RPC。

  • The master can't reliably distinguish between crashed workers, workers that are alive but have stalled for some reason, and workers that are executing but too slowly to be useful. The best you can do is have the master wait for some amount of time, and then give up and re-issue the task to a different worker. For this lab, have the master wait for ten seconds; after that the master should assume the worker has died (of course, it might not have). 主节点无法可靠地区分崩溃的 worker、存活但由于某种原因而停滞的 worker 以及正在执行但太慢而无法使用的 worker。你能做的最好的事情是让 master 等待一段时间,然后放弃并将任务重新分配给不同的 worker。对于此实验室,让主服务器等待 10 秒;之后,master 应该假设 worker 已经死亡(当然,它可能没有)。

  • To test crash recovery, you can use the mrapps/crash.go application plugin. It randomly exits in the Map and Reduce functions. 要测试崩溃恢复,您可以使用 mrapps/crash.go 应用程序插件。它会在 Map 和 Reduce 函数中随机退出。

  • To ensure that nobody observes partially written files in the presence of crashes, the MapReduce paper mentions the trick of using a temporary file and atomically renaming it once it is completely written. You can use ioutil.TempFile to create a temporary file and os.Rename to atomically rename it. 为了确保在崩溃的情况下没有人观察到部分写入的文件,MapReduce 论文提到了使用临时文件并在完全写入后对其进行原子重命名的技巧。您可以使用 ioutil.TempFile 创建临时文件并 os.Rename 对其进行原子重命名。

  • test-mr.sh runs all the processes in the sub-directory mr-tmp, so if something goes wrong and you want to look at intermediate or output files, look there.
    test-mr.sh 运行 sub directory mr-tmp 中的所有进程,因此如果出现问题并且您想查看中间文件或输出文件,请查看那里。

Handin procedure Handin 程序

Before submitting, please run test-mr.sh one final time. 在提交之前,请最后一次运行 test-mr.sh

Use the make lab1 command to package your lab assignment and upload it to the class's submission website, located at https://6824.scripts.mit.edu/2020/handin.py/. 使用该 make lab1 命令打包您的实验室作业并将其上传到位于 https://6824.scripts.mit.edu/2020/handin.py/ 的课程提交网站。

You may use your MIT Certificate or request an API key via email to log in for the first time. Your API key (XXX) is displayed once you logged in, which can be used to upload lab1 from the console as follows. 首次登录时,您可以使用 MIT 证书或通过电子邮件请求 API 密钥。登录后,将显示您的 API 密钥 ( XXX ),该密钥可用于从控制台上传 lab1,如下所示。

1
2
3
$ cd ~/6.824
$ echo XXX > api.key
$ make lab1

Check the submission website to make sure it thinks you submitted this lab! 检查提交网站,确保它认为您提交了此实验!

You may submit multiple times. We will use the timestamp of your last submission for the purpose of calculating late days. 您可以多次提交。我们将使用您上次提交的时间戳来计算逾期天数。