issuekey,title,description,storypoint
MESOS-313,"Report executor terminations to framework schedulers.","The Scheduler interface has a callback for executorLost, but currently it is never called.",2
MESOS-336,"Mesos slave should cache executors","The slave should be smarter about how it handles pulling down executors. In our environment, executors rarely change but the slave will always pull them down from HDFS regardless. This puts undue stress on our HDFS clusters, and is not resilient to reduced HDFS availability.",5
MESOS-343,"Expose TASK_FAILED reason to Frameworks.","We now have a message string inside TaskStatus that provides human readable information about TASK_FAILED. It would be good to add some structure to the failure reasons, for framework schedulers to act on programmatically. E.g. enum TaskFailure { EXECUTOR_OOM; EXECUTOR_OUT_OF_DISK; EXECUTOR_TERMINATED; SLAVE_LOST; etc.. }",8
MESOS-487,"Balloon framework fails to run due to bad flags","I suspect this has to do with the latest flags refactor. [vinod@smfd-bkq-03-sr4 build]$ sudo GLOG_v=1 ./bin/mesos-tests.sh --gtest_filter=""*Balloon*"" --verbose WARNING: Logging before InitGoogleLogging() is written to STDERR I0529 22:28:13.094351 31506 process.cpp:1426] libprocess is initialized on 10.37.184.103:53425 for 24 cpus I0529 22:28:13.095010 31506 logging.cpp:91] Logging to STDERR Source directory: /home/vinod/mesos Build directory: /home/vinod/mesos/build ------------------------------------------------------------- We cannot run any cgroups tests that require mounting hierarchies because you have the following hierarchies mounted: /cgroup We'll disable the CgroupsNoHierarchyTest test fixture for now. ------------------------------------------------------------- Note: Google Test filter = *Balloon*-CgroupsNoHierarchyTest.ROOT_CGROUPS_NOHIERARCHY_MountUnmountHierarchy: [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. 
[----------] 1 test from CgroupsIsolatorTest [ RUN ] CgroupsIsolatorTest.ROOT_CGROUPS_BalloonFramework Using temporary directory '/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_BalloonFramework_pWWdE1' Launched master at 31574 Failed to load unknown flag 'build_dir' Usage: lt-mesos-master [...] Supported options: --allocation_interval=VALUE Amount of time to wait between performing (batch) allocations (e.g., 500ms, 1sec, etc) (default: 1secs) --cluster=VALUE Human readable name for the cluster, displayed in the webui --framework_sorter=VALUE Policy to use for allocating resources between a given user's frameworks. Options are the same as for user_allocator (default: drf) --[no-]help Prints this help message (default: false) --ip=VALUE IP address to listen on --log_dir=VALUE Location to put log files (no default, nothing is written to disk unless specified; does not affect logging to stderr) --logbufsecs=VALUE How many seconds to buffer log messages for (default: 0) --port=VALUE Port to listen on (default: 5050) --[no-]quiet Disable logging to stderr (default: false) --[no-]root_submissions Can root submit frameworks? (default: true) --slaves=VALUE Initial slaves that should be considered part of this cluster (or if using ZooKeeper a URL) (default: *) --user_sorter=VALUE Policy to use for allocating resources between users. 
May be one of: dominant_resource_fairness (drf) (default: drf) --webui_dir=VALUE Location of the webui files/assets (default: /usr/local/share/mesos/webui) --whitelist=VALUE Path to a file with a list of slaves (one per line) to advertise offers for; should be of the form: file://path/to/file (default: *) --zk=VALUE ZooKeeper URL (used for leader election amongst masters) May be one of: zk://host1:port1,host2:port2,.../path zk://username:password@host1:port1,host2:port2,.../path file://path/to/file (where file contains one of the above) (default: ) {RED}Master crashed; failing test /home/vinod/mesos/src/tests/balloon_framework_test.sh: line 31: kill: (31574) - No such process ../../src/tests/script.cpp:76: Failure Failed balloon_framework_test.sh exited with status 2 [ FAILED ] CgroupsIsolatorTest.ROOT_CGROUPS_BalloonFramework (2031 ms) [----------] 1 test from CgroupsIsolatorTest (2031 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (2031 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] CgroupsIsolatorTest.ROOT_CGROUPS_BalloonFramework 1 FAILED TEST ",1
MESOS-598,"Also check 'git diff --shortstat --staged' in post-reviews.py.","We currently check if you have any changes before we run post-reviews.py but we don't check for staged changes which IIUC could get lost.",1
MESOS-708,"Static files missing ""Last-Modified"" HTTP headers","Static assets served by the Mesos master don't return ""Last-Modified"" HTTP headers. That means clients receive a 200 status code and re-download assets on every page request even if the assets haven't changed. Because Angular JS does most of the work, the downloading happens only when you navigate to Mesos master in your browser or use the browser's refresh. 
Example header for ""mesos.css"": HTTP/1.1 200 OK Date: Thu, 26 Sep 2013 17:18:52 GMT Content-Length: 1670 Content-Type: text/css Clients sometimes use the ""Date"" header for the same effect as ""Last-Modified"", but the date is always the time of the response from the server, i.e. it changes on every request and makes the assets look new every time. The ""Last-Modified"" header should be added and should be the last modified time of the file. On subsequent requests for the same files, the master should return 304 responses with no content rather than 200 with the full files. It could save clients a lot of download time since Mesos assets are rather heavyweight.",2
MESOS-723,"Expose total number of resources allocated to the slave in its endpoint","This could be useful information if there are bugs in master/slave that cause slaves to overcommit their resources.",2
MESOS-752,"SlaveRecoveryTest/0.ReconcileTasksMissingFromSlave test is flaky","[ RUN ] SlaveRecoveryTest/0.ReconcileTasksMissingFromSlave Checkpointing executor's forked pid 32281 to '/tmp/SlaveRecoveryTest_0_ReconcileTasksMissingFromSlave_NT1btb/meta/slaves/201310151913-16777343-35153-31491-0/frameworks/201310151913-16777343-35153-31491-0000/executors/0514b52f-3c17-4ee5-ba16-635198701ca2/runs/97c9e2cc-ceea-40a8-a915-aed5fed1dcb3/pids/forked.pid' Fetching resources into '/tmp/SlaveRecoveryTest_0_ReconcileTasksMissingFromSlave_NT1btb/slaves/201310151913-16777343-35153-31491-0/frameworks/201310151913-16777343-35153-31491-0000/executors/0514b52f-3c17-4ee5-ba16-635198701ca2/runs/97c9e2cc-ceea-40a8-a915-aed5fed1dcb3' Registered executor on localhost.localdomain Starting task 0514b52f-3c17-4ee5-ba16-635198701ca2 Forked command at 32317 sh -c 'sleep 10' tests/slave_recovery_tests.cpp:1927: Failure Mock function called more times than expected - returning directly. 
Function call: statusUpdate(0x7fffae636eb0, @0x7f1590027a00 64-byte object ) Expected: to be called once Actual: called twice - over-saturated and active Command exited with status 0 (pid: 32317) ",1
MESOS-786,"Update semantics of when framework registered()/reregistered() get called","Current semantics: 1) Framework connects w/ master very first time --> registered() 2) Framework reconnects w/ same master after a zk blip --> reregistered() 3) Framework reconnects w/ failed over master --> registered() 4) Failed over framework connects w/ same master --> registered() 5) Failed over framework connects w/ failed over master --> registered() Updated semantics: Everything same except 3) Framework reconnects w/ failed over master --> reregistered()",3
MESOS-830,"ExamplesTest.JavaFramework is flaky","Identify the cause of the following test failure: [ RUN ] ExamplesTest.JavaFramework Using temporary directory '/tmp/ExamplesTest_JavaFramework_wSc7u8' Enabling authentication for the framework I1120 15:13:39.820032 1681264640 master.cpp:285] Master started on 172.25.133.171:52576 I1120 15:13:39.820180 1681264640 master.cpp:299] Master ID: 201311201513-2877626796-52576-3234 I1120 15:13:39.820194 1681264640 master.cpp:302] Master only allowing authenticated frameworks to register! I1120 15:13:39.821197 1679654912 slave.cpp:112] Slave started on 1)@172.25.133.171:52576 I1120 15:13:39.821795 1679654912 slave.cpp:212] Slave resources: cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] I1120 15:13:39.822855 1682337792 slave.cpp:112] Slave started on 2)@172.25.133.171:52576 I1120 15:13:39.823652 1682337792 slave.cpp:212] Slave resources: cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] I1120 15:13:39.825330 1679118336 master.cpp:744] The newly elected leader is master@172.25.133.171:52576 I1120 15:13:39.825445 1679118336 master.cpp:748] Elected as the leading master! 
I1120 15:13:39.825907 1681264640 state.cpp:33] Recovering state from '/tmp/ExamplesTest_JavaFramework_wSc7u8/0/meta' I1120 15:13:39.826127 1681264640 status_update_manager.cpp:180] Recovering status update manager I1120 15:13:39.826331 1681801216 process_isolator.cpp:317] Recovering isolator I1120 15:13:39.826738 1682874368 slave.cpp:2743] Finished recovery I1120 15:13:39.827747 1682337792 state.cpp:33] Recovering state from '/tmp/ExamplesTest_JavaFramework_wSc7u8/1/meta' I1120 15:13:39.827945 1680191488 slave.cpp:112] Slave started on 3)@172.25.133.171:52576 I1120 15:13:39.828415 1682337792 status_update_manager.cpp:180] Recovering status update manager I1120 15:13:39.828608 1680728064 sched.cpp:260] Authenticating with master master@172.25.133.171:52576 I1120 15:13:39.828606 1680191488 slave.cpp:212] Slave resources: cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] I1120 15:13:39.828680 1682874368 slave.cpp:497] New master detected at master@172.25.133.171:52576 I1120 15:13:39.828765 1682337792 process_isolator.cpp:317] Recovering isolator I1120 15:13:39.829828 1680728064 sched.cpp:229] Detecting new master I1120 15:13:39.830288 1679654912 authenticatee.hpp:100] Initializing client SASL I1120 15:13:39.831635 1680191488 state.cpp:33] Recovering state from '/tmp/ExamplesTest_JavaFramework_wSc7u8/2/meta' I1120 15:13:39.831991 1679118336 status_update_manager.cpp:158] New master detected at master@172.25.133.171:52576 I1120 15:13:39.832042 1682874368 slave.cpp:524] Detecting new master I1120 15:13:39.832314 1682337792 slave.cpp:2743] Finished recovery I1120 15:13:39.832309 1681264640 master.cpp:1266] Attempting to register slave on vkone.local at slave(1)@172.25.133.171:52576 I1120 15:13:39.832929 1680728064 status_update_manager.cpp:180] Recovering status update manager I1120 15:13:39.833371 1681801216 slave.cpp:497] New master detected at master@172.25.133.171:52576 I1120 15:13:39.833273 1681264640 master.cpp:2513] Adding slave 
201311201513-2877626796-52576-3234-0 at vkone.local with cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] I1120 15:13:39.833595 1680728064 process_isolator.cpp:317] Recovering isolator I1120 15:13:39.833859 1681801216 slave.cpp:524] Detecting new master I1120 15:13:39.833861 1682874368 status_update_manager.cpp:158] New master detected at master@172.25.133.171:52576 I1120 15:13:39.834092 1680191488 slave.cpp:542] Registered with master master@172.25.133.171:52576; given slave ID 201311201513-2877626796-52576-3234-0 I1120 15:13:39.834486 1681264640 master.cpp:1266] Attempting to register slave on vkone.local at slave(2)@172.25.133.171:52576 I1120 15:13:39.834549 1681264640 master.cpp:2513] Adding slave 201311201513-2877626796-52576-3234-1 at vkone.local with cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] I1120 15:13:39.834750 1680191488 slave.cpp:555] Checkpointing SlaveInfo to '/tmp/ExamplesTest_JavaFramework_wSc7u8/0/meta/slaves/201311201513-2877626796-52576-3234-0/slave.info' I1120 15:13:39.834875 1682874368 hierarchical_allocator_process.hpp:445] Added slave 201311201513-2877626796-52576-3234-0 (vkone.local) with cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] (and cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] available) I1120 15:13:39.835155 1680728064 slave.cpp:542] Registered with master master@172.25.133.171:52576; given slave ID 201311201513-2877626796-52576-3234-1 I1120 15:13:39.835458 1679118336 slave.cpp:2743] Finished recovery I1120 15:13:39.835739 1680728064 slave.cpp:555] Checkpointing SlaveInfo to '/tmp/ExamplesTest_JavaFramework_wSc7u8/1/meta/slaves/201311201513-2877626796-52576-3234-1/slave.info' I1120 15:13:39.835922 1682874368 hierarchical_allocator_process.hpp:445] Added slave 201311201513-2877626796-52576-3234-1 (vkone.local) with cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] (and cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] available) I1120 
15:13:39.836120 1681264640 slave.cpp:497] New master detected at master@172.25.133.171:52576 I1120 15:13:39.836340 1679118336 status_update_manager.cpp:158] New master detected at master@172.25.133.171:52576 I1120 15:13:39.836436 1681264640 slave.cpp:524] Detecting new master I1120 15:13:39.836629 1682874368 master.cpp:1266] Attempting to register slave on vkone.local at slave(3)@172.25.133.171:52576 I1120 15:13:39.836653 1682874368 master.cpp:2513] Adding slave 201311201513-2877626796-52576-3234-2 at vkone.local with cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] I1120 15:13:39.836804 1680728064 slave.cpp:542] Registered with master master@172.25.133.171:52576; given slave ID 201311201513-2877626796-52576-3234-2 I1120 15:13:39.837190 1680728064 slave.cpp:555] Checkpointing SlaveInfo to '/tmp/ExamplesTest_JavaFramework_wSc7u8/2/meta/slaves/201311201513-2877626796-52576-3234-2/slave.info' I1120 15:13:39.837569 1682874368 hierarchical_allocator_process.hpp:445] Added slave 201311201513-2877626796-52576-3234-2 (vkone.local) with cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] (and cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] available) I1120 15:13:39.852011 1679654912 authenticatee.hpp:124] Creating new client SASL connection I1120 15:13:39.852219 1680191488 master.cpp:1734] Authenticating framework at scheduler(1)@172.25.133.171:52576 I1120 15:13:39.852577 1682337792 authenticator.hpp:83] Initializing server SASL I1120 15:13:39.856160 1682337792 authenticator.hpp:140] Creating new server SASL connection I1120 15:13:39.856334 1681264640 authenticatee.hpp:212] Received SASL authentication mechanisms: CRAM-MD5 I1120 15:13:39.856360 1681264640 authenticatee.hpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I1120 15:13:39.856421 1681264640 authenticator.hpp:243] Received SASL authentication start I1120 15:13:39.856487 1681264640 authenticator.hpp:325] Authentication requires more steps I1120 15:13:39.856531 
1681264640 authenticatee.hpp:258] Received SASL authentication step I1120 15:13:39.856576 1681264640 authenticator.hpp:271] Received SASL authentication step I1120 15:13:39.856643 1681264640 authenticator.hpp:317] Authentication success I1120 15:13:39.856724 1681264640 authenticatee.hpp:298] Authentication success I1120 15:13:39.856768 1681264640 master.cpp:1774] Successfully authenticated framework at scheduler(1)@172.25.133.171:52576 I1120 15:13:39.857028 1681264640 sched.cpp:334] Successfully authenticated with master master@172.25.133.171:52576 I1120 15:13:39.857139 1681264640 master.cpp:798] Received registration request from scheduler(1)@172.25.133.171:52576 I1120 15:13:39.857306 1681264640 master.cpp:816] Registering framework 201311201513-2877626796-52576-3234-0000 at scheduler(1)@172.25.133.171:52576 I1120 15:13:39.862296 1680191488 hierarchical_allocator_process.hpp:332] Added framework 201311201513-2877626796-52576-3234-0000 I1120 15:13:39.863867 1680191488 master.cpp:1700] Sending 3 offers to framework 201311201513-2877626796-52576-3234-0000 Registered! 
ID = 201311201513-2877626796-52576-3234-0000 Launching task 0 Launching task 1 Launching task 2 I1120 15:13:39.905390 1680191488 master.cpp:2026] Processing reply for offer 201311201513-2877626796-52576-3234-0 on slave 201311201513-2877626796-52576-3234-1 (vkone.local) for framework 201311201513-2877626796-52576-3234-0000 I1120 15:13:39.905825 1680191488 master.hpp:400] Adding task 0 with resources cpus(*):1; mem(*):128 on slave 201311201513-2877626796-52576-3234-1 (vkone.local) I1120 15:13:39.905886 1680191488 master.cpp:2150] Launching task 0 of framework 201311201513-2877626796-52576-3234-0000 with resources cpus(*):1; mem(*):128 on slave 201311201513-2877626796-52576-3234-1 (vkone.local) I1120 15:13:39.906422 1680191488 master.cpp:2026] Processing reply for offer 201311201513-2877626796-52576-3234-1 on slave 201311201513-2877626796-52576-3234-2 (vkone.local) for framework 201311201513-2877626796-52576-3234-0000 I1120 15:13:39.906664 1680191488 master.hpp:400] Adding task 1 with resources cpus(*):1; mem(*):128 on slave 201311201513-2877626796-52576-3234-2 (vkone.local) I1120 15:13:39.906721 1680191488 master.cpp:2150] Launching task 1 of framework 201311201513-2877626796-52576-3234-0000 with resources cpus(*):1; mem(*):128 on slave 201311201513-2877626796-52576-3234-2 (vkone.local) I1120 15:13:39.907171 1680191488 master.cpp:2026] Processing reply for offer 201311201513-2877626796-52576-3234-2 on slave 201311201513-2877626796-52576-3234-0 (vkone.local) for framework 201311201513-2877626796-52576-3234-0000 I1120 15:13:39.907419 1680191488 master.hpp:400] Adding task 2 with resources cpus(*):1; mem(*):128 on slave 201311201513-2877626796-52576-3234-0 (vkone.local) I1120 15:13:39.907480 1680191488 master.cpp:2150] Launching task 2 of framework 201311201513-2877626796-52576-3234-0000 with resources cpus(*):1; mem(*):128 on slave 201311201513-2877626796-52576-3234-0 (vkone.local) I1120 15:13:39.907938 1680191488 slave.cpp:722] Got assigned task 0 for framework 
201311201513-2877626796-52576-3234-0000 I1120 15:13:39.908473 1680191488 slave.cpp:833] Launching task 0 for framework 201311201513-2877626796-52576-3234-0000 I1120 15:13:39.914427 1682874368 slave.cpp:722] Got assigned task 1 for framework 201311201513-2877626796-52576-3234-0000 I1120 15:13:39.914594 1680728064 slave.cpp:722] Got assigned task 2 for framework 201311201513-2877626796-52576-3234-0000 I1120 15:13:39.914844 1681801216 hierarchical_allocator_process.hpp:590] Framework 201311201513-2877626796-52576-3234-0000 filtered slave 201311201513-2877626796-52576-3234-1 for 1secs I1120 15:13:39.915292 1682874368 slave.cpp:833] Launching task 1 for framework 201311201513-2877626796-52576-3234-0000 I1120 15:13:39.915424 1681801216 hierarchical_allocator_process.hpp:590] Framework 201311201513-2877626796-52576-3234-0000 filtered slave 201311201513-2877626796-52576-3234-2 for 1secs I1120 15:13:39.915685 1681801216 hierarchical_allocator_process.hpp:590] Framework 201311201513-2877626796-52576-3234-0000 filtered slave 201311201513-2877626796-52576-3234-0 for 1secs I1120 15:13:39.915828 1680728064 slave.cpp:833] Launching task 2 for framework 201311201513-2877626796-52576-3234-0000 I1120 15:13:39.917840 1680191488 slave.cpp:943] Queuing task '0' for executor default of framework '201311201513-2877626796-52576-3234-0000 I1120 15:13:39.917935 1679118336 process_isolator.cpp:100] Launching default (/Users/vinod/workspace/apache/mesos/build/src/examples/java/test-executor) in /tmp/ExamplesTest_JavaFramework_wSc7u8/1/slaves/201311201513-2877626796-52576-3234-1/frameworks/201311201513-2877626796-52576-3234-0000/executors/default/runs/375b31a9-7093-4db1-964d-e6b425b1e4b4 with resources ' for framework 201311201513-2877626796-52576-3234-0000 I1120 15:13:39.922019 1679118336 process_isolator.cpp:163] Forked executor at 3268 I1120 15:13:39.922703 1679118336 slave.cpp:2073] Monitoring executor default of framework 201311201513-2877626796-52576-3234-0000 forked at pid 3268 I1120 
15:13:39.929134 1682874368 slave.cpp:943] Queuing task '1' for executor default of framework '201311201513-2877626796-52576-3234-0000 I1120 15:13:39.929323 1682874368 process_isolator.cpp:100] Launching default (/Users/vinod/workspace/apache/mesos/build/src/examples/java/test-executor) in /tmp/ExamplesTest_JavaFramework_wSc7u8/2/slaves/201311201513-2877626796-52576-3234-2/frameworks/201311201513-2877626796-52576-3234-0000/executors/default/runs/2bd0e75d-a2b9-4ae6-be08-9782612309a5 with resources ' for framework 201311201513-2877626796-52576-3234-0000 I1120 15:13:39.931243 1682874368 process_isolator.cpp:163] Forked executor at 3269 I1120 15:13:39.931612 1681801216 slave.cpp:2073] Monitoring executor default of framework 201311201513-2877626796-52576-3234-0000 forked at pid 3269 E1120 15:13:39.931836 1681801216 slave.cpp:2099] Failed to watch executor default of framework 201311201513-2877626796-52576-3234-0000: Already watched I1120 15:13:39.936460 1680728064 slave.cpp:943] Queuing task '2' for executor default of framework '201311201513-2877626796-52576-3234-0000 I1120 15:13:39.936619 1681801216 process_isolator.cpp:100] Launching default (/Users/vinod/workspace/apache/mesos/build/src/examples/java/test-executor) in /tmp/ExamplesTest_JavaFramework_wSc7u8/0/slaves/201311201513-2877626796-52576-3234-0/frameworks/201311201513-2877626796-52576-3234-0000/executors/default/runs/16d600da-da86-4614-91cb-58a7b27ab534 with resources ' for framework 201311201513-2877626796-52576-3234-0000 I1120 15:13:39.941299 1681801216 process_isolator.cpp:163] Forked executor at 3270 I1120 15:13:39.942179 1681801216 slave.cpp:2073] Monitoring executor default of framework 201311201513-2877626796-52576-3234-0000 forked at pid 3270 E1120 15:13:39.942395 1681801216 slave.cpp:2099] Failed to watch executor default of framework 201311201513-2877626796-52576-3234-0000: Already watched Fetching resources into 
'/tmp/ExamplesTest_JavaFramework_wSc7u8/2/slaves/201311201513-2877626796-52576-3234-2/frameworks/201311201513-2877626796-52576-3234-0000/executors/default/runs/2bd0e75d-a2b9-4ae6-be08-9782612309a5' Fetching resources into '/tmp/ExamplesTest_JavaFramework_wSc7u8/1/slaves/201311201513-2877626796-52576-3234-1/frameworks/201311201513-2877626796-52576-3234-0000/executors/default/runs/375b31a9-7093-4db1-964d-e6b425b1e4b4' Fetching resources into '/tmp/ExamplesTest_JavaFramework_wSc7u8/0/slaves/201311201513-2877626796-52576-3234-0/frameworks/201311201513-2877626796-52576-3234-0000/executors/default/runs/16d600da-da86-4614-91cb-58a7b27ab534' I1120 15:13:40.372573 1681801216 slave.cpp:1406] Got registration for executor 'default' of framework 201311201513-2877626796-52576-3234-0000 I1120 15:13:40.373258 1681801216 slave.cpp:1527] Flushing queued task 1 for executor 'default' of framework 201311201513-2877626796-52576-3234-0000 I1120 15:13:40.388317 1681801216 slave.cpp:1406] Got registration for executor 'default' of framework 201311201513-2877626796-52576-3234-0000 I1120 15:13:40.388983 1681801216 slave.cpp:1527] Flushing queued task 0 for executor 'default' of framework 201311201513-2877626796-52576-3234-0000 I1120 15:13:40.398084 1679654912 slave.cpp:1406] Got registration for executor 'default' of framework 201311201513-2877626796-52576-3234-0000 I1120 15:13:40.399344 1679654912 slave.cpp:1527] Flushing queued task 2 for executor 'default' of framework 201311201513-2877626796-52576-3234-0000 Registered executor on vkone.local I1120 15:13:40.491843 1679654912 slave.cpp:1740] Handling status update TASK_RUNNING (UUID: f04b1852-3669-444a-906f-3675f784c14f) for task 1 of framework 201311201513-2877626796-52576-3234-0000 from executor(1)@172.25.133.171:52577 I1120 15:13:40.492202 1679654912 status_update_manager.cpp:305] Received status update TASK_RUNNING (UUID: f04b1852-3669-444a-906f-3675f784c14f) for task 1 of framework 201311201513-2877626796-52576-3234-0000 I1120 
15:13:40.492424 1679654912 status_update_manager.cpp:356] Forwarding status update TASK_RUNNING (UUID: f04b1852-3669-444a-906f-3675f784c14f) for task 1 of framework 201311201513-2877626796-52576-3234-0000 to master@172.25.133.171:52576 Registered executor on vkone.local I1120 15:13:40.492671 1682337792 master.cpp:1452] Status update TASK_RUNNING (UUID: f04b1852-3669-444a-906f-3675f784c14f) for task 1 of framework 201311201513-2877626796-52576-3234-0000 from slave(3)@172.25.133.171:52576 I1120 15:13:40.492735 1682337792 slave.cpp:1865] Sending acknowledgement for status update TASK_RUNNING (UUID: f04b1852-3669-444a-906f-3675f784c14f) for task 1 of framework 201311201513-2877626796-52576-3234-0000 to executor(1)@172.25.133.171:52577 Status update: task 1 is in state TASK_RUNNING I1120 15:13:40.502235 1679654912 status_update_manager.cpp:380] Received status update acknowledgement (UUID: f04b1852-3669-444a-906f-3675f784c14f) for task 1 of framework 201311201513-2877626796-52576-3234-0000 Registered executor on vkone.local I1120 15:13:40.531292 1679654912 slave.cpp:1740] Handling status update TASK_RUNNING (UUID: c19b6a5a-19ce-4613-8a5a-08fe807ff27c) for task 2 of framework 201311201513-2877626796-52576-3234-0000 from executor(1)@172.25.133.171:52579 I1120 15:13:40.532091 1680728064 status_update_manager.cpp:305] Received status update TASK_RUNNING (UUID: c19b6a5a-19ce-4613-8a5a-08fe807ff27c) for task 2 of framework 201311201513-2877626796-52576-3234-0000 I1120 15:13:40.532305 1680728064 status_update_manager.cpp:356] Forwarding status update TASK_RUNNING (UUID: c19b6a5a-19ce-4613-8a5a-08fe807ff27c) for task 2 of framework 201311201513-2877626796-52576-3234-0000 to master@172.25.133.171:52576 I1120 15:13:40.532776 1682874368 slave.cpp:1865] Sending acknowledgement for status update TASK_RUNNING (UUID: c19b6a5a-19ce-4613-8a5a-08fe807ff27c) for task 2 of framework 201311201513-2877626796-52576-3234-0000 to executor(1)@172.25.133.171:52579 I1120 15:13:40.532951 1681801216 
master.cpp:1452] Status update TASK_RUNNING (UUID: c19b6a5a-19ce-4613-8a5a-08fe807ff27c) for task 2 of framework 201311201513-2877626796-52576-3234-0000 from slave(1)@172.25.133.171:52576 Status update: task 2 is in state TASK_RUNNING I1120 15:13:40.538895 1682874368 status_update_manager.cpp:380] Received status update acknowledgement (UUID: c19b6a5a-19ce-4613-8a5a-08fe807ff27c) for task 2 of framework 201311201513-2877626796-52576-3234-0000 I1120 15:13:40.541267 1682874368 slave.cpp:1740] Handling status update TASK_RUNNING (UUID: c218b0c3-d77c-4901-8570-391c330ba117) for task 0 of framework 201311201513-2877626796-52576-3234-0000 from executor(1)@172.25.133.171:52578 I1120 15:13:40.541555 1682874368 status_update_manager.cpp:305] Received status update TASK_RUNNING (UUID: c218b0c3-d77c-4901-8570-391c330ba117) for task 0 of framework 201311201513-2877626796-52576-3234-0000 I1120 15:13:40.541725 1682874368 status_update_manager.cpp:356] Forwarding status update TASK_RUNNING (UUID: c218b0c3-d77c-4901-8570-391c330ba117) for task 0 of framework 201311201513-2877626796-52576-3234-0000 to master@172.25.133.171:52576 I1120 15:13:40.542196 1682874368 master.cpp:1452] Status update TASK_RUNNING (UUID: c218b0c3-d77c-4901-8570-391c330ba117) for task 0 of framework 201311201513-2877626796-52576-3234-0000 from slave(2)@172.25.133.171:52576 I1120 15:13:40.542251 1682874368 slave.cpp:1865] Sending acknowledgement for status update TASK_RUNNING (UUID: c218b0c3-d77c-4901-8570-391c330ba117) for task 0 of framework 201311201513-2877626796-52576-3234-0000 to executor(1)@172.25.133.171:52578 Status update: task 0 is in state TASK_RUNNING I1120 15:13:40.545537 1682874368 status_update_manager.cpp:380] Received status update acknowledgement (UUID: c218b0c3-d77c-4901-8570-391c330ba117) for task 0 of...",8
MESOS-920,"Set GLOG_drop_log_memory=false in environment prior to logging initialization.","We've observed issues where the masters are slow to respond. 
Two perf traces collected while the masters were slow to respond: {noformat} 25.84% [kernel] [k] default_send_IPI_mask_sequence_phys 20.44% [kernel] [k] native_write_msr_safe 4.54% [kernel] [k] _raw_spin_lock 2.95% libc-2.5.so [.] _int_malloc 1.82% libc-2.5.so [.] malloc 1.55% [kernel] [k] apic_timer_interrupt 1.36% libc-2.5.so [.] _int_free {noformat} {noformat} 29.03% [kernel] [k] default_send_IPI_mask_sequence_phys 9.64% [kernel] [k] _raw_spin_lock 7.38% [kernel] [k] native_write_msr_safe 2.43% libc-2.5.so [.] _int_malloc 2.05% libc-2.5.so [.] _int_free 1.67% [kernel] [k] apic_timer_interrupt 1.58% libc-2.5.so [.] malloc {noformat} These have been found to be attributed to the posix_fadvise calls made by glog. We can disable these via the environment: {noformat} GLOG_DEFINE_bool(drop_log_memory, true, ""Drop in-memory buffers of log contents. "" ""Logs can grow very quickly and they are rarely read before they "" ""need to be evicted from memory. Instead, drop them from memory "" ""as soon as they are flushed to disk.""); {noformat} {code} if (FLAGS_drop_log_memory) { if (file_length_ >= logging::kPageSize) { // don't evict the most recent page uint32 len = file_length_ & ~(logging::kPageSize - 1); posix_fadvise(fileno(file_), 0, len, POSIX_FADV_DONTNEED); } } {code} We should set GLOG_drop_log_memory=false prior to making our call to google::InitGoogleLogging, to avoid others running into this issue.",2
MESOS-934,"'Logging and Debugging' document is out-of-date.","The following is no longer correct: http://mesos.apache.org/documentation/latest/logging-and-debugging/ We should either delete this document or re-write it entirely.",1
MESOS-976,"SlaveRecoveryTest/1.SchedulerFailover is flaky","[==========] Running 1 test from 1 test case. [----------] Global test environment set-up. 
[----------] 1 test from SlaveRecoveryTest/1, where TypeParam = mesos::internal::slave::CgroupsIsolator [ RUN ] SlaveRecoveryTest/1.SchedulerFailover I0206 20:18:31.525116 56447 master.cpp:239] Master ID: 2014-02-06-20:18:31-1740121354-55566-56447 Hostname: smfd-bkq-03-sr4.devel.twitter.com I0206 20:18:31.525295 56481 master.cpp:321] Master started on 10.37.184.103:55566 I0206 20:18:31.525315 56481 master.cpp:324] Master only allowing authenticated frameworks to register! I0206 20:18:31.527093 56481 master.cpp:756] The newly elected leader is master@10.37.184.103:55566 I0206 20:18:31.527122 56481 master.cpp:764] Elected as the leading master! I0206 20:18:31.530642 56473 slave.cpp:112] Slave started on 9)@10.37.184.103:55566 I0206 20:18:31.530802 56473 slave.cpp:212] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0206 20:18:31.531203 56473 slave.cpp:240] Slave hostname: smfd-bkq-03-sr4.devel.twitter.com I0206 20:18:31.531221 56473 slave.cpp:241] Slave checkpoint: true I0206 20:18:31.531991 56482 cgroups_isolator.cpp:225] Using /tmp/mesos_test_cgroup as cgroups hierarchy root I0206 20:18:31.532470 56478 state.cpp:33] Recovering state from '/tmp/SlaveRecoveryTest_1_SchedulerFailover_7dC2N1/meta' I0206 20:18:31.532698 56469 status_update_manager.cpp:188] Recovering status update manager I0206 20:18:31.533962 56472 sched.cpp:265] Authenticating with master master@10.37.184.103:55566 I0206 20:18:31.534102 56472 sched.cpp:234] Detecting new master I0206 20:18:31.534124 56484 authenticatee.hpp:124] Creating new client SASL connection I0206 20:18:31.534299 56473 master.cpp:2317] Authenticating framework at scheduler(9)@10.37.184.103:55566 I0206 20:18:31.534459 56461 authenticator.hpp:140] Creating new server SASL connection I0206 20:18:31.534572 56466 authenticatee.hpp:212] Received SASL authentication mechanisms: CRAM-MD5 I0206 20:18:31.534595 56466 authenticatee.hpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I0206 
20:18:31.534667 56474 authenticator.hpp:243] Received SASL authentication start I0206 20:18:31.534732 56474 authenticator.hpp:325] Authentication requires more steps I0206 20:18:31.534814 56468 authenticatee.hpp:258] Received SASL authentication step I0206 20:18:31.534946 56466 authenticator.hpp:271] Received SASL authentication step I0206 20:18:31.535007 56466 authenticator.hpp:317] Authentication success I0206 20:18:31.535084 56471 authenticatee.hpp:298] Authentication success I0206 20:18:31.535107 56461 master.cpp:2357] Successfully authenticated framework at scheduler(9)@10.37.184.103:55566 I0206 20:18:31.535392 56476 sched.cpp:339] Successfully authenticated with master master@10.37.184.103:55566 I0206 20:18:31.535512 56465 master.cpp:812] Received registration request from scheduler(9)@10.37.184.103:55566 I0206 20:18:31.535570 56465 master.cpp:830] Registering framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 at scheduler(9)@10.37.184.103:55566 I0206 20:18:31.535856 56465 hierarchical_allocator_process.hpp:332] Added framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 I0206 20:18:31.537802 56482 cgroups_isolator.cpp:840] Recovering isolator I0206 20:18:31.538462 56472 slave.cpp:2760] Finished recovery I0206 20:18:31.538910 56472 slave.cpp:508] New master detected at master@10.37.184.103:55566 I0206 20:18:31.539036 56478 status_update_manager.cpp:162] New master detected at master@10.37.184.103:55566 I0206 20:18:31.539223 56464 master.cpp:1834] Attempting to register slave on smfd-bkq-03-sr4.devel.twitter.com at slave(9)@10.37.184.103:55566 I0206 20:18:31.539271 56472 slave.cpp:533] Detecting new master I0206 20:18:31.539330 56464 master.cpp:2804] Adding slave 2014-02-06-20:18:31-1740121354-55566-56447-0 at smfd-bkq-03-sr4.devel.twitter.com with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0206 20:18:31.539454 56472 slave.cpp:551] Registered with master master@10.37.184.103:55566; given slave ID 
2014-02-06-20:18:31-1740121354-55566-56447-0 I0206 20:18:31.539620 56472 slave.cpp:564] Checkpointing SlaveInfo to '/tmp/SlaveRecoveryTest_1_SchedulerFailover_7dC2N1/meta/slaves/2014-02-06-20:18:31-1740121354-55566-56447-0/slave.info' I0206 20:18:31.539834 56475 hierarchical_allocator_process.hpp:445] Added slave 2014-02-06-20:18:31-1740121354-55566-56447-0 (smfd-bkq-03-sr4.devel.twitter.com) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] available) I0206 20:18:31.540341 56472 master.cpp:2272] Sending 1 offers to framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 I0206 20:18:31.543433 56472 master.cpp:1568] Processing reply for offers: [ 2014-02-06-20:18:31-1740121354-55566-56447-0 ] on slave 2014-02-06-20:18:31-1740121354-55566-56447-0 (smfd-bkq-03-sr4.devel.twitter.com) for framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 I0206 20:18:31.543642 56472 master.hpp:411] Adding task d045a0bd-2ed2-410a-bd1f-5bd9219896e3 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 2014-02-06-20:18:31-1740121354-55566-56447-0 (smfd-bkq-03-sr4.devel.twitter.com) I0206 20:18:31.543781 56472 master.cpp:2441] Launching task d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 2014-02-06-20:18:31-1740121354-55566-56447-0 (smfd-bkq-03-sr4.devel.twitter.com) I0206 20:18:31.544002 56484 slave.cpp:736] Got assigned task d045a0bd-2ed2-410a-bd1f-5bd9219896e3 for framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 I0206 20:18:31.544097 56484 slave.cpp:2899] Checkpointing FrameworkInfo to '/tmp/SlaveRecoveryTest_1_SchedulerFailover_7dC2N1/meta/slaves/2014-02-06-20:18:31-1740121354-55566-56447-0/frameworks/2014-02-06-20:18:31-1740121354-55566-56447-0000/framework.info' I0206 20:18:31.544272 56484 slave.cpp:2906] Checkpointing 
framework pid 'scheduler(9)@10.37.184.103:55566' to '/tmp/SlaveRecoveryTest_1_SchedulerFailover_7dC2N1/meta/slaves/2014-02-06-20:18:31-1740121354-55566-56447-0/frameworks/2014-02-06-20:18:31-1740121354-55566-56447-0000/framework.pid' I0206 20:18:31.544617 56484 slave.cpp:845] Launching task d045a0bd-2ed2-410a-bd1f-5bd9219896e3 for framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 I0206 20:18:31.546721 56484 slave.cpp:3169] Checkpointing ExecutorInfo to '/tmp/SlaveRecoveryTest_1_SchedulerFailover_7dC2N1/meta/slaves/2014-02-06-20:18:31-1740121354-55566-56447-0/frameworks/2014-02-06-20:18:31-1740121354-55566-56447-0000/executors/d045a0bd-2ed2-410a-bd1f-5bd9219896e3/executor.info' I0206 20:18:31.547317 56484 slave.cpp:3257] Checkpointing TaskInfo to '/tmp/SlaveRecoveryTest_1_SchedulerFailover_7dC2N1/meta/slaves/2014-02-06-20:18:31-1740121354-55566-56447-0/frameworks/2014-02-06-20:18:31-1740121354-55566-56447-0000/executors/d045a0bd-2ed2-410a-bd1f-5bd9219896e3/runs/9adabe16-5d84-45c9-bc83-1a72a6d1c986/tasks/d045a0bd-2ed2-410a-bd1f-5bd9219896e3/task.info' I0206 20:18:31.547514 56484 slave.cpp:955] Queuing task 'd045a0bd-2ed2-410a-bd1f-5bd9219896e3' for executor d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework '2014-02-06-20:18:31-1740121354-55566-56447-0000 I0206 20:18:31.547590 56481 cgroups_isolator.cpp:517] Launching d045a0bd-2ed2-410a-bd1f-5bd9219896e3 (/home/vinod/mesos/build/src/mesos-executor) in /tmp/SlaveRecoveryTest_1_SchedulerFailover_7dC2N1/slaves/2014-02-06-20:18:31-1740121354-55566-56447-0/frameworks/2014-02-06-20:18:31-1740121354-55566-56447-0000/executors/d045a0bd-2ed2-410a-bd1f-5bd9219896e3/runs/9adabe16-5d84-45c9-bc83-1a72a6d1c986 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] for framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 in cgroup mesos_test/framework_2014-02-06-20:18:31-1740121354-55566-56447-0000_executor_d045a0bd-2ed2-410a-bd1f-5bd9219896e3_tag_9adabe16-5d84-45c9-bc83-1a72a6d1c986 I0206 
20:18:31.548408 56481 cgroups_isolator.cpp:717] Changing cgroup controls for executor d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0206 20:18:31.548833 56481 cgroups_isolator.cpp:1007] Updated 'cpu.shares' to 2048 for executor d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 I0206 20:18:31.549294 56481 cgroups_isolator.cpp:1117] Updated 'memory.soft_limit_in_bytes' to 1GB for executor d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 I0206 20:18:31.550107 56481 cgroups_isolator.cpp:1147] Updated 'memory.limit_in_bytes' to 1GB for executor d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 I0206 20:18:31.550571 56481 cgroups_isolator.cpp:1174] Started listening for OOM events for executor d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 I0206 20:18:31.551553 56481 cgroups_isolator.cpp:569] Forked executor at = 56671 Checkpointing executor's forked pid 56671 to '/tmp/SlaveRecoveryTest_1_SchedulerFailover_7dC2N1/meta/slaves/2014-02-06-20:18:31-1740121354-55566-56447-0/frameworks/2014-02-06-20:18:31-1740121354-55566-56447-0000/executors/d045a0bd-2ed2-410a-bd1f-5bd9219896e3/runs/9adabe16-5d84-45c9-bc83-1a72a6d1c986/pids/forked.pid' I0206 20:18:31.552222 56472 slave.cpp:2098] Monitoring executor d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 forked at pid 56671 Fetching resources into '/tmp/SlaveRecoveryTest_1_SchedulerFailover_7dC2N1/slaves/2014-02-06-20:18:31-1740121354-55566-56447-0/frameworks/2014-02-06-20:18:31-1740121354-55566-56447-0000/executors/d045a0bd-2ed2-410a-bd1f-5bd9219896e3/runs/9adabe16-5d84-45c9-bc83-1a72a6d1c986' I0206 20:18:31.604012 56472 slave.cpp:1431] Got registration for executor 
'd045a0bd-2ed2-410a-bd1f-5bd9219896e3' of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 I0206 20:18:31.604167 56472 slave.cpp:1516] Checkpointing executor pid 'executor(1)@10.37.184.103:46181' to '/tmp/SlaveRecoveryTest_1_SchedulerFailover_7dC2N1/meta/slaves/2014-02-06-20:18:31-1740121354-55566-56447-0/frameworks/2014-02-06-20:18:31-1740121354-55566-56447-0000/executors/d045a0bd-2ed2-410a-bd1f-5bd9219896e3/runs/9adabe16-5d84-45c9-bc83-1a72a6d1c986/pids/libprocess.pid' I0206 20:18:31.605183 56472 slave.cpp:1552] Flushing queued task d045a0bd-2ed2-410a-bd1f-5bd9219896e3 for executor 'd045a0bd-2ed2-410a-bd1f-5bd9219896e3' of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 Registered executor on smfd-bkq-03-sr4.devel.twitter.com Starting task d045a0bd-2ed2-410a-bd1f-5bd9219896e3 sh -c 'sleep 1000' Forked command at 56712 I0206 20:18:31.613098 56481 slave.cpp:1765] Handling status update TASK_RUNNING (UUID: fc151a46-751b-4c4b-b048-1727752f34e3) for task d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 from executor(1)@10.37.184.103:46181 I0206 20:18:31.613628 56469 status_update_manager.cpp:314] Received status update TASK_RUNNING (UUID: fc151a46-751b-4c4b-b048-1727752f34e3) for task d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 I0206 20:18:31.614006 56469 status_update_manager.hpp:342] Checkpointing UPDATE for status update TASK_RUNNING (UUID: fc151a46-751b-4c4b-b048-1727752f34e3) for task d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 I0206 20:18:31.795529 56469 status_update_manager.cpp:367] Forwarding status update TASK_RUNNING (UUID: fc151a46-751b-4c4b-b048-1727752f34e3) for task d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 to master@10.37.184.103:55566 I0206 20:18:31.795992 56480 slave.cpp:1890] Sending acknowledgement for status update 
TASK_RUNNING (UUID: fc151a46-751b-4c4b-b048-1727752f34e3) for task d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 to executor(1)@10.37.184.103:46181 I0206 20:18:31.796131 56471 master.cpp:2020] Status update TASK_RUNNING (UUID: fc151a46-751b-4c4b-b048-1727752f34e3) for task d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 from slave(9)@10.37.184.103:55566 I0206 20:18:31.797099 56483 status_update_manager.cpp:392] Received status update acknowledgement (UUID: fc151a46-751b-4c4b-b048-1727752f34e3) for task d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 I0206 20:18:31.797165 56483 status_update_manager.hpp:342] Checkpointing ACK for status update TASK_RUNNING (UUID: fc151a46-751b-4c4b-b048-1727752f34e3) for task d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 I0206 20:18:31.882767 56481 slave.cpp:394] Slave terminating I0206 20:18:31.883112 56481 master.cpp:641] Slave 2014-02-06-20:18:31-1740121354-55566-56447-0 (smfd-bkq-03-sr4.devel.twitter.com) disconnected I0206 20:18:31.883200 56476 hierarchical_allocator_process.hpp:484] Slave 2014-02-06-20:18:31-1740121354-55566-56447-0 disconnected I0206 20:18:31.888206 56473 sched.cpp:265] Authenticating with master master@10.37.184.103:55566 I0206 20:18:31.888473 56473 sched.cpp:234] Detecting new master I0206 20:18:31.888556 56469 authenticatee.hpp:124] Creating new client SASL connection I0206 20:18:31.888978 56484 master.cpp:2317] Authenticating framework at scheduler(10)@10.37.184.103:55566 I0206 20:18:31.889348 56469 authenticator.hpp:140] Creating new server SASL connection I0206 20:18:31.889925 56469 authenticatee.hpp:212] Received SASL authentication mechanisms: CRAM-MD5 I0206 20:18:31.889989 56469 authenticatee.hpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I0206 20:18:31.890059 56469 
authenticator.hpp:243] Received SASL authentication start I0206 20:18:31.890233 56469 authenticator.hpp:325] Authentication requires more steps I0206 20:18:31.890399 56468 authenticatee.hpp:258] Received SASL authentication step I0206 20:18:31.890554 56484 authenticator.hpp:271] Received SASL authentication step I0206 20:18:31.890630 56484 authenticator.hpp:317] Authentication success I0206 20:18:31.890728 56470 authenticatee.hpp:298] Authentication success I0206 20:18:31.890748 56484 master.cpp:2357] Successfully authenticated framework at scheduler(10)@10.37.184.103:55566 I0206 20:18:31.892210 56469 sched.cpp:339] Successfully authenticated with master master@10.37.184.103:55566 I0206 20:18:31.892410 56473 master.cpp:900] Re-registering framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 at scheduler(10)@10.37.184.103:55566 I0206 20:18:31.892460 56473 master.cpp:926] Framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 failed over W0206 20:18:31.892691 56465 master.cpp:1048] Ignoring deactivate framework message for framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 from 'scheduler(9)@10.37.184.103:55566' because it is not from the registered framework 'scheduler(10)@10.37.184.103:55566' I0206 20:18:31.897049 56466 slave.cpp:112] Slave started on 10)@10.37.184.103:55566 I0206 20:18:31.897207 56466 slave.cpp:212] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0206 20:18:31.897536 56466 slave.cpp:240] Slave hostname: smfd-bkq-03-sr4.devel.twitter.com I0206 20:18:31.897554 56466 slave.cpp:241] Slave checkpoint: true I0206 20:18:31.898388 56463 cgroups_isolator.cpp:225] Using /tmp/mesos_test_cgroup as cgroups hierarchy root I0206 20:18:31.898936 56472 state.cpp:33] Recovering state from '/tmp/SlaveRecoveryTest_1_SchedulerFailover_7dC2N1/meta' I0206 20:18:31.901702 56465 slave.cpp:2828] Recovering framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 I0206 20:18:31.901759 56465 slave.cpp:3020] Recovering executor 
'd045a0bd-2ed2-410a-bd1f-5bd9219896e3' of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 I0206 20:18:31.902716 56464 status_update_manager.cpp:188] Recovering status update manager I0206 20:18:31.902884 56464 status_update_manager.cpp:196] Recovering executor 'd045a0bd-2ed2-410a-bd1f-5bd9219896e3' of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 I0206 20:18:34.475915 56463 cgroups_isolator.cpp:840] Recovering isolator I0206 20:18:34.476066 56463 cgroups_isolator.cpp:847] Recovering executor 'd045a0bd-2ed2-410a-bd1f-5bd9219896e3' of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 I0206 20:18:34.477478 56463 cgroups_isolator.cpp:1174] Started listening for OOM events for executor d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 I0206 20:18:34.478728 56463 slave.cpp:2700] Sending reconnect request to executor d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 at executor(1)@10.37.184.103:46181 I0206 20:18:34.480114 56476 slave.cpp:1597] Re-registering executor d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 I0206 20:18:34.480566 56476 cgroups_isolator.cpp:717] Changing cgroup controls for executor d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0206 20:18:34.481370 56476 cgroups_isolator.cpp:1007] Updated 'cpu.shares' to 2048 for executor d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 I0206 20:18:34.481827 56476 cgroups_isolator.cpp:1117] Updated 'memory.soft_limit_in_bytes' to 1GB for executor d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 Re-registered executor on smfd-bkq-03-sr4.devel.twitter.com I0206 20:18:34.489497 56471 slave.cpp:1713] Cleaning up 
un-reregistered executors I0206 20:18:34.489588 56471 slave.cpp:2760] Finished recovery I0206 20:18:34.490048 56463 slave.cpp:508] New master detected at master@10.37.184.103:55566 I0206 20:18:34.490257 56475 status_update_manager.cpp:162] New master detected at master@10.37.184.103:55566 I0206 20:18:34.490357 56463 slave.cpp:533] Detecting new master W0206 20:18:34.490603 56480 master.cpp:1878] Slave at slave(10)@10.37.184.103:55566 (smfd-bkq-03-sr4.devel.twitter.com) is being allowed to re-register with an already in use id (2014-02-06-20:18:31-1740121354-55566-56447-0) I0206 20:18:34.490927 56479 slave.cpp:601] Re-registered with master master@10.37.184.103:55566 I0206 20:18:34.491322 56461 hierarchical_allocator_process.hpp:498] Slave 2014-02-06-20:18:31-1740121354-55566-56447-0 reconnected I0206 20:18:34.491421 56468 slave.cpp:1312] Updating framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 pid to scheduler(10)@10.37.184.103:55566 I0206 20:18:34.491444 56480 master.cpp:1673] Asked to kill task d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 I0206 20:18:34.491488 56468 slave.cpp:1320] Checkpointing framework pid 'scheduler(10)@10.37.184.103:55566' to '/tmp/SlaveRecoveryTest_1_SchedulerFailover_7dC2N1/meta/slaves/2014-02-06-20:18:31-1740121354-55566-56447-0/frameworks/2014-02-06-20:18:31-1740121354-55566-56447-0000/framework.pid' I0206 20:18:34.491497 56480 master.cpp:1707] Telling slave 2014-02-06-20:18:31-1740121354-55566-56447-0 (smfd-bkq-03-sr4.devel.twitter.com) to kill task d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 I0206 20:18:34.491657 56468 slave.cpp:1013] Asked to kill task d045a0bd-2ed2-410a-bd1f-5bd9219896e3 of framework 2014-02-06-20:18:31-1740121354-55566-56447-0000 Shutting down Killing process tree at pid 56712...",1 MESOS-988,"ExamplesTest.PythonFramework is flaky","Looks like a SEGFAULT during shutdown. 
{noformat} [ RUN ] ExamplesTest.PythonFramework Using temporary directory '/tmp/ExamplesTest_PythonFramework_RZ4yaf' WARNING: Logging before InitGoogleLogging() is written to STDERR I0211 21:14:47.861803 21045 process.cpp:1591] libprocess is initialized on 67.195.138.9:53443 for 8 cpus I0211 21:14:47.861884 21045 logging.cpp:140] Logging to STDERR I0211 21:14:47.862761 21045 master.cpp:240] Master ID: 2014-02-11-21:14:47-160088899-53443-21045 Hostname: vesta.apache.org I0211 21:14:47.862897 21054 master.cpp:322] Master started on 67.195.138.9:53443 I0211 21:14:47.862908 21054 master.cpp:325] Master only allowing authenticated frameworks to register! I0211 21:14:47.864362 21053 master.cpp:86] No whitelist given. Advertising offers for all slaves I0211 21:14:47.864506 21055 slave.cpp:112] Slave started on 1)@67.195.138.9:53443 I0211 21:14:47.864522 21059 slave.cpp:112] Slave started on 2)@67.195.138.9:53443 I0211 21:14:47.864749 21055 slave.cpp:212] Slave resources: cpus(*):8; mem(*):6961; disk(*):1.38501e+06; ports(*):[31000-32000] I0211 21:14:47.864778 21059 slave.cpp:212] Slave resources: cpus(*):8; mem(*):6961; disk(*):1.38501e+06; ports(*):[31000-32000] I0211 21:14:47.864819 21055 slave.cpp:240] Slave hostname: vesta.apache.org I0211 21:14:47.864827 21055 slave.cpp:241] Slave checkpoint: true I0211 21:14:47.864850 21059 slave.cpp:240] Slave hostname: vesta.apache.org I0211 21:14:47.864858 21059 slave.cpp:241] Slave checkpoint: true I0211 21:14:47.865329 21055 master.cpp:760] The newly elected leader is master@67.195.138.9:53443 with id 2014-02-11-21:14:47-160088899-53443-21045 I0211 21:14:47.865350 21055 master.cpp:770] Elected as the leading master! 
I0211 21:14:47.865399 21055 state.cpp:33] Recovering state from '/tmp/mesos-Z8v6cu/1/meta' I0211 21:14:47.865407 21059 state.cpp:33] Recovering state from '/tmp/mesos-Z8v6cu/0/meta' I0211 21:14:47.865502 21052 hierarchical_allocator_process.hpp:302] Initializing hierarchical allocator process with master : master@67.195.138.9:53443 I0211 21:14:47.865540 21054 status_update_manager.cpp:188] Recovering status update manager I0211 21:14:47.865619 21053 process_isolator.cpp:319] Recovering isolator I0211 21:14:47.865674 21057 status_update_manager.cpp:188] Recovering status update manager I0211 21:14:47.865699 21059 slave.cpp:2760] Finished recovery I0211 21:14:47.865733 21053 process_isolator.cpp:319] Recovering isolator I0211 21:14:47.865789 21053 slave.cpp:2760] Finished recovery I0211 21:14:47.865921 21059 slave.cpp:508] New master detected at master@67.195.138.9:53443 I0211 21:14:47.865958 21053 status_update_manager.cpp:162] New master detected at master@67.195.138.9:53443 I0211 21:14:47.865978 21059 slave.cpp:533] Detecting new master I0211 21:14:47.866019 21053 slave.cpp:508] New master detected at master@67.195.138.9:53443 I0211 21:14:47.866063 21053 slave.cpp:533] Detecting new master I0211 21:14:47.866070 21055 status_update_manager.cpp:162] New master detected at master@67.195.138.9:53443 I0211 21:14:47.866077 21059 master.cpp:1840] Attempting to register slave on vesta.apache.org at slave(2)@67.195.138.9:53443 I0211 21:14:47.866092 21059 master.cpp:2810] Adding slave 2014-02-11-21:14:47-160088899-53443-21045-0 at vesta.apache.org with cpus(*):8; mem(*):6961; disk(*):1.38501e+06; ports(*):[31000-32000] I0211 21:14:47.866216 21059 master.cpp:1840] Attempting to register slave on vesta.apache.org at slave(1)@67.195.138.9:53443 I0211 21:14:47.866225 21053 slave.cpp:551] Registered with master master@67.195.138.9:53443; given slave ID 2014-02-11-21:14:47-160088899-53443-21045-0 I0211 21:14:47.866228 21059 master.cpp:2810] Adding slave 
2014-02-11-21:14:47-160088899-53443-21045-1 at vesta.apache.org with cpus(*):8; mem(*):6961; disk(*):1.38501e+06; ports(*):[31000-32000] I0211 21:14:47.866278 21055 hierarchical_allocator_process.hpp:445] Added slave 2014-02-11-21:14:47-160088899-53443-21045-0 (vesta.apache.org) with cpus(*):8; mem(*):6961; disk(*):1.38501e+06; ports(*):[31000-32000] (and cpus(*):8; mem(*):6961; disk(*):1.38501e+06; ports(*):[31000-32000] available) I0211 21:14:47.866297 21059 slave.cpp:551] Registered with master master@67.195.138.9:53443; given slave ID 2014-02-11-21:14:47-160088899-53443-21045-1 I0211 21:14:47.866327 21055 hierarchical_allocator_process.hpp:708] Performed allocation for slave 2014-02-11-21:14:47-160088899-53443-21045-0 in 11us I0211 21:14:47.866330 21053 slave.cpp:564] Checkpointing SlaveInfo to '/tmp/mesos-Z8v6cu/1/meta/slaves/2014-02-11-21:14:47-160088899-53443-21045-0/slave.info' I0211 21:14:47.866400 21059 slave.cpp:564] Checkpointing SlaveInfo to '/tmp/mesos-Z8v6cu/0/meta/slaves/2014-02-11-21:14:47-160088899-53443-21045-1/slave.info' I0211 21:14:47.866399 21055 hierarchical_allocator_process.hpp:445] Added slave 2014-02-11-21:14:47-160088899-53443-21045-1 (vesta.apache.org) with cpus(*):8; mem(*):6961; disk(*):1.38501e+06; ports(*):[31000-32000] (and cpus(*):8; mem(*):6961; disk(*):1.38501e+06; ports(*):[31000-32000] available) I0211 21:14:47.866423 21055 hierarchical_allocator_process.hpp:708] Performed allocation for slave 2014-02-11-21:14:47-160088899-53443-21045-1 in 2505ns I0211 21:14:47.866636 21059 slave.cpp:112] Slave started on 3)@67.195.138.9:53443 I0211 21:14:47.866727 21059 slave.cpp:212] Slave resources: cpus(*):8; mem(*):6961; disk(*):1.38501e+06; ports(*):[31000-32000] I0211 21:14:47.866766 21059 slave.cpp:240] Slave hostname: vesta.apache.org I0211 21:14:47.866772 21059 slave.cpp:241] Slave checkpoint: true I0211 21:14:47.867300 21052 state.cpp:33] Recovering state from '/tmp/mesos-Z8v6cu/2/meta' I0211 21:14:47.867368 21052 
status_update_manager.cpp:188] Recovering status update manager I0211 21:14:47.867419 21055 process_isolator.cpp:319] Recovering isolator I0211 21:14:47.867544 21052 slave.cpp:2760] Finished recovery I0211 21:14:47.867729 21052 slave.cpp:508] New master detected at master@67.195.138.9:53443 I0211 21:14:47.867770 21054 status_update_manager.cpp:162] New master detected at master@67.195.138.9:53443 I0211 21:14:47.867777 21052 slave.cpp:533] Detecting new master I0211 21:14:47.867815 21055 master.cpp:1840] Attempting to register slave on vesta.apache.org at slave(3)@67.195.138.9:53443 I0211 21:14:47.867827 21055 master.cpp:2810] Adding slave 2014-02-11-21:14:47-160088899-53443-21045-2 at vesta.apache.org with cpus(*):8; mem(*):6961; disk(*):1.38501e+06; ports(*):[31000-32000] I0211 21:14:47.867885 21052 slave.cpp:551] Registered with master master@67.195.138.9:53443; given slave ID 2014-02-11-21:14:47-160088899-53443-21045-2 I0211 21:14:47.867961 21055 hierarchical_allocator_process.hpp:445] Added slave 2014-02-11-21:14:47-160088899-53443-21045-2 (vesta.apache.org) with cpus(*):8; mem(*):6961; disk(*):1.38501e+06; ports(*):[31000-32000] (and cpus(*):8; mem(*):6961; disk(*):1.38501e+06; ports(*):[31000-32000] available) I0211 21:14:47.867985 21052 slave.cpp:564] Checkpointing SlaveInfo to '/tmp/mesos-Z8v6cu/2/meta/slaves/2014-02-11-21:14:47-160088899-53443-21045-2/slave.info' I0211 21:14:47.867987 21055 hierarchical_allocator_process.hpp:708] Performed allocation for slave 2014-02-11-21:14:47-160088899-53443-21045-2 in 3308ns I0211 21:14:47.868468 21045 sched.cpp:121] Version: 0.18.0 I0211 21:14:47.868633 21055 sched.cpp:217] New master detected at master@67.195.138.9:53443 I0211 21:14:47.868651 21055 sched.cpp:268] Authenticating with master master@67.195.138.9:53443 I0211 21:14:47.868696 21055 sched.cpp:237] Detecting new master I0211 21:14:47.868708 21054 authenticatee.hpp:100] Initializing client SASL I0211 21:14:47.869549 21054 authenticatee.hpp:124] Creating new 
client SASL connection I0211 21:14:47.869633 21055 master.cpp:2323] Authenticating framework at scheduler(1)@67.195.138.9:53443 I0211 21:14:47.869818 21059 authenticator.hpp:83] Initializing server SASL I0211 21:14:47.870029 21059 auxprop.cpp:45] Initialized in-memory auxiliary property plugin I0211 21:14:47.870040 21059 authenticator.hpp:140] Creating new server SASL connection I0211 21:14:47.870144 21057 authenticatee.hpp:212] Received SASL authentication mechanisms: CRAM-MD5 I0211 21:14:47.870174 21057 authenticatee.hpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I0211 21:14:47.870203 21057 authenticator.hpp:243] Received SASL authentication start I0211 21:14:47.870256 21057 authenticator.hpp:325] Authentication requires more steps I0211 21:14:47.870282 21057 authenticatee.hpp:258] Received SASL authentication step I0211 21:14:47.870348 21057 authenticator.hpp:271] Received SASL authentication step I0211 21:14:47.870376 21057 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'vesta.apache.org' server FQDN: 'vesta.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0211 21:14:47.870384 21057 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I0211 21:14:47.870396 21057 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0211 21:14:47.870405 21057 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'vesta.apache.org' server FQDN: 'vesta.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0211 21:14:47.870411 21057 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0211 21:14:47.870415 21057 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0211 21:14:47.870425 21057 authenticator.hpp:317] Authentication success I0211 21:14:47.870445 21057 
master.cpp:2363] Successfully authenticated framework at scheduler(1)@67.195.138.9:53443 I0211 21:14:47.870448 21055 authenticatee.hpp:298] Authentication success I0211 21:14:47.870492 21055 sched.cpp:342] Successfully authenticated with master master@67.195.138.9:53443 I0211 21:14:47.870538 21057 master.cpp:818] Received registration request from scheduler(1)@67.195.138.9:53443 I0211 21:14:47.870590 21057 master.cpp:836] Registering framework 2014-02-11-21:14:47-160088899-53443-21045-0000 at scheduler(1)@67.195.138.9:53443 I0211 21:14:47.870661 21055 sched.cpp:391] Framework registered with 2014-02-11-21:14:47-160088899-53443-21045-0000 I0211 21:14:47.870661 21057 hierarchical_allocator_process.hpp:332] Added framework 2014-02-11-21:14:47-160088899-53443-21045-0000 I0211 21:14:47.870707 21057 hierarchical_allocator_process.hpp:752] Offering cpus(*):8; mem(*):6961; disk(*):1.38501e+06; ports(*):[31000-32000] on slave 2014-02-11-21:14:47-160088899-53443-21045-0 to framework 2014-02-11-21:14:47-160088899-53443-21045-0000 I0211 21:14:47.870798 21057 hierarchical_allocator_process.hpp:752] Offering cpus(*):8; mem(*):6961; disk(*):1.38501e+06; ports(*):[31000-32000] on slave 2014-02-11-21:14:47-160088899-53443-21045-1 to framework 2014-02-11-21:14:47-160088899-53443-21045-0000 I0211 21:14:47.870869 21057 hierarchical_allocator_process.hpp:752] Offering cpus(*):8; mem(*):6961; disk(*):1.38501e+06; ports(*):[31000-32000] on slave 2014-02-11-21:14:47-160088899-53443-21045-2 to framework 2014-02-11-21:14:47-160088899-53443-21045-0000 I0211 21:14:47.870894 21055 sched.cpp:405] Scheduler::registered took 222149ns I0211 21:14:47.871038 21057 hierarchical_allocator_process.hpp:688] Performed allocation for 3 slaves in 351098ns I0211 21:14:47.871106 21058 master.hpp:439] Adding offer 2014-02-11-21:14:47-160088899-53443-21045-0 with resources cpus(*):8; mem(*):6961; disk(*):1.38501e+06; ports(*):[31000-32000] on slave 2014-02-11-21:14:47-160088899-53443-21045-2 (vesta.apache.org) 
I0211 21:14:47.871215 21058 master.hpp:439] Adding offer 2014-02-11-21:14:47-160088899-53443-21045-1 with resources cpus(*):8; mem(*):6961; disk(*):1.38501e+06; ports(*):[31000-32000] on slave 2014-02-11-21:14:47-160088899-53443-21045-1 (vesta.apache.org) I0211 21:14:47.871296 21058 master.hpp:439] Adding offer 2014-02-11-21:14:47-160088899-53443-21045-2 with resources cpus(*):8; mem(*):6961; disk(*):1.38501e+06; ports(*):[31000-32000] on slave 2014-02-11-21:14:47-160088899-53443-21045-0 (vesta.apache.org) I0211 21:14:47.871333 21058 master.cpp:2278] Sending 3 offers to framework 2014-02-11-21:14:47-160088899-53443-21045-0000 I0211 21:14:47.873667 21055 sched.cpp:525] Scheduler::resourceOffers took 2.150843ms I0211 21:14:47.873884 21053 master.hpp:449] Removing offer 2014-02-11-21:14:47-160088899-53443-21045-0 with resources cpus(*):8; mem(*):6961; disk(*):1.38501e+06; ports(*):[31000-32000] on slave 2014-02-11-21:14:47-160088899-53443-21045-2 (vesta.apache.org) I0211 21:14:47.873934 21053 master.cpp:1574] Processing reply for offers: [ 2014-02-11-21:14:47-160088899-53443-21045-0 ] on slave 2014-02-11-21:14:47-160088899-53443-21045-2 (vesta.apache.org) for framework 2014-02-11-21:14:47-160088899-53443-21045-0000 I0211 21:14:47.874035 21053 master.hpp:411] Adding task 0 with resources cpus(*):1; mem(*):32 on slave 2014-02-11-21:14:47-160088899-53443-21045-2 (vesta.apache.org) I0211 21:14:47.874059 21053 master.cpp:2447] Launching task 0 of framework 2014-02-11-21:14:47-160088899-53443-21045-0000 with resources cpus(*):1; mem(*):32 on slave 2014-02-11-21:14:47-160088899-53443-21045-2 (vesta.apache.org) I0211 21:14:47.874150 21059 slave.cpp:736] Got assigned task 0 for framework 2014-02-11-21:14:47-160088899-53443-21045-0000 I0211 21:14:47.874200 21058 hierarchical_allocator_process.hpp:547] Framework 2014-02-11-21:14:47-160088899-53443-21045-0000 left cpus(*):7; mem(*):6929; disk(*):1.38501e+06; ports(*):[31000-32000] unused on slave 
2014-02-11-21:14:47-160088899-53443-21045-2 I0211 21:14:47.874250 21053 master.hpp:449] Removing offer 2014-02-11-21:14:47-160088899-53443-21045-1 with resources cpus(*):8; mem(*):6961; disk(*):1.38501e+06; ports(*):[31000-32000] on slave 2014-02-11-21:14:47-160088899-53443-21045-1 (vesta.apache.org) I0211 21:14:47.874307 21053 master.cpp:1574] Processing reply for offers: [ 2014-02-11-21:14:47-160088899-53443-21045-1 ] on slave 2014-02-11-21:14:47-160088899-53443-21045-1 (vesta.apache.org) for framework 2014-02-11-21:14:47-160088899-53443-21045-0000 I0211 21:14:47.874322 21058 hierarchical_allocator_process.hpp:590] Framework 2014-02-11-21:14:47-160088899-53443-21045-0000 filtered slave 2014-02-11-21:14:47-160088899-53443-21045-2 for 5secs I0211 21:14:47.874354 21059 slave.cpp:845] Launching task 0 for framework 2014-02-11-21:14:47-160088899-53443-21045-0000 I0211 21:14:47.874404 21053 master.hpp:411] Adding task 1 with resources cpus(*):1; mem(*):32 on slave 2014-02-11-21:14:47-160088899-53443-21045-1 (vesta.apache.org) I0211 21:14:47.874428 21053 master.cpp:2447] Launching task 1 of framework 2014-02-11-21:14:47-160088899-53443-21045-0000 with resources cpus(*):1; mem(*):32 on slave 2014-02-11-21:14:47-160088899-53443-21045-1 (vesta.apache.org) I0211 21:14:47.874479 21058 slave.cpp:736] Got assigned task 1 for framework 2014-02-11-21:14:47-160088899-53443-21045-0000 I0211 21:14:47.874586 21053 master.hpp:449] Removing offer 2014-02-11-21:14:47-160088899-53443-21045-2 with resources cpus(*):8; mem(*):6961; disk(*):1.38501e+06; ports(*):[31000-32000] on slave 2014-02-11-21:14:47-160088899-53443-21045-0 (vesta.apache.org) I0211 21:14:47.874646 21053 master.cpp:1574] Processing reply for offers: [ 2014-02-11-21:14:47-160088899-53443-21045-2 ] on slave 2014-02-11-21:14:47-160088899-53443-21045-0 (vesta.apache.org) for framework 2014-02-11-21:14:47-160088899-53443-21045-0000 I0211 21:14:47.874690 21058 slave.cpp:845] Launching task 1 for framework 
2014-02-11-21:14:47-160088899-53443-21045-0000 I0211 21:14:47.874694 21053 master.hpp:411] Adding task 2 with resources cpus(*):1; mem(*):32 on slave 2014-02-11-21:14:47-160088899-53443-21045-0 (vesta.apache.org) I0211 21:14:47.874716 21053 master.cpp:2447] Launching task 2 of framework 2014-02-11-21:14:47-160088899-53443-21045-0000 with resources cpus(*):1; mem(*):32 on slave 2014-02-11-21:14:47-160088899-53443-21045-0 (vesta.apache.org) I0211 21:14:47.874820 21053 hierarchical_allocator_process.hpp:547] Framework 2014-02-11-21:14:47-160088899-53443-21045-0000 left cpus(*):7; mem(*):6929; disk(*):1.38501e+06; ports(*):[31000-32000] unused on slave 2014-02-11-21:14:47-160088899-53443-21045-1 I0211 21:14:47.874892 21053 hierarchical_allocator_process.hpp:590] Framework 2014-02-11-21:14:47-160088899-53443-21045-0000 filtered slave 2014-02-11-21:14:47-160088899-53443-21045-1 for 5secs I0211 21:14:47.874922 21053 hierarchical_allocator_process.hpp:547] Framework 2014-02-11-21:14:47-160088899-53443-21045-0000 left cpus(*):7; mem(*):6929; disk(*):1.38501e+06; ports(*):[31000-32000] unused on slave 2014-02-11-21:14:47-160088899-53443-21045-0 I0211 21:14:47.874980 21053 hierarchical_allocator_process.hpp:590] Framework 2014-02-11-21:14:47-160088899-53443-21045-0000 filtered slave 2014-02-11-21:14:47-160088899-53443-21045-0 for 5secs I0211 21:14:47.875012 21053 slave.cpp:736] Got assigned task 2 for framework 2014-02-11-21:14:47-160088899-53443-21045-0000 I0211 21:14:47.875151 21053 slave.cpp:845] Launching task 2 for framework 2014-02-11-21:14:47-160088899-53443-21045-0000 I0211 21:14:47.875527 21059 slave.cpp:955] Queuing task '0' for executor default of framework '2014-02-11-21:14:47-160088899-53443-21045-0000 I0211 21:14:47.875608 21059 process_isolator.cpp:102] Launching default (/home/hudson/jenkins-slave/workspace/Mesos-Trunk-Ubuntu-Build-In-Src-Set-JAVA_HOME/src/examples/python/test-executor) in 
/tmp/mesos-Z8v6cu/2/slaves/2014-02-11-21:14:47-160088899-53443-21045-2/frameworks/2014-02-11-21:14:47-160088899-53443-21045-0000/executors/default/runs/02cdf8bd-0757-4a40-8e77-af60bb202d71 with resources cpus(*):1; mem(*):32' for framework 2014-02-11-21:14:47-160088899-53443-21045-0000 I0211 21:14:47.876787 21054 slave.cpp:469] Successfully attached file '/tmp/mesos-Z8v6cu/2/slaves/2014-02-11-21:14:47-160088899-53443-21045-2/frameworks/2014-02-11-21:14:47-160088899-53443-21045-0000/executors/default/runs/02cdf8bd-0757-4a40-8e77-af60bb202d71' I0211 21:14:47.876852 21059 process_isolator.cpp:165] Forked executor at 21061 I0211 21:14:47.876940 21058 slave.cpp:955] Queuing task '1' for executor default of framework '2014-02-11-21:14:47-160088899-53443-21045-0000 I0211 21:14:47.877095 21057 process_isolator.cpp:102] Launching default (/home/hudson/jenkins-slave/workspace/Mesos-Trunk-Ubuntu-Build-In-Src-Set-JAVA_HOME/src/examples/python/test-executor) in /tmp/mesos-Z8v6cu/0/slaves/2014-02-11-21:14:47-160088899-53443-21045-1/frameworks/2014-02-11-21:14:47-160088899-53443-21045-0000/executors/default/runs/568b657d-839d-483f-aff1-4872fbfc27dc with resources cpus(*):1; mem(*):32' for framework 2014-02-11-21:14:47-160088899-53443-21045-0000 I0211 21:14:47.877102 21052 slave.cpp:469] Successfully attached file '/tmp/mesos-Z8v6cu/0/slaves/2014-02-11-21:14:47-160088899-53443-21045-1/frameworks/2014-02-11-21:14:47-160088899-53443-21045-0000/executors/default/runs/568b657d-839d-483f-aff1-4872fbfc27dc' I0211 21:14:47.878783 21057 process_isolator.cpp:165] Forked executor at 21062 I0211 21:14:47.879032 21053 slave.cpp:955] Queuing task '2' for executor default of framework '2014-02-11-21:14:47-160088899-53443-21045-0000 I0211 21:14:47.879192 21054 slave.cpp:2098] Monitoring executor default of framework 2014-02-11-21:14:47-160088899-53443-21045-0000 forked at pid 21062 I0211 21:14:47.879192 21058 slave.cpp:469] Successfully attached file 
'/tmp/mesos-Z8v6cu/1/slaves/2014-02-11-21:14:47-160088899-53443-21045-0/frameworks/2014-02-11-21:14:47-160088899-53443-21045-0000/executors/default/runs/a7c4170a-f40b-4493-81b3-0ea8c70e3977' I0211 21:14:47.879166 21052 process_isolator.cpp:102] Launching default (/home/hudson/jenkins-slave/workspace/Mesos-Trunk-Ubuntu-Build-In-Src-Set-JAVA_HOME/src/examples/p...",3 MESOS-998,"Slave should wait until Containerizer::update() completes successfully","Container resources are updated in several places in the slave and we don't check the update was successful or even wait until it completes.",5 MESOS-999,"Slave should wait() and start executor registration timeout after launch ","The current code will start launch a container and wait on it before the launch is complete. We should do this only after the container has successfully launched. Likewise for the executor registration timeout.",3 MESOS-1010,"Python extension build is broken if gflags-dev is installed","In my environment mesos build from master results in broken python api module {{_mesos.so}}: {noformat} nekto0n@ya-darkstar ~/workspace/mesos/src/python $ PYTHONPATH=build/lib.linux-x86_64-2.7/ python -c ""import _mesos"" Traceback (most recent call last): File """", line 1, in ImportError: /home/nekto0n/workspace/mesos/src/python/build/lib.linux-x86_64-2.7/_mesos.so: undefined symbol: _ZN6google14FlagRegistererC1EPKcS2_S2_S2_PvS3_ {noformat} Unmangled version of symbol looks like this: {noformat} google::FlagRegisterer::FlagRegisterer(char const*, char const*, char const*, char const*, void*, void*) {noformat} During {{./configure}} step {{glog}} finds {{gflags}} development files and starts using them, thus *implicitly* adding dependency on {{libgflags.so}}. This breaks Python extensions module and perhaps can break other mesos subsystems when moved to hosts without {{gflags}} installed. 
This task is done when the ExamplesTest.PythonFramework test passes on a system with gflags installed.",3 MESOS-1013,"ExamplesTest.JavaLog is flaky","The {{ExamplesTest.JavaLog}} test framework is flaky, possibly related to a race condition between mutexes. {noformat} [ RUN ] ExamplesTest.JavaLog Using temporary directory '/tmp/ExamplesTest_JavaLog_WBWEb9' Feb 18, 2014 12:10:57 PM TestLog main INFO: Starting a local ZooKeeper server ... F0218 12:10:58.575036 17450 coordinator.cpp:394] Check failed: !missing Not expecting local replica to be missing position 3 after the writing is done *** Check failure stack trace: *** tests/script.cpp:81: Failure Failed java_log_test.sh terminated with signal 'Aborted' [ FAILED ] ExamplesTest.JavaLog (2166 ms) {noformat} Full logs attached.",2 MESOS-1081,"Master should not deactivate authenticated framework/slave on new AuthenticateMessage unless new authentication succeeds.","Master should not deactivate an authenticated framework/slave upon receiving a new AuthenticateMessage unless new authentication succeeds. As it stands now, a malicious user could spoof the pid of an authenticated framework/slave and send an AuthenticateMessage to knock a valid framework/slave off the authenticated list, forcing the valid framework/slave to re-authenticate and re-register. This could be used in a DoS attack. But how should we handle the scenario when the actual authenticated framework/slave sends an AuthenticateMessage that fails authentication?",1 MESOS-1114,"Authorize task/executor launches",NULL,8 MESOS-1119,"Allocator should make an allocation decision per slave instead of per framework/role.","Currently the Allocator::allocate() code loops through roles and frameworks (based on DRF sort) and allocates *all* slaves' resources to the first framework. This logic should be inverted. 
Instead, the allocator should go through each slave, allocate it to a role/framework, and update the DRF shares.",2 MESOS-1120,"HTTP auth for CLI","Integrate HTTP auth into the CLI programs",3 MESOS-1127,"Implement the protobufs for the scheduler API","The default scheduler/executor interface and implementation in Mesos have a few drawbacks: (1) The interface is fairly high-level which makes it hard to do certain things, for example, handle events (callbacks) in batch. This can have a big impact on the performance of schedulers (for example, writing task updates that need to be persisted). (2) The implementation requires writing a lot of boilerplate JNI and native Python wrappers when adding additional API components. The plan is to provide a lower-level API that can easily be used to implement the higher-level API that is currently provided. This will also open the door to more easily building native-language Mesos libraries (i.e., not needing the C++ shim layer) and building new higher-level abstractions on top of the lower-level API.",8 MESOS-1143,"Add a TASK_ERROR task status.","During task validation we drop tasks that have errors and send TASK_LOST status updates. In most circumstances a framework will want to relaunch a task that has gone lost, and in the event the task is actually malformed (thus invalid) this will result in an infinite loop of sending a task and having it go lost.",2 MESOS-1148,"Add support for rate limiting slave removal","To safeguard against unforeseen bugs leading to widespread slave removal, it would be nice to allow for rate limiting of the decision to remove slaves and/or send TASK_LOST messages for tasks on those slaves. Ideally this would allow an operator to be notified soon enough to intervene before causing cluster impact.",3 MESOS-1195,"systemd.slice + cgroup enablement fails in multiple ways. 
","When attempting to configure mesos to use systemd slices on a 'rawhide/f21' machine, it fails creating the isolator: I0407 12:39:28.035354 14916 containerizer.cpp:180] Using isolation: cgroups/cpu,cgroups/mem Failed to create a containerizer: Could not create isolator cgroups/cpu: Failed to create isolator: The cpu subsystem is co-mounted at /sys/fs/cgroup/cpu with other subsytems ------ details ------ /sys/fs/cgroup total 0 drwxr-xr-x. 12 root root 280 Mar 18 08:47 . drwxr-xr-x. 6 root root 0 Mar 18 08:47 .. drwxr-xr-x. 2 root root 0 Mar 18 08:47 blkio lrwxrwxrwx. 1 root root 11 Mar 18 08:47 cpu -> cpu,cpuacct lrwxrwxrwx. 1 root root 11 Mar 18 08:47 cpuacct -> cpu,cpuacct drwxr-xr-x. 2 root root 0 Mar 18 08:47 cpu,cpuacct drwxr-xr-x. 2 root root 0 Mar 18 08:47 cpuset drwxr-xr-x. 2 root root 0 Mar 18 08:47 devices drwxr-xr-x. 2 root root 0 Mar 18 08:47 freezer drwxr-xr-x. 2 root root 0 Mar 18 08:47 hugetlb drwxr-xr-x. 3 root root 0 Apr 3 11:26 memory drwxr-xr-x. 2 root root 0 Mar 18 08:47 net_cls drwxr-xr-x. 2 root root 0 Mar 18 08:47 perf_event drwxr-xr-x. 4 root root 0 Mar 18 08:47 systemd ",3 MESOS-1199,"Subprocess is ""slow"" -> gated by process::reap poll interval","Subprocess uses process::reap to wait on the subprocess pid and set the exit status. However, process::reap polls with a one second interval resulting in a delay up to the interval duration before the status future is set. This means if you need to wait for the subprocess to complete you get hit with E(delay) = 0.5 seconds, independent of the execution time. For example, the MesosContainerizer uses mesos-fetcher in a Subprocess to fetch the executor during launch. At Twitter we fetch a local file, i.e., a very fast operation, but the launch is blocked until the mesos-fetcher pid is reaped -> adding 0 to 1 seconds for every launch! 
The problem is even worse with a chain of short Subprocesses because after the first Subprocess completes you'll be synchronized with the reap interval and you'll see nearly the full interval before notification, i.e., 10 Subprocesses each of << 1 second duration will take ~10 seconds! This has become particularly apparent in some new tests I'm working on where test durations are now greatly extended with each taking several seconds.",1 MESOS-1219,"Master should disallow frameworks that reconnect after failover timeout.","When a scheduler reconnects after the failover timeout has been exceeded, the framework id is usually reused because the scheduler doesn't know that the timeout was exceeded and it is actually handled as a new framework. The /framework/:framework_id route of the Web UI doesn't handle those cases very well because its key is reused. It only shows the terminated one. Would it make sense to ignore the provided framework id when a scheduler reconnects to a terminated framework and generate a new id to make sure it's unique?",2 MESOS-1236,"stout's os module uses a mix of Try and bool returns","stout's os module should use Try for return values throughout.",2 MESOS-1237,"stout's os::ls should return a Try<>","stout's os::ls returns a list that can be empty - instead it should return a Try to be consistent.",2 MESOS-1303,"ExamplesTest.{TestFramework, NoExecutorFramework} flaky","I'm having trouble reproducing this but I did observe it once on my OSX system: {noformat} [==========] Running 2 tests from 1 test case. [----------] Global test environment set-up. 
[----------] 2 tests from ExamplesTest [ RUN ] ExamplesTest.TestFramework ../../src/tests/script.cpp:81: Failure Failed test_framework_test.sh terminated with signal 'Abort trap: 6' [ FAILED ] ExamplesTest.TestFramework (953 ms) [ RUN ] ExamplesTest.NoExecutorFramework [ OK ] ExamplesTest.NoExecutorFramework (10162 ms) [----------] 2 tests from ExamplesTest (11115 ms total) [----------] Global test environment tear-down [==========] 2 tests from 1 test case ran. (11121 ms total) [ PASSED ] 1 test. [ FAILED ] 1 test, listed below: [ FAILED ] ExamplesTest.TestFramework {noformat} when investigating a failed make check for https://reviews.apache.org/r/20971/ {noformat} [----------] 6 tests from ExamplesTest [ RUN ] ExamplesTest.TestFramework [ OK ] ExamplesTest.TestFramework (8643 ms) [ RUN ] ExamplesTest.NoExecutorFramework tests/script.cpp:81: Failure Failed no_executor_framework_test.sh terminated with signal 'Aborted' [ FAILED ] ExamplesTest.NoExecutorFramework (7220 ms) [ RUN ] ExamplesTest.JavaFramework [ OK ] ExamplesTest.JavaFramework (11181 ms) [ RUN ] ExamplesTest.JavaException [ OK ] ExamplesTest.JavaException (5624 ms) [ RUN ] ExamplesTest.JavaLog [ OK ] ExamplesTest.JavaLog (6472 ms) [ RUN ] ExamplesTest.PythonFramework [ OK ] ExamplesTest.PythonFramework (14467 ms) [----------] 6 tests from ExamplesTest (53607 ms total) {noformat}",1 MESOS-1307,"Authorize offer allocations","When frameworks register or reregister they should authorize their roles. Split register framework / reregister framework. ",8 MESOS-1316,"Implement decent unit test coverage for the mesos-fetcher tool","There are currently no tests that cover the {{mesos-fetcher}} tool itself, and hence bugs like MESOS-1313 have accidentally slipped through.",2 MESOS-1332,"Improve Master and Slave metric names","As we move the metrics to a new endpoint, we should consider revisiting the names of some of the current metrics to make them clearer. 
It may also be worth considering changing some existing counter-style metrics to gauges. ",3 MESOS-1339,"Add ""per-framework-principal"" counters for all messages from a scheduler on Master","Framework::principal is used to identify one or more frameworks. If multiple frameworks use the same principal, they'll have one counter showing their combined message count.",3 MESOS-1344,"Add flags support for JSON",NULL,2 MESOS-1347,"GarbageCollectorIntegrationTest.DiskUsage is flaky.","From Jenkins: https://builds.apache.org/job/Mesos-Ubuntu-distcheck/79/consoleFull {noformat} [ RUN ] GarbageCollectorIntegrationTest.DiskUsage Using temporary directory '/tmp/GarbageCollectorIntegrationTest_DiskUsage_pU3Ym7' I0507 03:27:38.775058 5758 leveldb.cpp:174] Opened db in 44.343989ms I0507 03:27:38.787498 5758 leveldb.cpp:181] Compacted db in 12.411065ms I0507 03:27:38.787533 5758 leveldb.cpp:196] Created db iterator in 4008ns I0507 03:27:38.787545 5758 leveldb.cpp:202] Seeked to beginning of db in 598ns I0507 03:27:38.787552 5758 leveldb.cpp:271] Iterated through 0 keys in the db in 173ns I0507 03:27:38.787564 5758 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0507 03:27:38.787858 5777 recover.cpp:425] Starting replica recovery I0507 03:27:38.788352 5793 master.cpp:267] Master 20140507-032738-453759884-58462-5758 (hemera.apache.org) started on 140.211.11.27:58462 I0507 03:27:38.788377 5793 master.cpp:304] Master only allowing authenticated frameworks to register I0507 03:27:38.788383 5793 master.cpp:309] Master only allowing authenticated slaves to register I0507 03:27:38.788389 5793 credentials.hpp:35] Loading credentials for authentication I0507 03:27:38.789064 5779 recover.cpp:451] Replica is in EMPTY status W0507 03:27:38.789115 5793 credentials.hpp:48] Failed to stat credentials file 'file:///tmp/GarbageCollectorIntegrationTest_DiskUsage_pU3Ym7/credentials': No such file or directory I0507 03:27:38.789489 5779 master.cpp:104] No 
whitelist given. Advertising offers for all slaves I0507 03:27:38.789531 5778 hierarchical_allocator_process.hpp:301] Initializing hierarchical allocator process with master : master@140.211.11.27:58462 I0507 03:27:38.791007 5788 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I0507 03:27:38.791177 5780 master.cpp:921] The newly elected leader is master@140.211.11.27:58462 with id 20140507-032738-453759884-58462-5758 I0507 03:27:38.791198 5780 master.cpp:931] Elected as the leading master! I0507 03:27:38.791205 5780 master.cpp:752] Recovering from registrar I0507 03:27:38.791251 5796 recover.cpp:188] Received a recover response from a replica in EMPTY status I0507 03:27:38.791323 5797 registrar.cpp:313] Recovering registrar I0507 03:27:38.792137 5795 recover.cpp:542] Updating replica status to STARTING I0507 03:27:38.807531 5781 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 15.124092ms I0507 03:27:38.807559 5781 replica.cpp:320] Persisted replica status to STARTING I0507 03:27:38.807621 5781 recover.cpp:451] Replica is in STARTING status I0507 03:27:38.809319 5799 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I0507 03:27:38.809983 5795 recover.cpp:188] Received a recover response from a replica in STARTING status I0507 03:27:38.811204 5778 recover.cpp:542] Updating replica status to VOTING I0507 03:27:38.827595 5795 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 16.011355ms I0507 03:27:38.827627 5795 replica.cpp:320] Persisted replica status to VOTING I0507 03:27:38.827683 5795 recover.cpp:556] Successfully joined the Paxos group I0507 03:27:38.827775 5795 recover.cpp:440] Recover process terminated I0507 03:27:38.828966 5780 log.cpp:656] Attempting to start the writer I0507 03:27:38.831114 5782 replica.cpp:474] Replica received implicit promise request with proposal 1 I0507 03:27:38.847708 5782 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 
16.573137ms I0507 03:27:38.847739 5782 replica.cpp:342] Persisted promised to 1 I0507 03:27:38.848141 5797 coordinator.cpp:230] Coordinator attemping to fill missing position I0507 03:27:38.849684 5790 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 I0507 03:27:38.863777 5790 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 14.076775ms I0507 03:27:38.863801 5790 replica.cpp:676] Persisted action at 0 I0507 03:27:38.864915 5798 replica.cpp:508] Replica received write request for position 0 I0507 03:27:38.864949 5798 leveldb.cpp:436] Reading position from leveldb took 11807ns I0507 03:27:38.879945 5798 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 14.978446ms I0507 03:27:38.879976 5798 replica.cpp:676] Persisted action at 0 I0507 03:27:38.880491 5797 replica.cpp:655] Replica received learned notice for position 0 I0507 03:27:38.895969 5797 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 15.459949ms I0507 03:27:38.895992 5797 replica.cpp:676] Persisted action at 0 I0507 03:27:38.896003 5797 replica.cpp:661] Replica learned NOP action at position 0 I0507 03:27:38.896411 5783 log.cpp:672] Writer started with ending position 0 I0507 03:27:38.898058 5798 leveldb.cpp:436] Reading position from leveldb took 11910ns I0507 03:27:38.899749 5777 registrar.cpp:346] Successfully fetched the registry (0B) I0507 03:27:38.899766 5777 registrar.cpp:422] Attempting to update the 'registry' I0507 03:27:38.901458 5791 log.cpp:680] Attempting to append 137 bytes to the log I0507 03:27:38.901666 5780 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1 I0507 03:27:38.902773 5783 replica.cpp:508] Replica received write request for position 1 I0507 03:27:38.916127 5783 leveldb.cpp:341] Persisting action (156 bytes) to leveldb took 13.225715ms I0507 03:27:38.916152 5783 replica.cpp:676] Persisted action at 1 I0507 03:27:38.916534 5790 replica.cpp:655] Replica received learned 
notice for position 1 I0507 03:27:38.928203 5790 leveldb.cpp:341] Persisting action (158 bytes) to leveldb took 11.652434ms I0507 03:27:38.928225 5790 replica.cpp:676] Persisted action at 1 I0507 03:27:38.928236 5790 replica.cpp:661] Replica learned APPEND action at position 1 I0507 03:27:38.928546 5790 registrar.cpp:479] Successfully updated 'registry' I0507 03:27:38.928642 5790 registrar.cpp:372] Successfully recovered registrar I0507 03:27:38.929044 5783 master.cpp:779] Recovered 0 slaves from the Registry (99B) ; allowing 10mins for slaves to re-register I0507 03:27:38.929502 5799 log.cpp:699] Attempting to truncate the log to 1 I0507 03:27:38.929888 5797 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2 I0507 03:27:38.930161 5781 replica.cpp:508] Replica received write request for position 2 I0507 03:27:38.932977 5789 slave.cpp:140] Slave started on 56)@140.211.11.27:58462 I0507 03:27:38.932991 5789 credentials.hpp:35] Loading credentials for authentication W0507 03:27:38.933567 5789 credentials.hpp:48] Failed to stat credentials file 'file:///tmp/GarbageCollectorIntegrationTest_DiskUsage_A9Pxks/credential': No such file or directory I0507 03:27:38.933585 5789 slave.cpp:230] Slave using credential for: test-principal I0507 03:27:38.933765 5789 slave.cpp:243] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0507 03:27:38.933854 5789 slave.cpp:271] Slave hostname: hemera.apache.org I0507 03:27:38.933863 5789 slave.cpp:272] Slave checkpoint: false I0507 03:27:38.934239 5778 state.cpp:33] Recovering state from '/tmp/GarbageCollectorIntegrationTest_DiskUsage_A9Pxks/meta' I0507 03:27:38.934960 5792 status_update_manager.cpp:193] Recovering status update manager I0507 03:27:38.935123 5779 slave.cpp:2945] Finished recovery I0507 03:27:38.936998 5779 slave.cpp:526] New master detected at master@140.211.11.27:58462 I0507 03:27:38.937021 5779 slave.cpp:586] Authenticating with master 
master@140.211.11.27:58462 I0507 03:27:38.937077 5798 status_update_manager.cpp:167] New master detected at master@140.211.11.27:58462 I0507 03:27:38.937306 5779 slave.cpp:559] Detecting new master I0507 03:27:38.937335 5800 authenticatee.hpp:128] Creating new client SASL connection I0507 03:27:38.938030 5778 master.cpp:2798] Authenticating slave(56)@140.211.11.27:58462 I0507 03:27:38.938742 5783 authenticator.hpp:148] Creating new server SASL connection I0507 03:27:38.939312 5786 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5 I0507 03:27:38.939340 5786 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5' I0507 03:27:38.939390 5786 authenticator.hpp:254] Received SASL authentication start I0507 03:27:38.939553 5786 authenticator.hpp:342] Authentication requires more steps I0507 03:27:38.939592 5786 authenticatee.hpp:265] Received SASL authentication step I0507 03:27:38.939715 5786 authenticator.hpp:282] Received SASL authentication step I0507 03:27:38.939803 5786 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'hemera.apache.org' server FQDN: 'hemera.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0507 03:27:38.939821 5786 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I0507 03:27:38.939831 5786 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0507 03:27:38.939841 5786 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'hemera.apache.org' server FQDN: 'hemera.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0507 03:27:38.939851 5786 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0507 03:27:38.939857 5786 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0507 03:27:38.939870 5786 
authenticator.hpp:334] Authentication success I0507 03:27:38.939937 5786 authenticatee.hpp:305] Authentication success I0507 03:27:38.940016 5778 master.cpp:2838] Successfully authenticated slave(56)@140.211.11.27:58462 I0507 03:27:38.940449 5799 slave.cpp:643] Successfully authenticated with master master@140.211.11.27:58462 I0507 03:27:38.940513 5799 slave.cpp:872] Will retry registration in 5.176207635secs if necessary I0507 03:27:38.940625 5794 master.cpp:2134] Registering slave at slave(56)@140.211.11.27:58462 (hemera.apache.org) with id 20140507-032738-453759884-58462-5758-0 I0507 03:27:38.940800 5796 registrar.cpp:422] Attempting to update the 'registry' I0507 03:27:38.940850 5781 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 10.659152ms I0507 03:27:38.940871 5781 replica.cpp:676] Persisted action at 2 I0507 03:27:38.941843 5788 replica.cpp:655] Replica received learned notice for position 2 I0507 03:27:38.953193 5788 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 11.291343ms I0507 03:27:38.953258 5788 leveldb.cpp:399] Deleting ~1 keys from leveldb took 33725ns I0507 03:27:38.953274 5788 replica.cpp:676] Persisted action at 2 I0507 03:27:38.953282 5788 replica.cpp:661] Replica learned TRUNCATE action at position 2 I0507 03:27:38.953541 5797 log.cpp:680] Attempting to append 330 bytes to the log I0507 03:27:38.953614 5797 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 3 I0507 03:27:38.954731 5789 replica.cpp:508] Replica received write request for position 3 I0507 03:27:38.965240 5789 leveldb.cpp:341] Persisting action (349 bytes) to leveldb took 10.489719ms I0507 03:27:38.965261 5789 replica.cpp:676] Persisted action at 3 I0507 03:27:38.966253 5780 replica.cpp:655] Replica received learned notice for position 3 I0507 03:27:38.977375 5780 leveldb.cpp:341] Persisting action (351 bytes) to leveldb took 11.098798ms I0507 03:27:38.977408 5780 replica.cpp:676] Persisted action at 3 I0507 
03:27:38.977421 5780 replica.cpp:661] Replica learned APPEND action at position 3 I0507 03:27:38.977859 5792 registrar.cpp:479] Successfully updated 'registry' I0507 03:27:38.977926 5780 log.cpp:699] Attempting to truncate the log to 3 I0507 03:27:38.978060 5792 master.cpp:2174] Registered slave 20140507-032738-453759884-58462-5758-0 at slave(56)@140.211.11.27:58462 (hemera.apache.org) I0507 03:27:38.978112 5792 master.cpp:3283] Adding slave 20140507-032738-453759884-58462-5758-0 at slave(56)@140.211.11.27:58462 (hemera.apache.org) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0507 03:27:38.978134 5784 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 4 I0507 03:27:38.978508 5785 slave.cpp:676] Registered with master master@140.211.11.27:58462; given slave ID 20140507-032738-453759884-58462-5758-0 I0507 03:27:38.978631 5786 hierarchical_allocator_process.hpp:444] Added slave 20140507-032738-453759884-58462-5758-0 (hemera.apache.org) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] available) I0507 03:27:38.978677 5786 hierarchical_allocator_process.hpp:707] Performed allocation for slave 20140507-032738-453759884-58462-5758-0 in 5421ns I0507 03:27:38.979872 5796 replica.cpp:508] Replica received write request for position 4 I0507 03:27:38.982084 5758 sched.cpp:121] Version: 0.19.0 I0507 03:27:38.982213 5789 sched.cpp:217] New master detected at master@140.211.11.27:58462 I0507 03:27:38.982228 5789 sched.cpp:268] Authenticating with master master@140.211.11.27:58462 I0507 03:27:38.982347 5788 authenticatee.hpp:128] Creating new client SASL connection I0507 03:27:38.982676 5788 master.cpp:2798] Authenticating scheduler(59)@140.211.11.27:58462 I0507 03:27:38.983100 5788 authenticator.hpp:148] Creating new server SASL connection I0507 03:27:38.983294 5788 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5 I0507 
03:27:38.983312 5788 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5' I0507 03:27:38.983360 5788 authenticator.hpp:254] Received SASL authentication start I0507 03:27:38.983505 5788 authenticator.hpp:342] Authentication requires more steps I0507 03:27:38.984220 5782 authenticatee.hpp:265] Received SASL authentication step I0507 03:27:38.984275 5782 authenticator.hpp:282] Received SASL authentication step I0507 03:27:38.984315 5782 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'hemera.apache.org' server FQDN: 'hemera.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0507 03:27:38.984347 5782 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I0507 03:27:38.984359 5782 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0507 03:27:38.984370 5782 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'hemera.apache.org' server FQDN: 'hemera.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0507 03:27:38.984377 5782 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0507 03:27:38.984383 5782 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0507 03:27:38.984397 5782 authenticator.hpp:334] Authentication success I0507 03:27:38.984429 5782 authenticatee.hpp:305] Authentication success I0507 03:27:38.984469 5795 master.cpp:2838] Successfully authenticated scheduler(59)@140.211.11.27:58462 I0507 03:27:38.985110 5782 sched.cpp:342] Successfully authenticated with master master@140.211.11.27:58462 I0507 03:27:38.985133 5782 sched.cpp:461] Sending registration request to master@140.211.11.27:58462 I0507 03:27:38.985326 5795 master.cpp:980] Received registration request from scheduler(59)@140.211.11.27:58462 I0507 03:27:38.985357 5795 
master.cpp:998] Registering framework 20140507-032738-453759884-58462-5758-0000 at scheduler(59)@140.211.11.27:58462 I0507 03:27:38.985424 5795 sched.cpp:392] Framework registered with 20140507-032738-453759884-58462-5758-0000 I0507 03:27:38.985471 5792 hierarchical_allocator_process.hpp:331] Added framework 20140507-032738-453759884-58462-5758-0000 I0507 03:27:38.985610 5795 sched.cpp:406] Scheduler::registered took 36702ns I0507 03:27:38.985646 5792 hierarchical_allocator_process.hpp:751] Offering cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140507-032738-453759884-58462-5758-0 to framework 20140507-032738-453759884-58462-5758-0000 I0507 03:27:38.985954 5792 hierarchical_allocator_process.hpp:687] Performed allocation for 1 slaves in 330895ns I0507 03:27:38.986001 5789 master.hpp:612] Adding offer 20140507-032738-453759884-58462-5758-0 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140507-032738-453759884-58462-5758-0 (hemera.apache.org) I0507 03:27:38.986090 5789 master.cpp:2747] Sending 1 offers to framework 20140507-032738-453759884-58462-5758-0000 I0507 03:27:38.986548 5792 sched.cpp:529] Scheduler::resourceOffers took 162873ns I0507 03:27:38.986721 5792 master.hpp:622] Removing offer 20140507-032738-453759884-58462-5758-0 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140507-032738-453759884-58462-5758-0 (hemera.apache.org) I0507 03:27:38.986781 5792 master.cpp:1812] Processing reply for offers: [ 20140507-032738-453759884-58462-5758-0 ] on slave 20140507-032738-453759884-58462-5758-0 at slave(56)@140.211.11.27:58462 (hemera.apache.org) for framework 20140507-032738-453759884-58462-5758-0000 I0507 03:27:38.986843 5792 master.hpp:584] Adding task 0 with resources cpus(*):2; mem(*):1024 on slave 20140507-032738-453759884-58462-5758-0 (hemera.apache.org) I0507 03:27:38.986876 5792 master.cpp:2922] Launching task 0 of framework 
20140507-032738-453759884-58462-5758-0000 with resources cpus(*):2; mem(*):1024 on slave 20140507-032738-453759884-58462-5758-0 at slave(56)@140.211.11.27:58462 (hemera.apache.org) I0507 03:27:38.986981 5795 slave.cpp:906] Got assigned task 0 for framework 20140507-032738-453759884-58462-5758-0000 I0507 03:27:38.987180 5795 slave.cpp:1016] Launching task 0 for framework 20140507-032738-453759884-58462-5758-0000 I0507 03:27:38.987203 5787 hierarchical_allocator_process.hpp:546] Framework 20140507-032738-453759884-58462-5758-0000 left disk(*):1024; ports(*):[31000-32000] unused on slave 20140507-032738-453759884-58462-5758-0 I0507 03:27:38.987287 5787 hierarchical_allocator_process.hpp:589] Framework 20140507-032738-453759884-58462-5758-0000 filtered slave 20140507-032738-453759884-58462-5758-0 for 5secs I0507 03:27:38.991395 5795 exec.cpp:131] Version: 0.19.0 I0507 03:27:38.991497 5779 exec.cpp:181] Executor started at: executor(27)@140.211.11.27:58462 with pid 5758 I0507 03:27:38.991510 5795 slave.cpp:1126] Queuing task '0' for executor default of framework '20140507-032738-453759884-58462-5758-0000 I0507 03:27:38.991566 5795 slave.cpp:487] Successfully attached file '/tmp/GarbageCollectorIntegrationTest_DiskUsage_A9Pxks/slaves/20140507-032738-453759884-58462-5758-0/frameworks/20140507-032738-453759884-58462-5758-0000/executors/default/runs/de776bec-2822-4bbc-befc-eec40eb5f674' I0507 03:27:38.991595 5795 slave.cpp:2283] Monitoring executor 'default' of framework '20140507-032738-453759884-58462-5758-0000' in container 'de776bec-2822-4bbc-befc-eec40eb5f674' I0507 03:27:38.991778 5795 slave.cpp:1599] Got registration for executor 'default' of framework 20140507-032738-453759884-58462-5758-0000 I0507 03:27:38.991874 5795 slave.cpp:1718] Flushing queued task 0 for executor 'default' of framework 20140507-032738-453759884-58462-5758-0000 I0507 03:27:38.991935 5780 exec.cpp:205] Executor registered on slave 20140507-032738-453759884-58462-5758-0 I0507 03:27:38.993419 
5796 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 13.489998ms I0507 03:27:38.993449 5796 replica.cpp:676] Persisted action at 4 I0507 03:27:38.994510 5777 replica.cpp:655] Replica received learned notice for position 4 I0507 03:27:...",2 MESOS-1358,"Show when the leading master was elected in the webui","This would be nice to have during debugging.",1 MESOS-1365,"SlaveRecoveryTest/0.MultipleFrameworks is flaky","--gtest_repeat=-1 --gtest_shuffle --gtest_break_on_failure {noformat} [ RUN ] SlaveRecoveryTest/0.MultipleFrameworks WARNING: Logging before InitGoogleLogging() is written to STDERR I0513 15:42:05.931761 4320 exec.cpp:131] Version: 0.19.0 I0513 15:42:05.936698 4340 exec.cpp:205] Executor registered on slave 20140513-154204-16842879-51872-13062-0 Registered executor on artoo Starting task 51991f97-f5fd-4905-ad0f-02668083af7c Forked command at 4367 sh -c 'sleep 1000' WARNING: Logging before InitGoogleLogging() is written to STDERR I0513 15:42:06.915061 4408 exec.cpp:131] Version: 0.19.0 I0513 15:42:06.931149 4435 exec.cpp:205] Executor registered on slave 20140513-154204-16842879-51872-13062-0 Registered executor on artoo Starting task eaf5d8d6-3a6c-4ee1-84c1-fae20fb1df83 sh -c 'sleep 1000' Forked command at 4439 I0513 15:42:06.998332 4340 exec.cpp:251] Received reconnect request from slave 20140513-154204-16842879-51872-13062-0 I0513 15:42:06.998414 4436 exec.cpp:251] Received reconnect request from slave 20140513-154204-16842879-51872-13062-0 I0513 15:42:07.006350 4437 exec.cpp:228] Executor re-registered on slave 20140513-154204-16842879-51872-13062-0 Re-registered executor on artoo I0513 15:42:07.027039 4337 exec.cpp:378] Executor asked to shutdown Shutting down Sending SIGTERM to process tree at pid 4367 Killing the following process trees: [ -+- 4367 sh -c sleep 1000 \--- 4368 sleep 1000 ] ../../src/tests/slave_recovery_tests.cpp:2807: Failure Value of: status1.get().state() Actual: TASK_FAILED Expected: TASK_KILLED Program received 
signal SIGSEGV, Segmentation fault. testing::UnitTest::AddTestPartResult (this=0x154dac0 , result_type=testing::TestPartResult::kFatalFailure, file_name=0xeb6b6c ""../../src/tests/slave_recovery_tests.cpp"", line_number=2807, message=..., os_stack_trace=...) at gmock-1.6.0/gtest/src/gtest.cc:3795 3795 *static_cast(NULL) = 1; (gdb) bt #0 testing::UnitTest::AddTestPartResult (this=0x154dac0 , result_type=testing::TestPartResult::kFatalFailure, file_name=0xeb6b6c ""../../src/tests/slave_recovery_tests.cpp"", line_number=2807, message=..., os_stack_trace=...) at gmock-1.6.0/gtest/src/gtest.cc:3795 #1 0x0000000000df98b9 in testing::internal::AssertHelper::operator= (this=0x7fffffffb860, message=...) at gmock-1.6.0/gtest/src/gtest.cc:356 #2 0x0000000000cdfa57 in SlaveRecoveryTest_MultipleFrameworks_Test::TestBody (this=0x1954db0) at ../../src/tests/slave_recovery_tests.cpp:2807 #3 0x0000000000e22583 in testing::internal::HandleSehExceptionsInMethodIfSupported (object=0x1954db0, method=&virtual testing::Test::TestBody(), location=0xed0af0 ""the test body"") at gmock-1.6.0/gtest/src/gtest.cc:2090 #4 0x0000000000e12467 in testing::internal::HandleExceptionsInMethodIfSupported (object=0x1954db0, method=&virtual testing::Test::TestBody(), location=0xed0af0 ""the test body"") at gmock-1.6.0/gtest/src/gtest.cc:2126 #5 0x0000000000e010d5 in testing::Test::Run (this=0x1954db0) at gmock-1.6.0/gtest/src/gtest.cc:2161 #6 0x0000000000e01ceb in testing::TestInfo::Run (this=0x158cf80) at gmock-1.6.0/gtest/src/gtest.cc:2338 #7 0x0000000000e02387 in testing::TestCase::Run (this=0x158a880) at gmock-1.6.0/gtest/src/gtest.cc:2445 #8 0x0000000000e079ed in testing::internal::UnitTestImpl::RunAllTests (this=0x1558b40) at gmock-1.6.0/gtest/src/gtest.cc:4237 #9 0x0000000000e1ec83 in testing::internal::HandleSehExceptionsInMethodIfSupported (object=0x1558b40, method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0xe07700 , location=0xed1219 ""auxiliary test 
code (environments or event listeners)"") at gmock-1.6.0/gtest/src/gtest.cc:2090 #10 0x0000000000e14217 in testing::internal::HandleExceptionsInMethodIfSupported (object=0x1558b40, method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0xe07700 , location=0xed1219 ""auxiliary test code (environments or event listeners)"") at gmock-1.6.0/gtest/src/gtest.cc:2126 #11 0x0000000000e076d7 in testing::UnitTest::Run (this=0x154dac0 ) at gmock-1.6.0/gtest/src/gtest.cc:3872 #12 0x0000000000b99887 in main (argc=1, argv=0x7fffffffd9f8) at ../../src/tests/main.cpp:107 (gdb) frame 2 #2 0x0000000000cdfa57 in SlaveRecoveryTest_MultipleFrameworks_Test::TestBody (this=0x1954db0) at ../../src/tests/slave_recovery_tests.cpp:2807 2807 ASSERT_EQ(TASK_KILLED, status1.get().state()); (gdb) p status1 $1 = {data = {::Data, 2>> = {_M_ptr = 0x1963140, _M_refcount = {_M_pi = 0x198a620}}, }} (gdb) p status1.get() $2 = (const mesos::TaskStatus &) @0x7fffdc5bf5f0: { = { = {_vptr$MessageLite = 0x7ffff74bc940 }, }, static kTaskIdFieldNumber = 1, static kStateFieldNumber = 2, static kMessageFieldNumber = 4, static kDataFieldNumber = 3, static kSlaveIdFieldNumber = 5, static kTimestampFieldNumber = 6, _unknown_fields_ = {fields_ = 0x0}, task_id_ = 0x7fffdc5ce9a0, message_ = 0x7fffdc5f5880, data_ = 0x154b4b0 , slave_id_ = 0x7fffdc59c4f0, timestamp_ = 1429688582.046252, state_ = 3, _cached_size_ = 0, _has_bits_ = {55}, static default_instance_ = 0x0} (gdb) p status1.get().state() $3 = mesos::TASK_FAILED (gdb) list 2802 // Kill task 1. 2803 driver1.killTask(task1.task_id()); 2804 2805 // Wait for TASK_KILLED update. 2806 AWAIT_READY(status1); 2807 ASSERT_EQ(TASK_KILLED, status1.get().state()); 2808 2809 // Kill task 2. 
2810 driver2.killTask(task2.task_id()); 2811 {noformat}",1 MESOS-1371,"Expose libprocess queue length from scheduler driver to metrics endpoint","We expose the master's event queue length and we should do the same for the scheduler driver.",1 MESOS-1373,"Keep track of the principals for authenticated pids in Master.","Need to add a 'principal' field to FrameworkInfo and verify if the Framework has the claimed principal during registration.",3 MESOS-1374,"Verify static libprocess scheduler port works with Mesos Master",NULL,5 MESOS-1392,"Failure when znode is removed before we can read its contents.","Looks like the following can occur when a znode goes away right before we can read its contents: {noformat: title=Slave exit} I0520 16:33:45.721727 29155 group.cpp:382] Trying to create path '/home/mesos/test/master' in ZooKeeper I0520 16:33:48.600837 29155 detector.cpp:134] Detected a new leader: (id='2617') I0520 16:33:48.601428 29147 group.cpp:655] Trying to get '/home/mesos/test/master/info_0000002617' in ZooKeeper Failed to detect a master: Failed to get data for ephemeral node '/home/mesos/test/master/info_0000002617' in ZooKeeper: no node Slave Exit Status: 1 {noformat}",3 MESOS-1393,"Write parser for perf output.","1. Should support output from pid and cgroup targets. 2. Should support output for the same events from >= 1 cgroup. 3. Should return as PerfStatistics protobuf. ",3 MESOS-1394,"Test different versions of perf","Test across different kernel versions (at least 2.6.XX and 3.X) and across different distributions. 
Test input flags and parsing output.",3 MESOS-1395,"Test perf isolator for slave roll forward/roll back","Test that changes to add/remove perf isolator will be handled through slave recovery, e.g., containers started without the perf isolator continue to report resource statistics and containers started with the perf isolator will include perf statistics.",2 MESOS-1396,"Introduce a PerfStatistics protobuf","Field names from `perf list` normalized to convert hyphens to underscores and down-cased. Start with just the hardware and software events, not raw hardware, breakpoints or tracepoints. All fields should be optional. Include as an optional field in ResourceStatistics.",2 MESOS-1397,"Rename ResourceStatistics for containers","Rename to ContainerStatistics, which includes optional ResourceStatistics and optional PerfStatistics.",8 MESOS-1398,"Document perf isolator flags","Document interval, duration and the event flags. Document event name normalization for the protobuf.",1 MESOS-1410,"Keep terminal unacknowledged tasks in the master's state.","Once we are sending acknowledgments through the master as per MESOS-1409, we need to keep terminal tasks that are *unacknowledged* in the Master's memory. This will allow us to identify these tasks to frameworks when we haven't yet forwarded them an update. Without this, we're susceptible to MESOS-1389.",5 MESOS-1424,"Mesos tests should not rely on echo","Triggered by MESOS-1413, I would like to propose changing our tests to not rely on {{echo}} but to use {{printf}} instead. This seems useful, as {{echo}} appends an extra linefeed after the supplied string whereas {{printf}} does not. The {{-n}} switch preventing that extra linefeed is unfortunately not portable - it is not supported by the builtin {{echo}} of the BSD / OSX {{/bin/sh}}. 
",1 MESOS-1425,"LogZooKeeperTest.WriteRead test is flaky","{code} [ RUN ] LogZooKeeperTest.WriteRead I0527 23:23:48.286031 1352 zookeeper_test_server.cpp:158] Started ZooKeeperTestServer on port 39446 I0527 23:23:48.293916 1352 log_tests.cpp:1945] Using temporary directory '/tmp/LogZooKeeperTest_WriteRead_Vyty8g' I0527 23:23:48.296430 1352 leveldb.cpp:176] Opened db in 2.459713ms I0527 23:23:48.296740 1352 leveldb.cpp:183] Compacted db in 286843ns I0527 23:23:48.296761 1352 leveldb.cpp:198] Created db iterator in 3083ns I0527 23:23:48.296772 1352 leveldb.cpp:204] Seeked to beginning of db in 4541ns I0527 23:23:48.296777 1352 leveldb.cpp:273] Iterated through 0 keys in the db in 87ns I0527 23:23:48.296788 1352 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0527 23:23:48.297499 1383 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 505340ns I0527 23:23:48.297513 1383 replica.cpp:320] Persisted replica status to VOTING I0527 23:23:48.299492 1352 leveldb.cpp:176] Opened db in 1.73582ms I0527 23:23:48.299773 1352 leveldb.cpp:183] Compacted db in 263937ns I0527 23:23:48.299793 1352 leveldb.cpp:198] Created db iterator in 7494ns I0527 23:23:48.299806 1352 leveldb.cpp:204] Seeked to beginning of db in 235ns I0527 23:23:48.299813 1352 leveldb.cpp:273] Iterated through 0 keys in the db in 93ns I0527 23:23:48.299821 1352 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0527 23:23:48.300503 1380 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 492309ns I0527 23:23:48.300516 1380 replica.cpp:320] Persisted replica status to VOTING I0527 23:23:48.302500 1352 leveldb.cpp:176] Opened db in 1.793829ms I0527 23:23:48.303642 1352 leveldb.cpp:183] Compacted db in 1.123929ms I0527 23:23:48.303669 1352 leveldb.cpp:198] Created db iterator in 5865ns I0527 23:23:48.303689 1352 leveldb.cpp:204] Seeked to beginning of db in 8811ns I0527 23:23:48.303705 1352 leveldb.cpp:273] 
Iterated through 1 keys in the db in 9545ns I0527 23:23:48.303715 1352 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned 2014-05-27 23:23:48,303:1352(0x2b1173a29700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 2014-05-27 23:23:48,303:1352(0x2b1173a29700):ZOO_INFO@log_env@716: Client environment:host.name=minerva 2014-05-27 23:23:48,303:1352(0x2b1173a29700):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-05-27 23:23:48,303:1352(0x2b1173a29700):ZOO_INFO@log_env@724: Client environment:os.arch=3.2.0-57-generic 2014-05-27 23:23:48,303:1352(0x2b1173a29700):ZOO_INFO@log_env@725: Client environment:os.version=#87-Ubuntu SMP Tue Nov 12 21:35:10 UTC 2013 2014-05-27 23:23:48,303:1352(0x2b1173e2b700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 2014-05-27 23:23:48,304:1352(0x2b1173e2b700):ZOO_INFO@log_env@716: Client environment:host.name=minerva 2014-05-27 23:23:48,304:1352(0x2b1173e2b700):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-05-27 23:23:48,304:1352(0x2b1173e2b700):ZOO_INFO@log_env@724: Client environment:os.arch=3.2.0-57-generic 2014-05-27 23:23:48,304:1352(0x2b1173e2b700):ZOO_INFO@log_env@725: Client environment:os.version=#87-Ubuntu SMP Tue Nov 12 21:35:10 UTC 2013 2014-05-27 23:23:48,304:1352(0x2b1173a29700):ZOO_INFO@log_env@733: Client environment:user.name=(null) I0527 23:23:48.303988 1380 log.cpp:238] Attempting to join replica to ZooKeeper group 2014-05-27 23:23:48,304:1352(0x2b1173e2b700):ZOO_INFO@log_env@733: Client environment:user.name=(null) 2014-05-27 23:23:48,304:1352(0x2b1173a29700):ZOO_INFO@log_env@741: Client environment:user.home=/home/jenkins I0527 23:23:48.304198 1385 recover.cpp:425] Starting replica recovery 2014-05-27 23:23:48,304:1352(0x2b1173e2b700):ZOO_INFO@log_env@741: Client environment:user.home=/home/jenkins 2014-05-27 23:23:48,304:1352(0x2b1173a29700):ZOO_INFO@log_env@753: Client 
environment:user.dir=/tmp/LogZooKeeperTest_WriteRead_Vyty8g 2014-05-27 23:23:48,304:1352(0x2b1173a29700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=127.0.0.1:39446 sessionTimeout=5000 watcher=0x2b11708e98d0 sessionId=0 sessionPasswd= context=0x2b118002f4e0 flags=0 2014-05-27 23:23:48,304:1352(0x2b1173e2b700):ZOO_INFO@log_env@753: Client environment:user.dir=/tmp/LogZooKeeperTest_WriteRead_Vyty8g I0527 23:23:48.304352 1385 recover.cpp:451] Replica is in VOTING status 2014-05-27 23:23:48,304:1352(0x2b1173e2b700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=127.0.0.1:39446 sessionTimeout=5000 watcher=0x2b11708e98d0 sessionId=0 sessionPasswd= context=0x2b1198015ca0 flags=0 I0527 23:23:48.304417 1385 recover.cpp:440] Recover process terminated 2014-05-27 23:23:48,304:1352(0x2b12897b8700):ZOO_INFO@check_events@1703: initiated connection to server [127.0.0.1:39446] 2014-05-27 23:23:48,304:1352(0x2b12891b5700):ZOO_INFO@check_events@1703: initiated connection to server [127.0.0.1:39446] I0527 23:23:48.311262 1352 leveldb.cpp:176] Opened db in 7.261703ms 2014-05-27 23:23:48,311:1352(0x2b12897b8700):ZOO_INFO@check_events@1750: session establishment complete on server [127.0.0.1:39446], sessionId=0x1463fff34bd0000, negotiated timeout=6000 I0527 23:23:48.312379 1381 group.cpp:310] Group process ((614)@67.195.138.8:35151) connected to ZooKeeper I0527 23:23:48.312407 1381 group.cpp:784] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0) I0527 23:23:48.312417 1381 group.cpp:382] Trying to create path '/log' in ZooKeeper I0527 23:23:48.312422 1352 leveldb.cpp:183] Compacted db in 1.119843ms I0527 23:23:48.312505 1352 leveldb.cpp:198] Created db iterator in 3901ns I0527 23:23:48.312526 1352 leveldb.cpp:204] Seeked to beginning of db in 7398ns I0527 23:23:48.312541 1352 leveldb.cpp:273] Iterated through 1 keys in the db in 6345ns I0527 23:23:48.312553 1352 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 
holes and 0 unlearned 2014-05-27 23:23:48,312:1352(0x2b1173627700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 2014-05-27 23:23:48,312:1352(0x2b1173627700):ZOO_INFO@log_env@716: Client environment:host.name=minerva 2014-05-27 23:23:48,312:1352(0x2b1173627700):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-05-27 23:23:48,312:1352(0x2b1173627700):ZOO_INFO@log_env@724: Client environment:os.arch=3.2.0-57-generic 2014-05-27 23:23:48,312:1352(0x2b1173627700):ZOO_INFO@log_env@725: Client environment:os.version=#87-Ubuntu SMP Tue Nov 12 21:35:10 UTC 2013 2014-05-27 23:23:48,312:1352(0x2b1173627700):ZOO_INFO@log_env@733: Client environment:user.name=(null) 2014-05-27 23:23:48,312:1352(0x2b12891b5700):ZOO_INFO@check_events@1750: session establishment complete on server [127.0.0.1:39446], sessionId=0x1463fff34bd0001, negotiated timeout=6000 2014-05-27 23:23:48,313:1352(0x2b1173627700):ZOO_INFO@log_env@741: Client environment:user.home=/home/jenkins 2014-05-27 23:23:48,313:1352(0x2b1173627700):ZOO_INFO@log_env@753: Client environment:user.dir=/tmp/LogZooKeeperTest_WriteRead_Vyty8g 2014-05-27 23:23:48,313:1352(0x2b1173627700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=127.0.0.1:39446 sessionTimeout=5000 watcher=0x2b11708e98d0 sessionId=0 sessionPasswd= context=0x2b119001fd20 flags=0 I0527 23:23:48.313247 1380 group.cpp:310] Group process ((616)@67.195.138.8:35151) connected to ZooKeeper I0527 23:23:48.313266 1380 group.cpp:784] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0) I0527 23:23:48.313273 1380 group.cpp:382] Trying to create path '/log' in ZooKeeper 2014-05-27 23:23:48,313:1352(0x2b12889b0700):ZOO_INFO@check_events@1703: initiated connection to server [127.0.0.1:39446] 2014-05-27 23:23:48,313:1352(0x2b1173828700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 2014-05-27 23:23:48,313:1352(0x2b1173828700):ZOO_INFO@log_env@716: Client 
environment:host.name=minerva 2014-05-27 23:23:48,313:1352(0x2b1173828700):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-05-27 23:23:48,313:1352(0x2b1173828700):ZOO_INFO@log_env@724: Client environment:os.arch=3.2.0-57-generic 2014-05-27 23:23:48,313:1352(0x2b1173828700):ZOO_INFO@log_env@725: Client environment:os.version=#87-Ubuntu SMP Tue Nov 12 21:35:10 UTC 2013 I0527 23:23:48.313436 1387 log.cpp:238] Attempting to join replica to ZooKeeper group 2014-05-27 23:23:48,313:1352(0x2b1173828700):ZOO_INFO@log_env@733: Client environment:user.name=(null) 2014-05-27 23:23:48,313:1352(0x2b1173828700):ZOO_INFO@log_env@741: Client environment:user.home=/home/jenkins 2014-05-27 23:23:48,313:1352(0x2b1173828700):ZOO_INFO@log_env@753: Client environment:user.dir=/tmp/LogZooKeeperTest_WriteRead_Vyty8g 2014-05-27 23:23:48,313:1352(0x2b1173828700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=127.0.0.1:39446 sessionTimeout=5000 watcher=0x2b11708e98d0 sessionId=0 sessionPasswd= context=0x2b1190011ea0 flags=0 I0527 23:23:48.313601 1387 recover.cpp:425] Starting replica recovery I0527 23:23:48.313721 1382 recover.cpp:451] Replica is in VOTING status I0527 23:23:48.313794 1382 recover.cpp:440] Recover process terminated 2014-05-27 23:23:48,313:1352(0x2b1288bb1700):ZOO_INFO@check_events@1703: initiated connection to server [127.0.0.1:39446] I0527 23:23:48.313973 1383 log.cpp:656] Attempting to start the writer 2014-05-27 23:23:48,315:1352(0x2b12889b0700):ZOO_INFO@check_events@1750: session establishment complete on server [127.0.0.1:39446], sessionId=0x1463fff34bd0002, negotiated timeout=6000 I0527 23:23:48.315682 1387 group.cpp:310] Group process ((619)@67.195.138.8:35151) connected to ZooKeeper 2014-05-27 23:23:48,315:1352(0x2b1288bb1700):ZOO_INFO@check_events@1750: session establishment complete on server [127.0.0.1:39446], sessionId=0x1463fff34bd0003, negotiated timeout=6000 I0527 23:23:48.315709 1387 group.cpp:784] Syncing group operations: queue 
size (joins, cancels, datas) = (0, 0, 0) I0527 23:23:48.315738 1387 group.cpp:382] Trying to create path '/log' in ZooKeeper I0527 23:23:48.315964 1386 group.cpp:310] Group process ((621)@67.195.138.8:35151) connected to ZooKeeper I0527 23:23:48.315981 1386 group.cpp:784] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0) I0527 23:23:48.315989 1386 group.cpp:382] Trying to create path '/log' in ZooKeeper I0527 23:23:48.317881 1385 network.hpp:423] ZooKeeper group memberships changed I0527 23:23:48.317937 1381 group.cpp:655] Trying to get '/log/0000000000' in ZooKeeper I0527 23:23:48.318205 1382 network.hpp:423] ZooKeeper group memberships changed I0527 23:23:48.318317 1383 group.cpp:655] Trying to get '/log/0000000000' in ZooKeeper I0527 23:23:48.319154 1382 network.hpp:461] ZooKeeper group PIDs: { log-replica(22)@67.195.138.8:35151 } I0527 23:23:48.319541 1386 network.hpp:461] ZooKeeper group PIDs: { log-replica(22)@67.195.138.8:35151 } I0527 23:23:48.319851 1381 replica.cpp:474] Replica received implicit promise request with proposal 1 I0527 23:23:48.319905 1387 replica.cpp:474] Replica received implicit promise request with proposal 1 I0527 23:23:48.319907 1384 network.hpp:423] ZooKeeper group memberships changed I0527 23:23:48.320091 1385 group.cpp:655] Trying to get '/log/0000000000' in ZooKeeper I0527 23:23:48.320384 1383 network.hpp:423] ZooKeeper group memberships changed I0527 23:23:48.320441 1381 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 568396ns I0527 23:23:48.320456 1384 group.cpp:655] Trying to get '/log/0000000000' in ZooKeeper I0527 23:23:48.320461 1381 replica.cpp:342] Persisted promised to 1 I0527 23:23:48.320446 1387 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 516015ns I0527 23:23:48.320497 1387 replica.cpp:342] Persisted promised to 1 I0527 23:23:48.320814 1383 coordinator.cpp:230] Coordinator attemping to fill missing position I0527 23:23:48.321050 1384 group.cpp:655] Trying to get 
'/log/0000000001' in ZooKeeper I0527 23:23:48.321063 1385 group.cpp:655] Trying to get '/log/0000000001' in ZooKeeper I0527 23:23:48.321341 1387 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 I0527 23:23:48.321375 1381 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 I0527 23:23:48.321506 1387 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 89us I0527 23:23:48.321530 1387 replica.cpp:676] Persisted action at 0 I0527 23:23:48.321584 1381 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 122910ns I0527 23:23:48.321602 1381 replica.cpp:676] Persisted action at 0 I0527 23:23:48.321775 1383 network.hpp:461] ZooKeeper group PIDs: { log-replica(22)@67.195.138.8:35151, log-replica(23)@67.195.138.8:35151 } I0527 23:23:48.321961 1381 replica.cpp:508] Replica received write request for position 0 I0527 23:23:48.321984 1381 leveldb.cpp:438] Reading position from leveldb took 7813ns I0527 23:23:48.322064 1380 network.hpp:461] ZooKeeper group PIDs: { log-replica(22)@67.195.138.8:35151, log-replica(23)@67.195.138.8:35151 } I0527 23:23:48.322073 1381 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 78683ns I0527 23:23:48.322077 1383 replica.cpp:508] Replica received write request for position 0 I0527 23:23:48.322084 1381 replica.cpp:676] Persisted action at 0 I0527 23:23:48.322111 1383 leveldb.cpp:438] Reading position from leveldb took 17416ns I0527 23:23:48.322330 1383 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 157199ns I0527 23:23:48.322345 1383 replica.cpp:676] Persisted action at 0 I0527 23:23:48.322522 1386 replica.cpp:655] Replica received learned notice for position 0 I0527 23:23:48.322523 1382 replica.cpp:655] Replica received learned notice for position 0 I0527 23:23:48.322638 1386 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 86907ns I0527 23:23:48.322661 1386 replica.cpp:676] Persisted action at 0 I0527 
23:23:48.322670 1386 replica.cpp:661] Replica learned NOP action at position 0 I0527 23:23:48.322682 1382 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 85031ns I0527 23:23:48.322693 1382 replica.cpp:676] Persisted action at 0 I0527 23:23:48.322700 1382 replica.cpp:661] Replica learned NOP action at position 0 I0527 23:23:48.322790 1380 log.cpp:672] Writer started with ending position 0 I0527 23:23:48.322898 1380 log.cpp:680] Attempting to append 11 bytes to the log I0527 23:23:48.322978 1383 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1 I0527 23:23:48.323122 1380 replica.cpp:508] Replica received write request for position 1 I0527 23:23:48.323158 1381 replica.cpp:508] Replica received write request for position 1 I0527 23:23:48.323202 1380 leveldb.cpp:343] Persisting action (27 bytes) to leveldb took 66527ns I0527 23:23:48.323215 1380 replica.cpp:676] Persisted action at 1 I0527 23:23:48.323238 1381 leveldb.cpp:343] Persisting action (27 bytes) to leveldb took 67074ns I0527 23:23:48.323252 1381 replica.cpp:676] Persisted action at 1 I0527 23:23:48.323354 1380 replica.cpp:655] Replica received learned notice for position 1 I0527 23:23:48.323362 1382 replica.cpp:655] Replica received learned notice for position 1 I0527 23:23:48.323443 1380 leveldb.cpp:343] Persisting action (29 bytes) to leveldb took 77398ns I0527 23:23:48.323461 1380 replica.cpp:676] Persisted action at 1 I0527 23:23:48.323463 1382 leveldb.cpp:343] Persisting action (29 bytes) to leveldb took 90567ns I0527 23:23:48.323467 1380 replica.cpp:661] Replica learned APPEND action at position 1 I0527 23:23:48.323477 1382 replica.cpp:676] Persisted action at 1 I0527 23:23:48.323484 1382 replica.cpp:661] Replica learned APPEND action at position 1 I0527 23:23:48.323729 1380 leveldb.cpp:438] Reading position from leveldb took 7224ns 2014-05-27 23:23:48,324:1352(0x2b1173c2a700):ZOO_INFO@zookeeper_close@2505: Closing zookeeper sessionId=0x1463fff34bd0003 to 
[127.0.0.1:39446] 2014-05-27 23:23:48,324:1352(0x2b117301ff80):ZOO_INFO@zookeeper_close@2505: Closing zookeeper sessionId=0x1463fff34bd0002 to [127.0.0.1:39446] I0527 23:23:48.326591 1386 network.hpp:423] ZooKeeper group memberships changed I0527 23:23:48.326690 1382 group.cpp:655] Trying to get '/log/0000000000' in ZooKeeper I0527 23:23:48.327450 1384 network.hpp:461] ZooKeeper group PIDs: { log-replica(22)@67.195.138.8:35151 } 2014-05-27 23:23:48,446:1352(0x2b12bc401700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51020] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-05-27 23:23:51,782:1352(0x2b12bc401700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51020] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-05-27 23:23:55,118:1352(0x2b12bc401700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51020] zk retcode=-4, errno=111(Connection refused): server refused to accept the client I0527 23:23:57.002908 1381 network.hpp:423] ZooKeeper group memberships changed I0527 23:23:57.003042 1381 network.hpp:461] ZooKeeper group PIDs: { } 2014-05-27 23:23:58,455:1352(0x2b12bc401700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51020] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-05-27 23:24:01,791:1352(0x2b12bc401700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51020] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-05-27 23:24:05,127:1352(0x2b12bc401700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51020] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-05-27 23:24:08,464:1352(0x2b12bc401700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51020] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-05-27 
23:24:11,800:1352(0x2b12bc401700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51020] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-05-27 23:24:15,136:1352(0x2b12bc401700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51020] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-05-27 23:24:18,473:1352(0x2b12bc401700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51020] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-05-27 23:24:21,809:1352(0x2b12bc401700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51020] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-05-27 23:24:25,146:1352(0x2b12bc401700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51020] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-05-27 23:24:28,482:1352(0x2b12bc401700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51020] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-05-27 23:24:31,818:1352(0x2b12bc401700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51020] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-05-27 23:24:35,155:1352(0x2b12bc401700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51020] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-05-27 23:24:38,491:1352(0x2b12bc401700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51020] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-05-27 23:24:41,827:1352(0x2b12bc401700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51020] zk retc...",1 MESOS-1443,"Create a protobuf for framework rate limit configuration and load it as JSON through master flags",NULL,2 MESOS-1444,"Integrate rate limiter into 
the master",NULL,5 MESOS-1445,"Add new tests for framework rate limiting",NULL,3 MESOS-1459,"Build failure: Ubuntu 13.10/clang due to missing virtual destructor","In file included from launcher/main.cpp:19: In file included from ./launcher/launcher.hpp:24: In file included from ../3rdparty/libprocess/include/process/future.hpp:23: ../3rdparty/libprocess/include/process/owned.hpp:188:5: error: delete called on 'mesos::internal::launcher::Operation' that is abstract but has non-virtual destructor [-Werror,-Wdelete-non-virtual-dtor] delete t; ^ /usr/bin/../lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/bits/shared_ptr_base.h:456:8: note: in instantiation of member function 'process::Owned::Data::~Data' requested here delete __p; ^ /usr/bin/../lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/bits/shared_ptr_base.h:768:24: note: in instantiation of function template specialization 'std::__shared_count<2>::__shared_count::Data *>' requested here : _M_ptr(__p), _M_refcount(__p) ^ /usr/bin/../lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/bits/shared_ptr_base.h:919:4: note: in instantiation of function template specialization 'std::__shared_ptr::Data, 2>::__shared_ptr::Data>' requested here __shared_ptr(__p).swap(*this); ^ ../3rdparty/libprocess/include/process/owned.hpp:68:10: note: in instantiation of function template specialization 'std::__shared_ptr::Data, 2>::reset::Data>' requested here data.reset(new Data(t)); ^ ./launcher/launcher.hpp:101:7: note: in instantiation of member function 'process::Owned::Owned' requested here add(process::Owned(new T())); ^ launcher/main.cpp:26:3: note: in instantiation of function template specialization 'mesos::internal::launcher::add' requested here launcher::add(); ^ 1 error generated.",1 MESOS-1466,"Race between executor exited event and launch task can cause overcommit of resources","The following sequence of events can cause an overcommit --> Launch task is called for a task whose executor is already 
running --> Executor's resources are not accounted for on the master --> Executor exits and the event is enqueued behind launch tasks on the master --> Master sends the task to the slave, which needs to commit resources for the task and the (new) executor. --> Master processes the executor exited event and re-offers the executor's resources causing an overcommit of resources.",8 MESOS-1469,"No output from review bot on timeout","When the mesos review build times out, likely due to a long-running failing test, we have no output to debug. We should find a way to stream the output from the build instead of waiting for the build to finish.",1 MESOS-1471,"Document replicated log design/internals","The replicated log could benefit from some documentation. In particular, how does it work? What do operators need to know? Possibly there is some overlap with our future maintenance documentation in MESOS-1470. I believe [~jieyu] has some unpublished work that could be leveraged here!",5 MESOS-1472,"Improve child exit if slave dies during executor launch in MC","When restarting many slaves there's a reasonable chance that a slave will be restarted between the fork and exec stages of launching an executor in the MesosContainerizer. The forked child correctly detects this; however, rather than abort, it should safely log and then exit non-zero cleanly.",1 MESOS-1518,"Update Rate Limiting Design doc to reflect the latest changes","- Usage - Design - Implementation Notes",2 MESOS-1527,"Choose containerizer at runtime","Currently you have to choose the containerizer at mesos-slave start time via the --isolation option. I'd like to be able to specify the containerizer in the request to launch the job. 
This could be specified by a new ""Provider"" field in the ContainerInfo proto buf.",3 MESOS-1529,"Handle a network partition between Master and Slave","If a network partition occurs between a Master and Slave, the Master will remove the Slave (as it fails health check) and mark the tasks being run there as LOST. However, the Slave is not aware that it has been removed so the tasks will continue to run. (To clarify a little bit: neither the master nor the slave receives 'exited' event, indicating that the connection between the master and slave is not closed). There are at least two possible approaches to solving this issue: 1. Introduce a health check from Slave to Master so they have a consistent view of a network partition. We may still see this issue should a one-way connection error occur. 2. Be less aggressive about marking tasks and Slaves as lost. Wait until the Slave reappears and reconcile then. We'd still need to mark Slaves and tasks as potentially lost (zombie state) but maybe the Scheduler can make a more intelligent decision.",5 MESOS-1545,"SlaveRecoveryTest/0.MultipleFrameworks is flaky","{code} [ RUN ] SlaveRecoveryTest/0.MultipleFrameworks Using temporary directory '/tmp/SlaveRecoveryTest_0_MultipleFrameworks_6dJqxr' I0626 00:04:39.557339 5450 leveldb.cpp:176] Opened db in 179.857593ms I0626 00:04:39.565433 5450 leveldb.cpp:183] Compacted db in 8.071041ms I0626 00:04:39.565457 5450 leveldb.cpp:198] Created db iterator in 4065ns I0626 00:04:39.565466 5450 leveldb.cpp:204] Seeked to beginning of db in 596ns I0626 00:04:39.565474 5450 leveldb.cpp:273] Iterated through 0 keys in the db in 396ns I0626 00:04:39.565490 5450 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0626 00:04:39.565827 5476 recover.cpp:425] Starting replica recovery I0626 00:04:39.566033 5474 recover.cpp:451] Replica is in EMPTY status I0626 00:04:39.566504 5474 replica.cpp:638] Replica in EMPTY status received a broadcasted recover 
request I0626 00:04:39.566686 5477 recover.cpp:188] Received a recover response from a replica in EMPTY status I0626 00:04:39.566905 5472 recover.cpp:542] Updating replica status to STARTING I0626 00:04:39.568307 5471 master.cpp:288] Master 20140626-000439-1032504131-55423-5450 (juno.apache.org) started on 67.195.138.61:55423 I0626 00:04:39.568332 5471 master.cpp:325] Master only allowing authenticated frameworks to register I0626 00:04:39.568339 5471 master.cpp:330] Master only allowing authenticated slaves to register I0626 00:04:39.568348 5471 credentials.hpp:35] Loading credentials for authentication from '/tmp/SlaveRecoveryTest_0_MultipleFrameworks_6dJqxr/credentials' I0626 00:04:39.568461 5471 master.cpp:356] Authorization enabled I0626 00:04:39.568739 5478 master.cpp:122] No whitelist given. Advertising offers for all slaves I0626 00:04:39.568814 5475 hierarchical_allocator_process.hpp:301] Initializing hierarchical allocator process with master : master@67.195.138.61:55423 I0626 00:04:39.569206 5478 master.cpp:1122] The newly elected leader is master@67.195.138.61:55423 with id 20140626-000439-1032504131-55423-5450 I0626 00:04:39.569223 5478 master.cpp:1135] Elected as the leading master! 
I0626 00:04:39.569231 5478 master.cpp:953] Recovering from registrar I0626 00:04:39.569286 5475 registrar.cpp:313] Recovering registrar I0626 00:04:39.600639 5477 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 33.682136ms I0626 00:04:39.600661 5477 replica.cpp:320] Persisted replica status to STARTING I0626 00:04:39.600790 5476 recover.cpp:451] Replica is in STARTING status I0626 00:04:39.601184 5474 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I0626 00:04:39.601274 5477 recover.cpp:188] Received a recover response from a replica in STARTING status I0626 00:04:39.601465 5471 recover.cpp:542] Updating replica status to VOTING I0626 00:04:39.610605 5471 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 9.076262ms I0626 00:04:39.610638 5471 replica.cpp:320] Persisted replica status to VOTING I0626 00:04:39.610683 5471 recover.cpp:556] Successfully joined the Paxos group I0626 00:04:39.610780 5471 recover.cpp:440] Recover process terminated I0626 00:04:39.610946 5474 log.cpp:656] Attempting to start the writer I0626 00:04:39.611486 5475 replica.cpp:474] Replica received implicit promise request with proposal 1 I0626 00:04:39.618924 5475 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 7.418789ms I0626 00:04:39.618942 5475 replica.cpp:342] Persisted promised to 1 I0626 00:04:39.619220 5476 coordinator.cpp:230] Coordinator attemping to fill missing position I0626 00:04:39.619763 5476 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 I0626 00:04:39.627267 5476 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 7.485492ms I0626 00:04:39.627295 5476 replica.cpp:676] Persisted action at 0 I0626 00:04:39.627822 5473 replica.cpp:508] Replica received write request for position 0 I0626 00:04:39.627861 5473 leveldb.cpp:438] Reading position from leveldb took 17132ns I0626 00:04:39.635592 5473 leveldb.cpp:343] Persisting action (14 bytes) to leveldb 
took 7.714322ms I0626 00:04:39.635612 5473 replica.cpp:676] Persisted action at 0 I0626 00:04:39.635797 5473 replica.cpp:655] Replica received learned notice for position 0 I0626 00:04:39.643941 5473 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 8.129347ms I0626 00:04:39.643960 5473 replica.cpp:676] Persisted action at 0 I0626 00:04:39.643970 5473 replica.cpp:661] Replica learned NOP action at position 0 I0626 00:04:39.644207 5473 log.cpp:672] Writer started with ending position 0 I0626 00:04:39.644625 5471 leveldb.cpp:438] Reading position from leveldb took 9128ns I0626 00:04:39.646010 5476 registrar.cpp:346] Successfully fetched the registry (0B) I0626 00:04:39.646044 5476 registrar.cpp:422] Attempting to update the 'registry' I0626 00:04:39.647274 5471 log.cpp:680] Attempting to append 136 bytes to the log I0626 00:04:39.647337 5471 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1 I0626 00:04:39.647687 5476 replica.cpp:508] Replica received write request for position 1 I0626 00:04:39.655206 5476 leveldb.cpp:343] Persisting action (155 bytes) to leveldb took 7.499736ms I0626 00:04:39.655225 5476 replica.cpp:676] Persisted action at 1 I0626 00:04:39.655467 5476 replica.cpp:655] Replica received learned notice for position 1 I0626 00:04:39.663534 5476 leveldb.cpp:343] Persisting action (157 bytes) to leveldb took 8.054929ms I0626 00:04:39.663554 5476 replica.cpp:676] Persisted action at 1 I0626 00:04:39.663563 5476 replica.cpp:661] Replica learned APPEND action at position 1 I0626 00:04:39.663890 5478 registrar.cpp:479] Successfully updated 'registry' I0626 00:04:39.663947 5478 registrar.cpp:372] Successfully recovered registrar I0626 00:04:39.663969 5476 log.cpp:699] Attempting to truncate the log to 1 I0626 00:04:39.664044 5478 master.cpp:980] Recovered 0 slaves from the Registry (98B) ; allowing 10mins for slaves to re-register I0626 00:04:39.664057 5476 coordinator.cpp:340] Coordinator attempting to write TRUNCATE 
action at position 2 I0626 00:04:39.664341 5476 replica.cpp:508] Replica received write request for position 2 I0626 00:04:39.664681 5450 containerizer.cpp:124] Using isolation: posix/cpu,posix/mem I0626 00:04:39.666721 5471 slave.cpp:168] Slave started on 173)@67.195.138.61:55423 I0626 00:04:39.666741 5471 credentials.hpp:35] Loading credentials for authentication from '/tmp/SlaveRecoveryTest_0_MultipleFrameworks_G6ObtK/credential' I0626 00:04:39.666806 5471 slave.cpp:268] Slave using credential for: test-principal I0626 00:04:39.666936 5471 slave.cpp:281] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0626 00:04:39.667000 5471 slave.cpp:326] Slave hostname: juno.apache.org I0626 00:04:39.667009 5471 slave.cpp:327] Slave checkpoint: true I0626 00:04:39.667572 5478 state.cpp:33] Recovering state from '/tmp/SlaveRecoveryTest_0_MultipleFrameworks_G6ObtK/meta' I0626 00:04:39.667703 5475 status_update_manager.cpp:193] Recovering status update manager I0626 00:04:39.667840 5475 containerizer.cpp:287] Recovering containerizer I0626 00:04:39.668478 5471 slave.cpp:3128] Finished recovery I0626 00:04:39.668712 5471 slave.cpp:601] New master detected at master@67.195.138.61:55423 I0626 00:04:39.668738 5471 slave.cpp:677] Authenticating with master master@67.195.138.61:55423 I0626 00:04:39.668802 5471 slave.cpp:650] Detecting new master I0626 00:04:39.668861 5471 status_update_manager.cpp:167] New master detected at master@67.195.138.61:55423 I0626 00:04:39.668916 5471 authenticatee.hpp:128] Creating new client SASL connection I0626 00:04:39.669087 5471 master.cpp:3499] Authenticating slave(173)@67.195.138.61:55423 I0626 00:04:39.669203 5471 authenticator.hpp:156] Creating new server SASL connection I0626 00:04:39.669340 5471 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5 I0626 00:04:39.669359 5471 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5' I0626 00:04:39.669386 5471 
authenticator.hpp:262] Received SASL authentication start I0626 00:04:39.669414 5471 authenticator.hpp:384] Authentication requires more steps I0626 00:04:39.669457 5471 authenticatee.hpp:265] Received SASL authentication step I0626 00:04:39.669514 5471 authenticator.hpp:290] Received SASL authentication step I0626 00:04:39.669534 5471 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'juno.apache.org' server FQDN: 'juno.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0626 00:04:39.669543 5471 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I0626 00:04:39.669567 5471 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0626 00:04:39.669580 5471 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'juno.apache.org' server FQDN: 'juno.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0626 00:04:39.669589 5471 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0626 00:04:39.669594 5471 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0626 00:04:39.669606 5471 authenticator.hpp:376] Authentication success I0626 00:04:39.669641 5471 authenticatee.hpp:305] Authentication success I0626 00:04:39.669669 5471 master.cpp:3539] Successfully authenticated principal 'test-principal' at slave(173)@67.195.138.61:55423 I0626 00:04:39.669761 5450 sched.cpp:139] Version: 0.20.0 I0626 00:04:39.669764 5478 slave.cpp:734] Successfully authenticated with master master@67.195.138.61:55423 I0626 00:04:39.669826 5478 slave.cpp:972] Will retry registration in 3.190666ms if necessary I0626 00:04:39.669950 5471 master.cpp:2781] Registering slave at slave(173)@67.195.138.61:55423 (juno.apache.org) with id 20140626-000439-1032504131-55423-5450-0 I0626 00:04:39.669960 5475 sched.cpp:235] 
New master detected at master@67.195.138.61:55423 I0626 00:04:39.669977 5475 sched.cpp:285] Authenticating with master master@67.195.138.61:55423 I0626 00:04:39.670073 5471 registrar.cpp:422] Attempting to update the 'registry' I0626 00:04:39.670114 5475 authenticatee.hpp:128] Creating new client SASL connection I0626 00:04:39.670263 5475 master.cpp:3499] Authenticating scheduler-e66c50d2-2790-4d20-bc77-a57af0e1780b@67.195.138.61:55423 I0626 00:04:39.670361 5474 authenticator.hpp:156] Creating new server SASL connection I0626 00:04:39.670506 5475 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5 I0626 00:04:39.670526 5475 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5' I0626 00:04:39.670559 5475 authenticator.hpp:262] Received SASL authentication start I0626 00:04:39.670590 5475 authenticator.hpp:384] Authentication requires more steps I0626 00:04:39.670619 5475 authenticatee.hpp:265] Received SASL authentication step I0626 00:04:39.670650 5475 authenticator.hpp:290] Received SASL authentication step I0626 00:04:39.670670 5475 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'juno.apache.org' server FQDN: 'juno.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0626 00:04:39.670677 5475 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I0626 00:04:39.670687 5475 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0626 00:04:39.670697 5475 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'juno.apache.org' server FQDN: 'juno.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0626 00:04:39.670706 5475 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0626 00:04:39.670712 5475 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since 
SASL_AUXPROP_AUTHZID == true I0626 00:04:39.670723 5475 authenticator.hpp:376] Authentication success I0626 00:04:39.670749 5475 authenticatee.hpp:305] Authentication success I0626 00:04:39.670773 5475 master.cpp:3539] Successfully authenticated principal 'test-principal' at scheduler-e66c50d2-2790-4d20-bc77-a57af0e1780b@67.195.138.61:55423 I0626 00:04:39.670845 5475 sched.cpp:359] Successfully authenticated with master master@67.195.138.61:55423 I0626 00:04:39.670858 5475 sched.cpp:478] Sending registration request to master@67.195.138.61:55423 I0626 00:04:39.670899 5475 master.cpp:1241] Received registration request from scheduler-e66c50d2-2790-4d20-bc77-a57af0e1780b@67.195.138.61:55423 I0626 00:04:39.670922 5475 master.cpp:1201] Authorizing framework principal 'test-principal' to receive offers for role '*' I0626 00:04:39.671052 5475 master.cpp:1300] Registering framework 20140626-000439-1032504131-55423-5450-0000 at scheduler-e66c50d2-2790-4d20-bc77-a57af0e1780b@67.195.138.61:55423 I0626 00:04:39.671159 5474 sched.cpp:409] Framework registered with 20140626-000439-1032504131-55423-5450-0000 I0626 00:04:39.671185 5474 sched.cpp:423] Scheduler::registered took 10223ns I0626 00:04:39.671226 5474 hierarchical_allocator_process.hpp:331] Added framework 20140626-000439-1032504131-55423-5450-0000 I0626 00:04:39.671241 5474 hierarchical_allocator_process.hpp:724] No resources available to allocate! 
I0626 00:04:39.671247 5474 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 8574ns I0626 00:04:39.671879 5476 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 7.48781ms I0626 00:04:39.671900 5476 replica.cpp:676] Persisted action at 2 I0626 00:04:39.672164 5471 replica.cpp:655] Replica received learned notice for position 2 I0626 00:04:39.674092 5472 slave.cpp:972] Will retry registration in 25.467893ms if necessary I0626 00:04:39.674108 5476 master.cpp:2769] Ignoring register slave message from slave(173)@67.195.138.61:55423 (juno.apache.org) as admission is already in progress I0626 00:04:39.680193 5471 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 8.01285ms I0626 00:04:39.680223 5471 leveldb.cpp:401] Deleting ~1 keys from leveldb took 11393ns I0626 00:04:39.680234 5471 replica.cpp:676] Persisted action at 2 I0626 00:04:39.680245 5471 replica.cpp:661] Replica learned TRUNCATE action at position 2 I0626 00:04:39.680585 5472 log.cpp:680] Attempting to append 326 bytes to the log I0626 00:04:39.680670 5477 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 3 I0626 00:04:39.680953 5474 replica.cpp:508] Replica received write request for position 3 I0626 00:04:39.688521 5474 leveldb.cpp:343] Persisting action (345 bytes) to leveldb took 7.548316ms I0626 00:04:39.688542 5474 replica.cpp:676] Persisted action at 3 I0626 00:04:39.688750 5474 replica.cpp:655] Replica received learned notice for position 3 I0626 00:04:39.696851 5474 leveldb.cpp:343] Persisting action (347 bytes) to leveldb took 8.088289ms I0626 00:04:39.696869 5474 replica.cpp:676] Persisted action at 3 I0626 00:04:39.696878 5474 replica.cpp:661] Replica learned APPEND action at position 3 I0626 00:04:39.697268 5474 registrar.cpp:479] Successfully updated 'registry' I0626 00:04:39.697350 5474 log.cpp:699] Attempting to truncate the log to 3 I0626 00:04:39.697412 5474 master.cpp:2821] Registered slave 
20140626-000439-1032504131-55423-5450-0 at slave(173)@67.195.138.61:55423 (juno.apache.org) I0626 00:04:39.697423 5474 master.cpp:3967] Adding slave 20140626-000439-1032504131-55423-5450-0 at slave(173)@67.195.138.61:55423 (juno.apache.org) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0626 00:04:39.697535 5474 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 4 I0626 00:04:39.697618 5474 slave.cpp:768] Registered with master master@67.195.138.61:55423; given slave ID 20140626-000439-1032504131-55423-5450-0 I0626 00:04:39.697754 5474 slave.cpp:781] Checkpointing SlaveInfo to '/tmp/SlaveRecoveryTest_0_MultipleFrameworks_G6ObtK/meta/slaves/20140626-000439-1032504131-55423-5450-0/slave.info' I0626 00:04:39.697762 5471 hierarchical_allocator_process.hpp:444] Added slave 20140626-000439-1032504131-55423-5450-0 (juno.apache.org) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] available) I0626 00:04:39.697845 5471 hierarchical_allocator_process.hpp:750] Offering cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140626-000439-1032504131-55423-5450-0 to framework 20140626-000439-1032504131-55423-5450-0000 I0626 00:04:39.697854 5474 slave.cpp:2325] Received ping from slave-observer(142)@67.195.138.61:55423 I0626 00:04:39.698040 5471 hierarchical_allocator_process.hpp:706] Performed allocation for slave 20140626-000439-1032504131-55423-5450-0 in 231333ns I0626 00:04:39.698051 5474 replica.cpp:508] Replica received write request for position 4 I0626 00:04:39.698118 5471 master.hpp:794] Adding offer 20140626-000439-1032504131-55423-5450-0 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140626-000439-1032504131-55423-5450-0 (juno.apache.org) I0626 00:04:39.698170 5471 master.cpp:3446] Sending 1 offers to framework 20140626-000439-1032504131-55423-5450-0000 I0626 00:04:39.698318 5471 
sched.cpp:546] Scheduler::resourceOffers took 24371ns I0626 00:04:39.699718 5477 master.hpp:804] Removing offer 20140626-000439-1032504131-55423-5450-0 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140626-000439-1032504131-55423-5450-0 (juno.apache.org) I0626 00:04:39.699787 5477 master.cpp:2125] Processing reply for offers: [ 20140626-000439-1032504131-55423-5450-0 ] on slave 20140626-000439-1032504131-55423-5450-0 at slave(173)@67.195.138.61:55423 (juno.apache.org) for framework 20140626-000439-1032504131-55423-5450-0000 I0626 00:04:39.699812 5477 master.cpp:2211] Authorizing framework principal 'test-principal' to launch task 897522cc-4ec5-4904-aed0-00b6b8c41028 as user 'jenkins' I0626 00:04:39.700160 5477 master.hpp:766] Adding task 897522cc-4ec5-4904-aed0-00b6b8c41028 with resources cpus(*):1; mem(*):512 on slave 20140626-000439-1032504131-55423-5450-0 (juno.apache.org) I0626 00:04:39.700188 5477 master.cpp:2277] Launching task 897522cc-4ec5-4904-aed0-00b6b8c41028 of framework 20140626-000439-1032504131-55423-5450-0000 with resources cpus(*):1; mem(*):512 on slave 20140626-000439-1032504131-55423-5450-0 at slave(173)@67.195.138.61:55423 (juno.apache.org) I0626 00:04:39.700392 5471 slave.cpp:1003] Got assigned task 897522cc-4ec5-4904-aed0-00b6b8c41028 for framework 20140626-000439-1032504131-55423-5450-0000 I0626 00:04:39.700479 5477 hierarchical_allocator_process.hpp:546] Framework 20140626-000439-1032504131-55423-5450-0000 left cpus(*):1; mem(*):512; disk(*):1024; ports(*):[31000-32000] unused on slave 20140626-000439-1032504131-55423-5450-0 I0626 00:04:39.700505 5471 slave.cpp:3400] Checkpointing FrameworkInfo to '/tmp/SlaveRecoveryTest_0_MultipleFrameworks_G6ObtK/meta/slaves/20140626-000439-1032504131-55423-5450-0/frameworks/20140626-000439-1032504131-55423-5450-0000/framework.info' I0626 00:04:39.700597 5477 hierarchical_allocator_process.hpp:588] Framework 20140626-000439-1032504131-55423-5450-0000 filtered slave 
20140626-000439-1032504131-55423-5450-0 for 5secs I0626 00:04:39.700686 5471 slave.cpp:3407] Checkpointing framework pid 'scheduler-e66c50d2-2790-4d20-bc77-a57af0e1780b@67.195.138.61:55423' to '...",1 MESOS-1559,"Allow jenkins build machine to dump stack traces of all threads when timeout","Many times, when the jenkins build times out, we know that some test freezes at some place. However, most of the time, it's very hard to reproduce the deadlock on dev machines. It would be cool if we could dump the stack traces of all threads when the jenkins build times out. Some command like the following: {noformat} echo thread apply all bt > tmp; gdb attach `pgrep lt-mesos-tests` < tmp {noformat}",5 MESOS-1567,"Add logging of the user uid when receiving SIGTERM.","We currently do not log the user id when receiving a SIGTERM, which makes debugging a bit difficult. It's easy to get this information through sigaction.",1 MESOS-1571,"Signal escalation timeout is not configurable","Even though the executor shutdown grace period is set to a larger interval, the signal escalation timeout will still be 3 seconds. It should either be configurable or dependent on EXECUTOR_SHUTDOWN_GRACE_PERIOD. Thoughts?",2 MESOS-1578,"Improve framework rate limiting by imposing the max number of outstanding messages per framework principal","* Rate limits config takes a configurable *capacity* for each principal. * To ensure that Master maintains the message order of a framework, it's important that Master sends a FrameworkErrorMessage back to the scheduler to ask it to abort.",5 MESOS-1586,"Isolate system directories, e.g., per-container /tmp","Ideally, tasks should not write outside their sandbox (executor work directory) but pragmatically they may need to write to /tmp, /var/tmp, or some other directory. 1) We should include any such files in disk usage and quota. 2) We should make these ""shared"" directories private, i.e., each container has its own.
3) We should make the lifetime of any such files the same as the executor work directory.",3 MESOS-1587,"Report disk usage from MesosContainerizer","We should report disk usage for the executor work directory from MesosContainerizer and include it in the ResourceStatistics protobuf.",5 MESOS-1590,"Allow LoadGeneratorFramework to read password from a file","It currently just reads the flag as the value of the password.",1 MESOS-1592,"Design inverse resource offer support","An ""inverse"" resource offer means that Mesos is requesting resources back from the framework, possibly within some time interval. This can be leveraged initially to provide more automated cluster maintenance, by offering schedulers the opportunity to move tasks to compensate for planned maintenance. Operators can set a time limit on how long to wait for schedulers to relocate tasks before the tasks are forcibly terminated. Inverse resource offers have many other potential uses, as they open the opportunity for the allocator to attempt to move tasks in the cluster through the co-operation of the framework, possibly providing better over-subscription, fairness, etc.",5 MESOS-1594,"SlaveRecoveryTest/0.ReconcileKillTask is flaky","Observed this on Jenkins.
{code} [ RUN ] SlaveRecoveryTest/0.ReconcileKillTask Using temporary directory '/tmp/SlaveRecoveryTest_0_ReconcileKillTask_3zJ6DG' I0714 15:08:43.915114 27216 leveldb.cpp:176] Opened db in 474.695188ms I0714 15:08:43.933645 27216 leveldb.cpp:183] Compacted db in 18.068942ms I0714 15:08:43.934129 27216 leveldb.cpp:198] Created db iterator in 7860ns I0714 15:08:43.934439 27216 leveldb.cpp:204] Seeked to beginning of db in 2560ns I0714 15:08:43.934779 27216 leveldb.cpp:273] Iterated through 0 keys in the db in 1400ns I0714 15:08:43.935098 27216 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0714 15:08:43.936027 27238 recover.cpp:425] Starting replica recovery I0714 15:08:43.936225 27238 recover.cpp:451] Replica is in EMPTY status I0714 15:08:43.936867 27238 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I0714 15:08:43.937049 27238 recover.cpp:188] Received a recover response from a replica in EMPTY status I0714 15:08:43.937232 27238 recover.cpp:542] Updating replica status to STARTING I0714 15:08:43.945600 27235 master.cpp:288] Master 20140714-150843-16842879-55850-27216 (quantal) started on 127.0.1.1:55850 I0714 15:08:43.945643 27235 master.cpp:325] Master only allowing authenticated frameworks to register I0714 15:08:43.945651 27235 master.cpp:330] Master only allowing authenticated slaves to register I0714 15:08:43.945658 27235 credentials.hpp:36] Loading credentials for authentication from '/tmp/SlaveRecoveryTest_0_ReconcileKillTask_3zJ6DG/credentials' I0714 15:08:43.945808 27235 master.cpp:359] Authorization enabled I0714 15:08:43.946369 27235 hierarchical_allocator_process.hpp:301] Initializing hierarchical allocator process with master : master@127.0.1.1:55850 I0714 15:08:43.946419 27235 master.cpp:122] No whitelist given. 
Advertising offers for all slaves I0714 15:08:43.946614 27235 master.cpp:1128] The newly elected leader is master@127.0.1.1:55850 with id 20140714-150843-16842879-55850-27216 I0714 15:08:43.946630 27235 master.cpp:1141] Elected as the leading master! I0714 15:08:43.946637 27235 master.cpp:959] Recovering from registrar I0714 15:08:43.946707 27235 registrar.cpp:313] Recovering registrar I0714 15:08:43.957895 27238 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 20.529301ms I0714 15:08:43.957978 27238 replica.cpp:320] Persisted replica status to STARTING I0714 15:08:43.958142 27238 recover.cpp:451] Replica is in STARTING status I0714 15:08:43.958664 27238 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I0714 15:08:43.958762 27238 recover.cpp:188] Received a recover response from a replica in STARTING status I0714 15:08:43.958945 27238 recover.cpp:542] Updating replica status to VOTING I0714 15:08:43.975685 27238 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 16.646136ms I0714 15:08:43.976367 27238 replica.cpp:320] Persisted replica status to VOTING I0714 15:08:43.976824 27241 recover.cpp:556] Successfully joined the Paxos group I0714 15:08:43.977072 27242 recover.cpp:440] Recover process terminated I0714 15:08:43.980590 27236 log.cpp:656] Attempting to start the writer I0714 15:08:43.981385 27236 replica.cpp:474] Replica received implicit promise request with proposal 1 I0714 15:08:43.999141 27236 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 17.705787ms I0714 15:08:43.999222 27236 replica.cpp:342] Persisted promised to 1 I0714 15:08:44.004451 27240 coordinator.cpp:230] Coordinator attemping to fill missing position I0714 15:08:44.004914 27240 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 I0714 15:08:44.021456 27240 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 16.499775ms I0714 15:08:44.021533 27240 replica.cpp:676] Persisted 
action at 0 I0714 15:08:44.022006 27240 replica.cpp:508] Replica received write request for position 0 I0714 15:08:44.022043 27240 leveldb.cpp:438] Reading position from leveldb took 21376ns I0714 15:08:44.035969 27240 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 13.885907ms I0714 15:08:44.036365 27240 replica.cpp:676] Persisted action at 0 I0714 15:08:44.040156 27238 replica.cpp:655] Replica received learned notice for position 0 I0714 15:08:44.058082 27238 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 17.860707ms I0714 15:08:44.058161 27238 replica.cpp:676] Persisted action at 0 I0714 15:08:44.058176 27238 replica.cpp:661] Replica learned NOP action at position 0 I0714 15:08:44.058526 27238 log.cpp:672] Writer started with ending position 0 I0714 15:08:44.058872 27238 leveldb.cpp:438] Reading position from leveldb took 25660ns I0714 15:08:44.060556 27238 registrar.cpp:346] Successfully fetched the registry (0B) I0714 15:08:44.060845 27238 registrar.cpp:422] Attempting to update the 'registry' I0714 15:08:44.062304 27238 log.cpp:680] Attempting to append 120 bytes to the log I0714 15:08:44.062866 27236 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1 I0714 15:08:44.063154 27236 replica.cpp:508] Replica received write request for position 1 I0714 15:08:44.082813 27236 leveldb.cpp:343] Persisting action (137 bytes) to leveldb took 19.61683ms I0714 15:08:44.082890 27236 replica.cpp:676] Persisted action at 1 I0714 15:08:44.083256 27236 replica.cpp:655] Replica received learned notice for position 1 I0714 15:08:44.097398 27236 leveldb.cpp:343] Persisting action (139 bytes) to leveldb took 14.104796ms I0714 15:08:44.097475 27236 replica.cpp:676] Persisted action at 1 I0714 15:08:44.097488 27236 replica.cpp:661] Replica learned APPEND action at position 1 I0714 15:08:44.098569 27236 registrar.cpp:479] Successfully updated 'registry' I0714 15:08:44.098906 27240 log.cpp:699] Attempting to truncate the log to 
1 I0714 15:08:44.099608 27240 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2 I0714 15:08:44.100005 27240 replica.cpp:508] Replica received write request for position 2 I0714 15:08:44.100566 27236 registrar.cpp:372] Successfully recovered registrar I0714 15:08:44.101227 27239 master.cpp:986] Recovered 0 slaves from the Registry (84B) ; allowing 10mins for slaves to re-register I0714 15:08:44.118376 27240 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 18.329495ms I0714 15:08:44.118455 27240 replica.cpp:676] Persisted action at 2 I0714 15:08:44.122258 27242 replica.cpp:655] Replica received learned notice for position 2 I0714 15:08:44.137336 27242 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 15.023553ms I0714 15:08:44.137460 27242 leveldb.cpp:401] Deleting ~1 keys from leveldb took 55049ns I0714 15:08:44.137480 27242 replica.cpp:676] Persisted action at 2 I0714 15:08:44.137492 27242 replica.cpp:661] Replica learned TRUNCATE action at position 2 I0714 15:08:44.143729 27216 containerizer.cpp:124] Using isolation: posix/cpu,posix/mem I0714 15:08:44.145934 27242 slave.cpp:168] Slave started on 43)@127.0.1.1:55850 I0714 15:08:44.145953 27242 credentials.hpp:84] Loading credential for authentication from '/tmp/SlaveRecoveryTest_0_ReconcileKillTask_Zl9DUt/credential' I0714 15:08:44.146040 27242 slave.cpp:266] Slave using credential for: test-principal I0714 15:08:44.146136 27242 slave.cpp:279] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0714 15:08:44.146198 27242 slave.cpp:324] Slave hostname: quantal I0714 15:08:44.146209 27242 slave.cpp:325] Slave checkpoint: true I0714 15:08:44.146708 27242 state.cpp:33] Recovering state from '/tmp/SlaveRecoveryTest_0_ReconcileKillTask_Zl9DUt/meta' I0714 15:08:44.146824 27242 status_update_manager.cpp:193] Recovering status update manager I0714 15:08:44.146901 27242 containerizer.cpp:287] Recovering containerizer I0714 15:08:44.147228 
27242 slave.cpp:3126] Finished recovery I0714 15:08:44.147531 27242 slave.cpp:599] New master detected at master@127.0.1.1:55850 I0714 15:08:44.147562 27242 slave.cpp:675] Authenticating with master master@127.0.1.1:55850 I0714 15:08:44.147614 27242 slave.cpp:648] Detecting new master I0714 15:08:44.147652 27242 status_update_manager.cpp:167] New master detected at master@127.0.1.1:55850 I0714 15:08:44.147691 27242 authenticatee.hpp:128] Creating new client SASL connection I0714 15:08:44.148533 27235 master.cpp:3507] Authenticating slave(43)@127.0.1.1:55850 I0714 15:08:44.148666 27235 authenticator.hpp:156] Creating new server SASL connection I0714 15:08:44.149054 27242 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5 I0714 15:08:44.149447 27242 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5' I0714 15:08:44.149917 27236 authenticator.hpp:262] Received SASL authentication start I0714 15:08:44.149974 27236 authenticator.hpp:384] Authentication requires more steps I0714 15:08:44.150208 27242 authenticatee.hpp:265] Received SASL authentication step I0714 15:08:44.150720 27239 authenticator.hpp:290] Received SASL authentication step I0714 15:08:44.150749 27239 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'quantal' server FQDN: 'quantal' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0714 15:08:44.150758 27239 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I0714 15:08:44.150771 27239 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0714 15:08:44.150781 27239 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'quantal' server FQDN: 'quantal' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0714 15:08:44.150787 27239 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0714 
15:08:44.150792 27239 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0714 15:08:44.150804 27239 authenticator.hpp:376] Authentication success I0714 15:08:44.150848 27239 master.cpp:3547] Successfully authenticated principal 'test-principal' at slave(43)@127.0.1.1:55850 I0714 15:08:44.157696 27242 authenticatee.hpp:305] Authentication success I0714 15:08:44.158855 27242 slave.cpp:732] Successfully authenticated with master master@127.0.1.1:55850 I0714 15:08:44.158936 27242 slave.cpp:970] Will retry registration in 10.352612ms if necessary I0714 15:08:44.161813 27216 sched.cpp:139] Version: 0.20.0 I0714 15:08:44.162608 27236 sched.cpp:235] New master detected at master@127.0.1.1:55850 I0714 15:08:44.162637 27236 sched.cpp:285] Authenticating with master master@127.0.1.1:55850 I0714 15:08:44.162747 27236 authenticatee.hpp:128] Creating new client SASL connection I0714 15:08:44.163506 27239 master.cpp:2789] Registering slave at slave(43)@127.0.1.1:55850 (quantal) with id 20140714-150843-16842879-55850-27216-0 I0714 15:08:44.164086 27238 registrar.cpp:422] Attempting to update the 'registry' I0714 15:08:44.165694 27238 log.cpp:680] Attempting to append 295 bytes to the log I0714 15:08:44.166231 27240 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 3 I0714 15:08:44.166517 27240 replica.cpp:508] Replica received write request for position 3 I0714 15:08:44.167199 27239 master.cpp:3507] Authenticating scheduler-225679c4-a9fd-4119-9deb-c7712eba37e1@127.0.1.1:55850 I0714 15:08:44.167867 27241 authenticator.hpp:156] Creating new server SASL connection I0714 15:08:44.168058 27241 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5 I0714 15:08:44.168081 27241 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5' I0714 15:08:44.168107 27241 authenticator.hpp:262] Received SASL authentication start I0714 15:08:44.168149 27241 authenticator.hpp:384] 
Authentication requires more steps I0714 15:08:44.168176 27241 authenticatee.hpp:265] Received SASL authentication step I0714 15:08:44.168215 27241 authenticator.hpp:290] Received SASL authentication step I0714 15:08:44.168233 27241 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'quantal' server FQDN: 'quantal' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0714 15:08:44.168793 27241 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I0714 15:08:44.168820 27241 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0714 15:08:44.168834 27241 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'quantal' server FQDN: 'quantal' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0714 15:08:44.168840 27241 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0714 15:08:44.168845 27241 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0714 15:08:44.168858 27241 authenticator.hpp:376] Authentication success I0714 15:08:44.168895 27241 authenticatee.hpp:305] Authentication success I0714 15:08:44.168970 27241 sched.cpp:359] Successfully authenticated with master master@127.0.1.1:55850 I0714 15:08:44.168987 27241 sched.cpp:478] Sending registration request to master@127.0.1.1:55850 I0714 15:08:44.169426 27239 master.cpp:1239] Queuing up registration request from scheduler-225679c4-a9fd-4119-9deb-c7712eba37e1@127.0.1.1:55850 because authentication is still in progress I0714 15:08:44.169958 27239 master.cpp:3547] Successfully authenticated principal 'test-principal' at scheduler-225679c4-a9fd-4119-9deb-c7712eba37e1@127.0.1.1:55850 I0714 15:08:44.170440 27241 slave.cpp:970] Will retry registration in 8.76707ms if necessary I0714 15:08:44.175359 27239 master.cpp:2777] Ignoring register slave 
message from slave(43)@127.0.1.1:55850 (quantal) as admission is already in progress I0714 15:08:44.175916 27239 master.cpp:1247] Received registration request from scheduler-225679c4-a9fd-4119-9deb-c7712eba37e1@127.0.1.1:55850 I0714 15:08:44.176298 27239 master.cpp:1207] Authorizing framework principal 'test-principal' to receive offers for role '*' I0714 15:08:44.176858 27239 master.cpp:1306] Registering framework 20140714-150843-16842879-55850-27216-0000 at scheduler-225679c4-a9fd-4119-9deb-c7712eba37e1@127.0.1.1:55850 I0714 15:08:44.177408 27236 sched.cpp:409] Framework registered with 20140714-150843-16842879-55850-27216-0000 I0714 15:08:44.177443 27236 sched.cpp:423] Scheduler::registered took 12527ns I0714 15:08:44.177727 27241 hierarchical_allocator_process.hpp:331] Added framework 20140714-150843-16842879-55850-27216-0000 I0714 15:08:44.177747 27241 hierarchical_allocator_process.hpp:724] No resources available to allocate! I0714 15:08:44.177753 27241 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 8120ns I0714 15:08:44.179908 27241 slave.cpp:970] Will retry registration in 66.781028ms if necessary I0714 15:08:44.180007 27241 master.cpp:2777] Ignoring register slave message from slave(43)@127.0.1.1:55850 (quantal) as admission is already in progress I0714 15:08:44.183082 27240 leveldb.cpp:343] Persisting action (314 bytes) to leveldb took 16.533189ms I0714 15:08:44.183125 27240 replica.cpp:676] Persisted action at 3 I0714 15:08:44.183465 27240 replica.cpp:655] Replica received learned notice for position 3 I0714 15:08:44.203276 27240 leveldb.cpp:343] Persisting action (316 bytes) to leveldb took 19.768951ms I0714 15:08:44.203376 27240 replica.cpp:676] Persisted action at 3 I0714 15:08:44.203392 27240 replica.cpp:661] Replica learned APPEND action at position 3 I0714 15:08:44.204033 27240 registrar.cpp:479] Successfully updated 'registry' I0714 15:08:44.204138 27240 log.cpp:699] Attempting to truncate the log to 3 I0714 
15:08:44.204221 27240 master.cpp:2829] Registered slave 20140714-150843-16842879-55850-27216-0 at slave(43)@127.0.1.1:55850 (quantal) I0714 15:08:44.204241 27240 master.cpp:3975] Adding slave 20140714-150843-16842879-55850-27216-0 at slave(43)@127.0.1.1:55850 (quantal) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0714 15:08:44.204387 27240 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 4 I0714 15:08:44.204489 27240 slave.cpp:766] Registered with master master@127.0.1.1:55850; given slave ID 20140714-150843-16842879-55850-27216-0 I0714 15:08:44.204745 27240 slave.cpp:779] Checkpointing SlaveInfo to '/tmp/SlaveRecoveryTest_0_ReconcileKillTask_Zl9DUt/meta/slaves/20140714-150843-16842879-55850-27216-0/slave.info' I0714 15:08:44.204954 27240 hierarchical_allocator_process.hpp:444] Added slave 20140714-150843-16842879-55850-27216-0 (quantal) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] available) I0714 15:08:44.205023 27240 hierarchical_allocator_process.hpp:750] Offering cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140714-150843-16842879-55850-27216-0 to framework 20140714-150843-16842879-55850-27216-0000 I0714 15:08:44.205122 27240 hierarchical_allocator_process.hpp:706] Performed allocation for slave 20140714-150843-16842879-55850-27216-0 in 131192ns I0714 15:08:44.205189 27240 slave.cpp:2323] Received ping from slave-observer(32)@127.0.1.1:55850 I0714 15:08:44.205258 27240 master.hpp:801] Adding offer 20140714-150843-16842879-55850-27216-0 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140714-150843-16842879-55850-27216-0 (quantal) I0714 15:08:44.205303 27240 master.cpp:3454] Sending 1 offers to framework 20140714-150843-16842879-55850-27216-0000 I0714 15:08:44.205469 27240 sched.cpp:546] Scheduler::resourceOffers took 23591ns I0714 15:08:44.206351 27241 
replica.cpp:508] Replica received write request for position 4 I0714 15:08:44.208353 27237 master.hpp:811] Removing offer 20140714-150843-16842879-55850-27216-0 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140714-150843-16842879-55850-27216-0 (quantal) I0714 15:08:44.208436 27237 master.cpp:2133] Processing reply for offers: [ 20140714-150843-16842879-55850-27216-0 ] on slave 20140714-150843-16842879-55850-27216-0 at slave(43)@127.0.1.1:55850 (quantal) for framework 20140714-150843-16842879-55850-27216-0000 I0714 15:08:44.208472 27237 master.cpp:2219] Authorizing framework principal 'test-principal' to launch task 4a6783aa-8d07-46e3-8399-2a5d047f0021 as user 'jenkins' I0714 15:08:44.208909 27237 master.hpp:773] Adding task 4a6783aa-8d07-46e3-8399-2a5d047f0021 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140714-150843-16842879-55850-27216-0 (quantal) I0714 15:08:44.208947 27237 master.cpp:2285] Launching task 4a6783aa-8d07-46e3-8399-2a5d047f0021 of framework 20140714-150843-16842879-55850-27216-0000 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140714-150843-16842879-55850-27216-0 at slave(43)@127.0.1.1:55850 (quantal) I0714 15:08:44.209090 27237 slave.cpp:1001] Got assigned task 4a6783aa-8d07-46e3-8399-2a5d047f0021 for framework 20140714-150843-16842879-55850-27216-0000 I0714 15:08:44.209190 27237 slave.cpp:3398] Checkpointing FrameworkInfo to '/tmp/SlaveRecoveryTest_0_ReconcileKillTask_Zl9DUt/meta/slaves/20140714-150843-16842879-55850-27216-0/frameworks/20140714-150843-16842879-55850-27216-0000/framework.info' I0714 15:08:44.209413 27237 slave.cpp:3405] Checkpointing framework pid 'scheduler-225679c4-a9fd-4119-9deb-c7712eba37e1@127.0.1.1:55850' to '/tmp/SlaveRecoveryTest_0_ReconcileKillTask_Zl9DUt/meta/slaves/20140714-150843-16842879-55850-27216-0/frameworks/20140714-150843-16842879-55850-27216-0000/framework.pid' I0714 15:08...",1 
MESOS-1605,"Cleanup stout build setup","While investigating the stout build setup for making it installable, I came across some discrepancies. stout tests are included in libprocess's Makefile instead of stout's Makefile. stout's 3rd party dependencies (e.g., picojson) live in libprocess's 3rdparty directory instead of living in stout's (non-existent) 3rd party directory. It would be nice to fix these issues before making stout installable.",3 MESOS-1615,"Create design document for Optimistic Offers","As a first step toward Optimistic Offers, take the description from the epic and build an implementation design doc that can be shared for comments. Note: the links to the working group notes and design doc are located in the [JIRA Epic|MESOS-1607].",8 MESOS-1620,"Reconciliation does not send back tasks pending validation / authorization.","Per Vinod's feedback on https://reviews.apache.org/r/23542/, we do not send back TASK_STAGING for those tasks that are pending in the Master (validation / authorization still in progress). For both implicit and explicit task reconciliation, the master could reply with TASK_STAGING for these tasks, as this provides additional information to the framework.",3 MESOS-1624,"Apache Jenkins build fails because -lsnappy is set when building leveldb","The failed build: https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Set-JAVA_HOME/2261/consoleFull {noformat:title=the log where -lsnappy is used when compiling leveldb} gzip -d -c ../../3rdparty/leveldb.tar.gz | tar xf - test ! 
-e ../../3rdparty/leveldb.patch || patch -d leveldb -p1 <../../3rdparty/leveldb.patch touch leveldb-stamp cd leveldb && \ make CC=""gcc"" CXX=""g++"" OPT=""-g -g2 -O2 -Wno-unused-local-typedefs -std=c++11 -fPIC"" make[5]: Entering directory `/home/jenkins/jenkins-slave/workspace/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Set-JAVA_HOME/build/mesos-0.20.0/_build/3rdparty/leveldb' g++ -pthread -lsnappy -shared -Wl,-soname -Wl,/home/jenkins/jenkins-slave/workspace/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Set-JAVA_HOME/build/mesos-0.20.0/_build/3rdparty/leveldb/libleveldb.so.1 -I. -I./include -fno-builtin-memcmp -pthread -DOS_LINUX -DLEVELDB_PLATFORM_POSIX -DSNAPPY -g -g2 -O2 -Wno-unused-local-typedefs -std=c++11 -fPIC -fPIC db/builder.cc db/c.cc db/db_impl.cc db/db_iter.cc db/dbformat.cc db/filename.cc db/log_reader.cc db/log_writer.cc db/memtable.cc db/repair.cc db/table_cache.cc db/version_edit.cc db/version_set.cc db/write_batch.cc table/block.cc table/block_builder.cc table/filter_block.cc table/format.cc table/iterator.cc table/merger.cc table/table.cc table/table_builder.cc table/two_level_iterator.cc util/arena.cc util/bloom.cc util/cache.cc util/coding.cc util/comparator.cc util/crc32c.cc util/env.cc util/env_posix.cc util/filter_policy.cc util/hash.cc util/histogram.cc util/logging.cc util/options.cc util/status.cc port/port_posix.cc -o libleveldb.so.1.4 ln -fs libleveldb.so.1.4 libleveldb.so ln -fs libleveldb.so.1.4 libleveldb.so.1 g++ -I. 
-I./include -fno-builtin-memcmp -pthread -DOS_LINUX -DLEVELDB_PLATFORM_POSIX -DSNAPPY -g -g2 -O2 -Wno-unused-local-typedefs -std=c++11 -fPIC -c db/builder.cc -o db/builder.o {noformat} {noformat:title=the error} /bin/bash ../libtool --tag=CXX --mode=link g++ -pthread -g -g2 -O2 -Wno-unused-local-typedefs -std=c++11 -o mesos-local local/mesos_local-main.o libmesos.la -lsasl2 -lcurl -lz -lrt libtool: link: g++ -pthread -g -g2 -O2 -Wno-unused-local-typedefs -std=c++11 -o .libs/mesos-local local/mesos_local-main.o ./.libs/libmesos.so -lsasl2 /usr/lib/x86_64-linux-gnu/libcurl.so -lz -lrt -pthread -Wl,-rpath -Wl,/home/jenkins/jenkins-slave/workspace/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Set-JAVA_HOME/build/mesos-0.20.0/_inst/lib ./.libs/libmesos.so: undefined reference to `snappy::RawCompress(char const*, unsigned long, char*, unsigned long*)' ./.libs/libmesos.so: undefined reference to `snappy::RawUncompress(char const*, unsigned long, char*)' ./.libs/libmesos.so: undefined reference to `snappy::GetUncompressedLength(char const*, unsigned long, unsigned long*)' ./.libs/libmesos.so: undefined reference to `snappy::MaxCompressedLength(unsigned long)' {noformat}",1 MESOS-1627,"Installed protobuf header files include wrong path to mesos header file","Playing with installed mesos headers, realized that we expect users to include the path to mesos directory (e.g., /usr/local/include/mesos) even though it is on the system path. This is because scheduler.pb.h etc include ""mesos.pb.h"" instead of ""mesos/mesos.pb.h"".",2 MESOS-1629,"GLOG Initialized twice if the Framework Scheduler also uses GLOG","{noformat} Could not create logging file: No such file or directory COULD NOT CREATE A LOGGINGFILE 20140722-205220.31450!F0722 20:52:20.494424 31450 utilities.cc:317] Check failed: !IsGoogleLoggingInitialized() You called InitGoogleLogging() twice! 
*** Check failure stack trace: *** @ 0x4399ce google::LogMessage::Fail() @ 0x43991d google::LogMessage::SendToLog() @ 0x43932e google::LogMessage::Flush() @ 0x43c0e5 google::LogMessageFatal::~LogMessageFatal() @ 0x44089f google::glog_internal_namespace_::InitGoogleLoggingUtilities() @ 0x43c409 google::InitGoogleLogging() @ 0x7f0bdd43b55c mesos::internal::logging::initialize() @ 0x7f0bdcf9564d mesos::scheduler::MesosProcess::MesosProcess() @ 0x7f0bdcf92de0 mesos::scheduler::Mesos::Mesos() @ 0x421483 heron::mesos::Scheduler::Scheduler() @ 0x4305dc main @ 0x7f0bd97159c4 __libc_start_main @ 0x420869 (unknown) Aborted {noformat}",2 MESOS-1645,"0.20.0 Release","I would like to volunteer to be the release manager for 0.20.0, which will be releasing the following major features: - Docker support in Mesos (MESOS-1524) - Container level network monitoring for mesos containerizer (MESOS-1228) - Authorization (MESOS-1342) - Framework rate limiting (MESOS-1306) - Enable building against installed third-party dependencies (MESOS-1071) I would like to track blockers for the release on this ticket.",5 MESOS-1649,"Network isolator should tolerate slave crashes while doing isolate/cleanup.","A slave may crash while we are installing/removing filters. The slave recovery for the network isolator should tolerate those partially installed filters. Also, we want to avoid leaking a filter on host eth0 and host lo. The current code cannot tolerate that, thus may cause the following error: {noformat} Failed to perform recovery: Collect failed: Failed to recover container d409a100-2afb-497c-864f-fe3002cf65d9 with pid 50405: No ephemeral ports found To remedy this do as follows: Step 1: rm -f /var/lib/mesos/meta/slaves/latest This ensures slave doesn't recover old live executors. Step 2: Restart the slave. 
{noformat}",3 MESOS-1664,"Inform framework when rate limiting is active","When we rate-limit messages from a framework, we should let it know so it can proactively back off to avoid putting extra pressure on the master.",3 MESOS-1666,"Set maximum executors per slave to avoid overcommit of ephemeral ports","With network isolation, we statically assign ephemeral port ranges. As such, there is an upper bound on the number of containers each slave can support. We should avoid sending offers for slaves that have hit that limit as any tasks will fail to launch and will be LOST. ",1 MESOS-1668,"Handle a temporary one-way master --> slave socket closure.","In MESOS-1529, we realized that it's possible for a slave to remain disconnected in the master if the following occurs: → Master and Slave connected operating normally. → Temporary one-way network failure, master→slave link breaks. → Master marks slave as disconnected. → Network restored and health checking continues normally, slave is not removed as a result. Slave does not attempt to re-register since it is receiving pings once again. → Slave remains disconnected according to the master, and the slave does not try to re-register. Bad! We were originally thinking of using a failover timeout in the master to remove these slaves that don't re-register. However, it can be dangerous when ZooKeeper issues are preventing the slave from re-registering with the master; we do not want to remove a ton of slaves in this situation. Rather, when the slave is health checking correctly but does not re-register within a timeout, we could send a registration request from the master to the slave, telling the slave that it must re-register. 
This message could also be used when receiving status updates (or other messages) from slaves that are disconnected in the master.",2 MESOS-1671,"Expose executor metrics for slave.","Expose the following metrics: slave/executors_registering slave/executors_running slave/executors_terminating slave/executors_terminated",2 MESOS-1672,"Add filter to allocator resourcesRecovered method","The allocator already allows filters to be added when resources are unused. It is useful to also allow the same behaviour in {{resourcesRecovered}}.",2 MESOS-1673,"The value of MASTER_PING_TIMEOUT is non-deterministic","Right now, it is declared as follows: {noformat} const Duration MASTER_PING_TIMEOUT = master::SLAVE_PING_TIMEOUT * master::MAX_SLAVE_PING_TIMEOUTS {noformat} Since the order of static initialization across translation units in C++ is unspecified, MASTER_PING_TIMEOUT's value is non-deterministic. We've already observed that in tests (where MASTER_PING_TIMEOUT == 0).",1 MESOS-1674,"Kill private_resources and treat 'ephemeral_ports' as a resource.","As the first step to solve MESOS-1654, we need to kill private_resources in SlaveInfo and add an 'ephemeral_ports' resource. For now, the slave and the port mapping isolator will simply ignore the 'ephemeral_ports' resource in ExecutorInfo and TaskInfo, and make the allocation by themselves. 
We will revisit this once the overcommit race (MESOS-1466) is fixed.",3 MESOS-1676,"ZooKeeperMasterContenderDetectorTest.MasterDetectorTimedoutSession is flaky","{noformat:title=} [ RUN ] ZooKeeperMasterContenderDetectorTest.MasterDetectorTimedoutSession I0806 01:18:37.648684 17458 zookeeper_test_server.cpp:158] Started ZooKeeperTestServer on port 42069 2014-08-06 01:18:37,650:17458(0x2b4679ca5700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 2014-08-06 01:18:37,650:17458(0x2b4679ca5700):ZOO_INFO@log_env@716: Client environment:host.name=lucid 2014-08-06 01:18:37,650:17458(0x2b4679ca5700):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-08-06 01:18:37,650:17458(0x2b4679ca5700):ZOO_INFO@log_env@724: Client environment:os.arch=2.6.32-64-generic 2014-08-06 01:18:37,650:17458(0x2b4679ca5700):ZOO_INFO@log_env@725: Client environment:os.version=#128-Ubuntu SMP Tue Jul 15 08:32:40 UTC 2014 2014-08-06 01:18:37,651:17458(0x2b4679ca5700):ZOO_INFO@log_env@733: Client environment:user.name=(null) 2014-08-06 01:18:37,651:17458(0x2b4679ca5700):ZOO_INFO@log_env@741: Client environment:user.home=/home/jenkins 2014-08-06 01:18:37,651:17458(0x2b4679ca5700):ZOO_INFO@log_env@753: Client environment:user.dir=/var/jenkins/workspace/mesos-ubuntu-10.04-gcc/src 2014-08-06 01:18:37,651:17458(0x2b4679ca5700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=127.0.0.1:42069 sessionTimeout=5000 watcher=0x2b467450bc00 sessionId=0 sessionPasswd= context=0x1682db0 flags=0 2014-08-06 01:18:37,656:17458(0x2b468638b700):ZOO_INFO@check_events@1703: initiated connection to server [127.0.0.1:42069] 2014-08-06 01:18:37,669:17458(0x2b468638b700):ZOO_INFO@check_events@1750: session establishment complete on server [127.0.0.1:42069], sessionId=0x147aa6601cf0000, negotiated timeout=6000 I0806 01:18:37.671725 17486 group.cpp:313] Group process (group(37)@127.0.1.1:55561) connected to ZooKeeper I0806 01:18:37.671758 17486 group.cpp:787] Syncing 
group operations: queue size (joins, cancels, datas) = (0, 0, 0) I0806 01:18:37.671771 17486 group.cpp:385] Trying to create path '/mesos' in ZooKeeper 2014-08-06 01:18:39,101:17458(0x2b4687394700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:36197] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-08-06 01:18:42,441:17458(0x2b4687394700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:36197] zk retcode=-4, errno=111(Connection refused): server refused to accept the client I0806 01:18:42.656673 17481 contender.cpp:131] Joining the ZK group I0806 01:18:42.662484 17484 contender.cpp:247] New candidate (id='0') has entered the contest for leadership I0806 01:18:42.663754 17481 detector.cpp:138] Detected a new leader: (id='0') I0806 01:18:42.663884 17481 group.cpp:658] Trying to get '/mesos/info_0000000000' in ZooKeeper I0806 01:18:42.664788 17483 detector.cpp:426] A new leading master (UPID=@128.150.152.0:10000) is detected 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@log_env@716: Client environment:host.name=lucid 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@log_env@724: Client environment:os.arch=2.6.32-64-generic 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@log_env@725: Client environment:os.version=#128-Ubuntu SMP Tue Jul 15 08:32:40 UTC 2014 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@log_env@733: Client environment:user.name=(null) 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@log_env@741: Client environment:user.home=/home/jenkins 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@log_env@753: Client environment:user.dir=/var/jenkins/workspace/mesos-ubuntu-10.04-gcc/src 2014-08-06 
01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=127.0.0.1:42069 sessionTimeout=5000 watcher=0x2b467450bc00 sessionId=0 sessionPasswd= context=0x15c00f0 flags=0 2014-08-06 01:18:42,668:17458(0x2b4686d91700):ZOO_INFO@check_events@1703: initiated connection to server [127.0.0.1:42069] 2014-08-06 01:18:42,672:17458(0x2b4686d91700):ZOO_INFO@check_events@1750: session establishment complete on server [127.0.0.1:42069], sessionId=0x147aa6601cf0001, negotiated timeout=6000 I0806 01:18:42.673542 17485 group.cpp:313] Group process (group(38)@127.0.1.1:55561) connected to ZooKeeper I0806 01:18:42.673570 17485 group.cpp:787] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0) I0806 01:18:42.673580 17485 group.cpp:385] Trying to create path '/mesos' in ZooKeeper 2014-08-06 01:18:46,796:17458(0x2b468638b700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 2131ms 2014-08-06 01:18:46,796:17458(0x2b468638b700):ZOO_ERROR@handle_socket_error_msg@1643: Socket [127.0.0.1:42069] zk retcode=-7, errno=110(Connection timed out): connection to 127.0.0.1:42069 timed out (exceeded timeout by 131ms) 2014-08-06 01:18:46,796:17458(0x2b468638b700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 2131ms 2014-08-06 01:18:46,796:17458(0x2b4686d91700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 2115ms 2014-08-06 01:18:46,796:17458(0x2b4686d91700):ZOO_ERROR@handle_socket_error_msg@1643: Socket [127.0.0.1:42069] zk retcode=-7, errno=110(Connection timed out): connection to 127.0.0.1:42069 timed out (exceeded timeout by 115ms) 2014-08-06 01:18:46,796:17458(0x2b4686d91700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 2115ms 2014-08-06 01:18:46,799:17458(0x2b4687394700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 1025ms 2014-08-06 01:18:46,800:17458(0x2b4687394700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:36197] zk retcode=-4, errno=111(Connection refused): server refused 
to accept the client I0806 01:18:46.806895 17486 group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ... I0806 01:18:46.807857 17479 group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ... I0806 01:18:47.669064 17482 contender.cpp:131] Joining the ZK group 2014-08-06 01:18:47,669:17458(0x2b4686d91700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 2989ms 2014-08-06 01:18:47,669:17458(0x2b4686d91700):ZOO_INFO@check_events@1703: initiated connection to server [127.0.0.1:42069] 2014-08-06 01:18:47,671:17458(0x2b4686d91700):ZOO_INFO@check_events@1750: session establishment complete on server [127.0.0.1:42069], sessionId=0x147aa6601cf0001, negotiated timeout=6000 I0806 01:18:47.682868 17485 contender.cpp:247] New candidate (id='1') has entered the contest for leadership I0806 01:18:47.683404 17482 group.cpp:313] Group process (group(38)@127.0.1.1:55561) reconnected to ZooKeeper I0806 01:18:47.683445 17482 group.cpp:787] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0) I0806 01:18:47.685998 17482 detector.cpp:138] Detected a new leader: (id='0') I0806 01:18:47.686142 17482 group.cpp:658] Trying to get '/mesos/info_0000000000' in ZooKeeper I0806 01:18:47.687289 17479 detector.cpp:426] A new leading master (UPID=@128.150.152.0:10000) is detected 2014-08-06 01:18:47,687:17458(0x2b467a2a8700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 2014-08-06 01:18:47,687:17458(0x2b467a2a8700):ZOO_INFO@log_env@716: Client environment:host.name=lucid 2014-08-06 01:18:47,687:17458(0x2b467a2a8700):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-08-06 01:18:47,687:17458(0x2b467a2a8700):ZOO_INFO@log_env@724: Client environment:os.arch=2.6.32-64-generic 2014-08-06 01:18:47,687:17458(0x2b467a2a8700):ZOO_INFO@log_env@725: Client environment:os.version=#128-Ubuntu SMP Tue Jul 15 08:32:40 UTC 2014 2014-08-06 01:18:47,687:17458(0x2b467a2a8700):ZOO_INFO@log_env@733: Client 
environment:user.name=(null) 2014-08-06 01:18:47,687:17458(0x2b467a2a8700):ZOO_INFO@log_env@741: Client environment:user.home=/home/jenkins 2014-08-06 01:18:47,687:17458(0x2b467a2a8700):ZOO_INFO@log_env@753: Client environment:user.dir=/var/jenkins/workspace/mesos-ubuntu-10.04-gcc/src 2014-08-06 01:18:47,687:17458(0x2b467a2a8700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=127.0.0.1:42069 sessionTimeout=5000 watcher=0x2b467450bc00 sessionId=0 sessionPasswd= context=0x2b467c0421c0 flags=0 2014-08-06 01:18:47,699:17458(0x2b4687de6700):ZOO_INFO@check_events@1703: initiated connection to server [127.0.0.1:42069] 2014-08-06 01:18:47,712:17458(0x2b4687de6700):ZOO_INFO@check_events@1750: session establishment complete on server [127.0.0.1:42069], sessionId=0x147aa6601cf0002, negotiated timeout=6000 I0806 01:18:47.712846 17479 group.cpp:313] Group process (group(39)@127.0.1.1:55561) connected to ZooKeeper I0806 01:18:47.712873 17479 group.cpp:787] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0) I0806 01:18:47.712882 17479 group.cpp:385] Trying to create path '/mesos' in ZooKeeper I0806 01:18:47.714648 17479 detector.cpp:138] Detected a new leader: (id='0') I0806 01:18:47.714759 17479 group.cpp:658] Trying to get '/mesos/info_0000000000' in ZooKeeper I0806 01:18:47.716130 17479 detector.cpp:426] A new leading master (UPID=@128.150.152.0:10000) is detected 2014-08-06 01:18:47,718:17458(0x2b4686d91700):ZOO_ERROR@handle_socket_error_msg@1721: Socket [127.0.0.1:42069] zk retcode=-4, errno=112(Host is down): failed while receiving a server response I0806 01:18:47.718889 17479 group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ... 2014-08-06 01:18:47,720:17458(0x2b4687de6700):ZOO_ERROR@handle_socket_error_msg@1721: Socket [127.0.0.1:42069] zk retcode=-4, errno=112(Host is down): failed while receiving a server response I0806 01:18:47.720788 17484 group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ... 
I0806 01:18:47.724663 17458 zookeeper_test_server.cpp:122] Shutdown ZooKeeperTestServer on port 42069 2014-08-06 01:18:48,798:17458(0x2b468638b700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 4133ms 2014-08-06 01:18:48,798:17458(0x2b468638b700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:42069] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-08-06 01:18:49,720:17458(0x2b4686d91700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 33ms 2014-08-06 01:18:49,721:17458(0x2b4686d91700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:42069] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-08-06 01:18:49,722:17458(0x2b4687de6700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:42069] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-08-06 01:18:50,136:17458(0x2b4687394700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:36197] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-08-06 01:18:50,800:17458(0x2b468638b700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:42069] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-08-06 01:18:51,723:17458(0x2b4686d91700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:42069] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-08-06 01:18:51,723:17458(0x2b4687de6700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:42069] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-08-06 01:18:52,801:17458(0x2b468638b700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:42069] zk retcode=-4, errno=111(Connection refused): server refused to accept the client W0806 01:18:52.842553 17481 group.cpp:456] Timed out waiting to reconnect to ZooKeeper. 
Forcing ZooKeeper session (sessionId=147aa6601cf0000) expiration I0806 01:18:52.842911 17481 group.cpp:472] ZooKeeper session expired I0806 01:18:52.843468 17485 detector.cpp:126] The current leader (id=0) is lost I0806 01:18:52.843483 17485 detector.cpp:138] Detected a new leader: None I0806 01:18:52.843618 17485 contender.cpp:196] Membership cancelled: 0 2014-08-06 01:18:52,843:17458(0x2b4679aa4700):ZOO_INFO@zookeeper_close@2522: Freeing zookeeper resources for sessionId=0x147aa6601cf0000 2014-08-06 01:18:52,844:17458(0x2b4679aa4700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 2014-08-06 01:18:52,844:17458(0x2b4679aa4700):ZOO_INFO@log_env@716: Client environment:host.name=lucid 2014-08-06 01:18:52,844:17458(0x2b4679aa4700):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-08-06 01:18:52,844:17458(0x2b4679aa4700):ZOO_INFO@log_env@724: Client environment:os.arch=2.6.32-64-generic 2014-08-06 01:18:52,844:17458(0x2b4679aa4700):ZOO_INFO@log_env@725: Client environment:os.version=#128-Ubuntu SMP Tue Jul 15 08:32:40 UTC 2014 2014-08-06 01:18:52,844:17458(0x2b4679aa4700):ZOO_INFO@log_env@733: Client environment:user.name=(null) 2014-08-06 01:18:52,844:17458(0x2b4679aa4700):ZOO_INFO@log_env@741: Client environment:user.home=/home/jenkins 2014-08-06 01:18:52,844:17458(0x2b4679aa4700):ZOO_INFO@log_env@753: Client environment:user.dir=/var/jenkins/workspace/mesos-ubuntu-10.04-gcc/src 2014-08-06 01:18:52,844:17458(0x2b4679aa4700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=127.0.0.1:42069 sessionTimeout=5000 watcher=0x2b467450bc00 sessionId=0 sessionPasswd= context=0x1349ad0 flags=0 2014-08-06 01:18:52,844:17458(0x2b468698f700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:42069] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-08-06 01:18:53,473:17458(0x2b4687394700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:36197] zk retcode=-4, 
errno=111(Connection refused): server refused to accept the client W0806 01:18:53.720684 17480 group.cpp:456] Timed out waiting to reconnect to ZooKeeper. Forcing ZooKeeper session (sessionId=147aa6601cf0001) expiration I0806 01:18:53.721132 17480 group.cpp:472] ZooKeeper session expired I0806 01:18:53.721516 17479 detector.cpp:126] The current leader (id=0) is lost I0806 01:18:53.721534 17479 detector.cpp:138] Detected a new leader: None I0806 01:18:53.721696 17479 contender.cpp:196] Membership cancelled: 1 2014-08-06 01:18:53,721:17458(0x2b46798a3700):ZOO_INFO@zookeeper_close@2522: Freeing zookeeper resources for sessionId=0x147aa6601cf0001 2014-08-06 01:18:53,722:17458(0x2b46798a3700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 2014-08-06 01:18:53,722:17458(0x2b46798a3700):ZOO_INFO@log_env@716: Client environment:host.name=lucid 2014-08-06 01:18:53,722:17458(0x2b46798a3700):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-08-06 01:18:53,722:17458(0x2b46798a3700):ZOO_INFO@log_env@724: Client environment:os.arch=2.6.32-64-generic 2014-08-06 01:18:53,722:17458(0x2b46798a3700):ZOO_INFO@log_env@725: Client environment:os.version=#128-Ubuntu SMP Tue Jul 15 08:32:40 UTC 2014 2014-08-06 01:18:53,722:17458(0x2b46798a3700):ZOO_INFO@log_env@733: Client environment:user.name=(null) 2014-08-06 01:18:53,722:17458(0x2b46798a3700):ZOO_INFO@log_env@741: Client environment:user.home=/home/jenkins 2014-08-06 01:18:53,722:17458(0x2b46798a3700):ZOO_INFO@log_env@753: Client environment:user.dir=/var/jenkins/workspace/mesos-ubuntu-10.04-gcc/src 2014-08-06 01:18:53,722:17458(0x2b46798a3700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=127.0.0.1:42069 sessionTimeout=5000 watcher=0x2b467450bc00 sessionId=0 sessionPasswd= context=0x16a0550 flags=0 2014-08-06 01:18:53,723:17458(0x2b4686f92700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:42069] zk retcode=-4, errno=111(Connection refused): server refused to 
accept the client 2014-08-06 01:18:53,726:17458(0x2b4687de6700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:42069] zk retcode=-4, errno=111(Connection refused): server refused to accept the client W0806 01:18:53.730258 17479 group.cpp:456] Timed out waiting to reconnect to ZooKeeper. Forcing ZooKeeper session (sessionId=147aa6601cf0002) expiration I0806 01:18:53.730736 17479 group.cpp:472] ZooKeeper session expired I0806 01:18:53.731081 17481 detector.cpp:126] The current leader (id=0) is lost I0806 01:18:53.731132 17481 detector.cpp:138] Detected a new leader: None 2014-08-06 01:18:53,731:17458(0x2b46796a2700):ZOO_INFO@zookeeper_close@2522: Freeing zookeeper resources for sessionId=0x147aa6601cf0002 2014-08-06 01:18:53,731:17458(0x2b46796a2700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 2014-08-06 01:18:53,731:17458(0x2b46796a2700):ZOO_INFO@log_env@716: Client environment:host.name=lucid 2014-08-06 01:18:53,731:17458(0x2b46796a2700):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-08-06 01:18:53,731:17458(0x2b46796a2700):ZOO_INFO@log_env@724: Client environment:os.arch=2.6.32-64-generic 2014-08-06 01:18:53,731:17458(0x2b46796a2700):ZOO_INFO@log_env@725: Client environment:os.version=#128-Ubuntu SMP Tue Jul 15 08:32:40 UTC 2014 2014-08-06 01:18:53,731:17458(0x2b46796a2700):ZOO_INFO@log_env@733: Client environment:user.name=(null) 2014-08-06 01:18:53,731:17458(0x2b46796a2700):ZOO_INFO@log_env@741: Client environment:user.home=/home/jenkins 2014-08-06 01:18:53,732:17458(0x2b46796a2700):ZOO_INFO@log_env@753: Client environment:user.dir=/var/jenkins/workspace/mesos-ubuntu-10.04-gcc/src 2014-08-06 01:18:53,732:17458(0x2b46796a2700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=127.0.0.1:42069 sessionTimeout=5000 watcher=0x2b467450bc00 sessionId=0 sessionPasswd= context=0x2b467c035f30 flags=0 2014-08-06 01:18:53,733:17458(0x2b4687be5700):ZOO_ERROR@handle_socket_error_msg@1697: Socket 
[127.0.0.1:42069] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-08-06 01:18:54,512:17458(0x2b468698f700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:42069] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-08-06 01:18:55,393:17458(0x2b4686f92700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:42069] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-08-06 01:18:55,403:17458(0x2b4687be5700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:42069] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-08-06 01:18:56,301:17458(0x2b468698f700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 122ms 2014-08-06 01:18:56,302:17458(0x2b468698f700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:42069] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-08-06 01:18:56,809:17458(0x2b4687394700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:36197] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-08-06 01:18:57,939:17458(0x2b4686f92700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 879ms 2014-08-06 01:18:57,940:17458(0x2b4686f92700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:42069] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-08-06 01:18:57,940:17458(0x2b4687be5700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 870ms 2014-08-06 01:18:57,940:17458(0x2b4687be5700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:42069] zk retcode=-4, errno=111(Connection refused): server refused to accept the client tests/master_contender_detector_tests.cpp:574: Failure Failed to wait 10secs for leaderReconnecting 2014-08-06 01:18:57,941:17458(0x2b46794a0120):ZOO_INFO@zookeeper_close@2522: Freeing zookeeper resources for sessionId=0 I0806 
01:18:57.949972 17458 contender.cpp:186] Now cancelling the membership: 1 2014-08-06 01:1...",1 MESOS-1677,"AllocatorTest.FrameworkReregistersFirst is flaky.","{noformat} GMOCK WARNING: Uninteresting mock function call - taking default action specified at: ../../../mesos/src/tests/mesos.hpp:566: Function call: resourcesRecovered(@0x7f38f40043e8 20140806-190304-2081170186-36159-24511-0000, @0x7f38f40043c8 20140806-190304-2081170186-36159-24511-0, @0x7f38f40043b0 { cpus(*):2, mem(*):1024, disk(*):464204, ports(*):[31000-32000] }) {noformat}",2 MESOS-1683,"Create user doc for framework rate limiting feature","Create a Markdown doc under /docs",2 MESOS-1690,"Expose metric for container destroy failures","Increment counter when container destroy fails.",3 MESOS-1694,"Future::failure should return a const string&",NULL,1 MESOS-1695,"The stats.json endpoint on the slave exposes ""registered"" as a string.","The slave is currently exposing a string value for the ""registered"" statistic, this should be a number: {code} slave:5051/stats.json { ""recovery_errors"": 0, ""registered"": ""1"", ""slave/executors_registering"": 0, ... } {code} Should be a pretty straightforward fix, looks like this first originated back in 2013: {code} commit b8291304e1523eb67ea8dc5f195cdb0d8e7d8348 Author: Vinod Kone Date: Wed Jul 3 12:37:36 2013 -0700 Added a ""registered"" key/value pair to slave's stats.json. Review: https://reviews.apache.org/r/12256 diff --git a/src/slave/http.cpp b/src/slave/http.cpp index dc2955f..dd51516 100644 --- a/src/slave/http.cpp +++ b/src/slave/http.cpp @@ -281,6 +281,8 @@ Future Slave::Http::stats(const Request& request) object.values[""lost_tasks""] = slave.stats.tasks[TASK_LOST]; object.values[""valid_status_updates""] = slave.stats.validStatusUpdates; object.values[""invalid_status_updates""] = slave.stats.invalidStatusUpdates; + object.values[""registered""] = slave.master ? 
""1"" : ""0""; + return OK(object, request.query.get(""jsonp"")); } {code}",1 MESOS-1696,"Improve reconciliation between master and slave.","As we update the Master to keep tasks in memory until they are both terminal and acknowledged (MESOS-1410), the lifetime of tasks in Mesos will look as follows: {code} Master Slave {} {} {Tn} {} // Master receives Task T, non-terminal. Forwards to slave. {Tn} {Tn} // Slave receives Task T, non-terminal. {Tn} {Tt} // Task becomes terminal on slave. Update forwarded. {Tt} {Tt} // Master receives update, forwards to framework. {} {Tt} // Master receives ack, forwards to slave. {} {} // Slave receives ack. {code} In the current form of reconciliation, the slave sends to the master all tasks that are not both terminal and acknowledged. At any point in the above lifecycle, the slave's re-registration message can reach the master. Note the following properties: *(1)* The master may have a non-terminal task, not present in the slave's re-registration message. *(2)* The master may have a non-terminal task, present in the slave's re-registration message but in a different state. *(3)* The slave's re-registration message may contain a terminal unacknowledged task unknown to the master. In the current master / slave [reconciliation|https://github.com/apache/mesos/blob/0.19.1/src/master/master.cpp#L3146] code, the master assumes that case (1) is because a launch task message was dropped, and it sends TASK_LOST. We've seen above that (1) can happen even when the task reaches the slave correctly, so this can lead to inconsistency! After chatting with [~vinodkone], we're considering updating the reconciliation to occur as follows: → Slave sends all tasks that are not both terminal and acknowledged, during re-registration. This is the same as before. → If the master sees tasks that are missing in the slave, the master sends the tasks that need to be reconciled to the slave for the tasks. This can be piggy-backed on the re-registration message. 
→ The slave will send TASK_LOST if the task is not known to it. Preferably in a retried manner, unless we update socket closure on the slave to force a re-registration.",3 MESOS-1698,"make check segfaults","Observed this on Apache CI: https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Set-JAVA_HOME/2331/consoleFull It looks like the segfault happens before any tests are run. So I suspect somewhere in the setup phase of the tests. {code} mv -f .deps/tests-time_tests.Tpo .deps/tests-time_tests.Po /bin/bash ./libtool --tag=CXX --mode=link g++ -g -g2 -O2 -Wno-unused-local-typedefs -std=c++11 -o tests tests-decoder_tests.o tests-encoder_tests.o tests-http_tests.o tests-io_tests.o tests-main.o tests-mutex_tests.o tests-metrics_tests.o tests-owned_tests.o tests-process_tests.o tests-queue_tests.o tests-reap_tests.o tests-sequence_tests.o tests-shared_tests.o tests-statistics_tests.o tests-subprocess_tests.o tests-system_tests.o tests-timeseries_tests.o tests-time_tests.o 3rdparty/libgmock.la libprocess.la 3rdparty/glog-0.3.3/libglog.la 3rdparty/libry_http_parser.la 3rdparty/libev-4.15/libev.la -lz -lrt libtool: link: g++ -g -g2 -O2 -Wno-unused-local-typedefs -std=c++11 -o tests tests-decoder_tests.o tests-encoder_tests.o tests-http_tests.o tests-io_tests.o tests-main.o tests-mutex_tests.o tests-metrics_tests.o tests-owned_tests.o tests-process_tests.o tests-queue_tests.o tests-reap_tests.o tests-sequence_tests.o tests-shared_tests.o tests-statistics_tests.o tests-subprocess_tests.o tests-system_tests.o tests-timeseries_tests.o tests-time_tests.o 3rdparty/.libs/libgmock.a ./.libs/libprocess.a /home/jenkins/jenkins-slave/workspace/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Set-JAVA_HOME/build/3rdparty/libprocess/3rdparty/glog-0.3.3/.libs/libglog.a /home/jenkins/jenkins-slave/workspace/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Set-JAVA_HOME/build/3rdparty/libprocess/3rdparty/libev-4.15/.libs/libev.a 3rdparty/glog-0.3.3/.libs/libglog.a -lpthread 
3rdparty/.libs/libry_http_parser.a 3rdparty/libev-4.15/.libs/libev.a -lm -lz -lrt make[5]: Leaving directory `/home/jenkins/jenkins-slave/workspace/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Set-JAVA_HOME/build/3rdparty/libprocess' make check-local make[5]: Entering directory `/home/jenkins/jenkins-slave/workspace/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Set-JAVA_HOME/build/3rdparty/libprocess' ./tests Note: Google Test filter = [==========] Running 0 tests from 0 test cases. [==========] 0 tests from 0 test cases ran. (0 ms total) [ PASSED ] 0 tests. YOU HAVE 3 DISABLED TESTS make[5]: *** [check-local] Segmentation fault make[5]: Leaving directory `/home/jenkins/jenkins-slave/workspace/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Set-JAVA_HOME/build/3rdparty/libprocess' make[4]: *** [check-am] Error 2 make[4]: Leaving directory `/home/jenkins/jenkins-slave/workspace/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Set-JAVA_HOME/build/3rdparty/libprocess' make[3]: *** [check-recursive] Error 1 make[3]: Leaving directory `/home/jenkins/jenkins-slave/workspace/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Set-JAVA_HOME/build/3rdparty/libprocess' make[2]: *** [check-recursive] Error 1 make[2]: Leaving directory `/home/jenkins/jenkins-slave/workspace/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Set-JAVA_HOME/build/3rdparty' make[1]: *** [check] Error 2 make[1]: Leaving directory `/home/jenkins/jenkins-slave/workspace/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Set-JAVA_HOME/build/3rdparty' make: *** [check-recursive] Error 1 Build step 'Execute shell' marked build as failure Sending e-mails to: dev@mesos.apache.org benjamin.hindman@gmail.com dhamon@twitter.com yujie.jay@gmail.com Finished: FAILURE {code}",2 MESOS-1702,"Add document for network monitoring.","The doc should tell the user how to use the new network monitoring feature.",2 MESOS-1703,"better error message when replicated log hasn't been initialized","Aurora uses the mesos replicated log. 
If you don't run ""mesos-log initialize"" before starting aurora you'll get INFO messages in your aurora log: {code} I0814 15:18:38.346638 25141 replica.cpp:633] Replica in EMPTY status received a broadcasted recover request I0814 15:18:38.346796 25132 recover.cpp:220] Received a recover response from a replica in EMPTY status {code} It has been deemed too dangerous to automatically run mesos-log initialize for the user (see AURORA-243). It would be helpful if that error message was made more friendly and at the ERROR level. The message could explain what the user should do and the implications of doing so. Links to the docs would be helpful. See http://wilderness.apache.org/channels/?f=aurora/2014-08-14#1408055261 for context",1 MESOS-1705,"SubprocessTest.Status sometimes flakes out","It's a pretty rare event, but it has happened more than once. [ RUN ] SubprocessTest.Status *** Aborted at 1408023909 (unix time) try ""date -d @1408023909"" if you are using GNU date *** PC: @ 0x35700094b1 (unknown) *** SIGTERM (@0x3e8000041d8) received by PID 16872 (TID 0x7fa9ea426780) from PID 16856; stack trace: *** @ 0x3570435cb0 (unknown) @ 0x35700094b1 (unknown) @ 0x3570009d9f (unknown) @ 0x357000e726 (unknown) @ 0x3570015185 (unknown) @ 0x5ead42 process::childMain() @ 0x5ece8d std::_Function_handler<>::_M_invoke() @ 0x5eac9c process::defaultClone() @ 0x5ebbd4 process::subprocess() @ 0x55a229 process::subprocess() @ 0x55a846 process::subprocess() @ 0x54224c SubprocessTest_Status_Test::TestBody() @ 0x7fa9ea460323 (unknown) @ 0x7fa9ea455b67 (unknown) @ 0x7fa9ea455c0e (unknown) @ 0x7fa9ea455d15 (unknown) @ 0x7fa9ea4593a8 (unknown) @ 0x7fa9ea459647 (unknown) @ 0x422466 main @ 0x3570421d65 (unknown) @ 0x4260bd (unknown) [ OK ] SubprocessTest.Status (153 ms)",2 MESOS-1712,"Automate disallowing of commits mixing mesos/libprocess/stout","For various reasons, we don't want to mix mesos/libprocess/stout changes into a single commit. 
Typically, it is up to the reviewee/reviewer to catch this. It would be nice to automate this via the pre-commit hook.",2 MESOS-1715,"The slave does not send pending tasks during re-registration.","In what looks like an oversight, the pending tasks and executors in the slave (Framework::pending) are not sent in the re-registration message. For tasks, this can lead to spurious TASK_LOST notifications being generated by the master when it falsely thinks the tasks are not present on the slave.",3 MESOS-1717,"The slave does not show pending tasks in the JSON endpoints.","The slave does not show pending tasks in the /state.json endpoint. This is a bit tricky to add since we rely on knowing the executor directory.",1 MESOS-1718,"Command executor can overcommit the slave.","Currently we give a small amount of resources to the command executor, in addition to resources used by the command task: https://github.com/apache/mesos/blob/0.20.0-rc1/src/slave/slave.cpp#L2448 {code: title=} ExecutorInfo Slave::getExecutorInfo( const FrameworkID& frameworkId, const TaskInfo& task) { ... // Add an allowance for the command executor. This does lead to a // small overcommit of resources. executor.mutable_resources()->MergeFrom( Resources::parse( ""cpus:"" + stringify(DEFAULT_EXECUTOR_CPUS) + "";"" + ""mem:"" + stringify(DEFAULT_EXECUTOR_MEM.megabytes())).get()); ... } {code} This leads to an overcommit of the slave. Ideally, for command tasks we can ""transfer"" all of the task resources to the executor at the slave / isolation level.",3 MESOS-1727,"Configure fails with ../configure: line 18439: syntax error near unexpected token `PROTOBUFPREFIX,'","I followed the ""Getting started"" documentation and did: {noformat} $ git clone http://git-wip-us.apache.org/repos/asf/mesos.git; cd mesos $ ./bootstrap $ mkdir build; cd build $ ../configure {noformat} which aborts with {noformat} .... .... checking whether we are using the GNU C compiler... (cached) yes checking whether gcc accepts -g... 
(cached) yes checking for gcc option to accept ISO C89... (cached) none needed checking dependency style of gcc... (cached) gcc3 ../configure: line 18439: syntax error near unexpected token `PROTOBUFPREFIX,' ../configure: line 18439: ` PKG_CHECK_MODULES(PROTOBUFPREFIX,' {noformat}",2 MESOS-1728,"Libprocess: report bind parameters on failure","When you attempt to start a slave or master and there's another one already running there, it would be nice to report the actual parameters of the {{bind}} call that failed.",1 MESOS-1733,"Change the stout path utility to declare a single, variadic 'join' function instead of several separate declarations of various discrete arities",NULL,5 MESOS-1739,"Allow slave reconfiguration on restart","Make it so that either via a slave restart or an out-of-process ""reconfigure"" ping, the attributes and resources of a slave can be updated to be a superset of what they used to be.",3 MESOS-1748,"MasterZooKeeperTest.LostZooKeeperCluster is flaky","{noformat:title=} tests/master_tests.cpp:1795: Failure Failed to wait 10secs for slaveRegisteredMessage {noformat} Should have placed the FUTURE_MESSAGE that attempts to capture this message before the slave starts...",1 MESOS-1749,"SlaveRecoveryTest.ShutdownSlave is flaky","{noformat} [ RUN ] SlaveRecoveryTest/0.ShutdownSlave Using temporary directory '/tmp/SlaveRecoveryTest_0_ShutdownSlave_3O5epS' I0828 21:21:46.206990 27625 leveldb.cpp:176] Opened db in 24.461837ms I0828 21:21:46.213706 27625 leveldb.cpp:183] Compacted db in 6.021499ms I0828 21:21:46.214047 27625 leveldb.cpp:198] Created db iterator in 5566ns I0828 21:21:46.214313 27625 leveldb.cpp:204] Seeked to beginning of db in 1433ns I0828 21:21:46.214515 27625 leveldb.cpp:273] Iterated through 0 keys in the db in 723ns I0828 21:21:46.214826 27625 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0828 21:21:46.215409 27642 recover.cpp:425] Starting replica recovery I0828 21:21:46.215718 27642 
recover.cpp:451] Replica is in EMPTY status I0828 21:21:46.216264 27642 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I0828 21:21:46.216557 27642 recover.cpp:188] Received a recover response from a replica in EMPTY status I0828 21:21:46.216917 27642 recover.cpp:542] Updating replica status to STARTING I0828 21:21:46.221271 27645 master.cpp:286] Master 20140828-212146-16842879-45613-27625 (saucy) started on 127.0.1.1:45613 I0828 21:21:46.221812 27645 master.cpp:332] Master only allowing authenticated frameworks to register I0828 21:21:46.222038 27645 master.cpp:337] Master only allowing authenticated slaves to register I0828 21:21:46.222250 27645 credentials.hpp:36] Loading credentials for authentication from '/tmp/SlaveRecoveryTest_0_ShutdownSlave_3O5epS/credentials' I0828 21:21:46.222585 27645 master.cpp:366] Authorization enabled I0828 21:21:46.222885 27642 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 5.596969ms I0828 21:21:46.223085 27642 replica.cpp:320] Persisted replica status to STARTING I0828 21:21:46.223424 27642 recover.cpp:451] Replica is in STARTING status I0828 21:21:46.223933 27642 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I0828 21:21:46.224984 27642 recover.cpp:188] Received a recover response from a replica in STARTING status I0828 21:21:46.225385 27642 recover.cpp:542] Updating replica status to VOTING I0828 21:21:46.224750 27646 master.cpp:1205] The newly elected leader is master@127.0.1.1:45613 with id 20140828-212146-16842879-45613-27625 I0828 21:21:46.226132 27646 master.cpp:1218] Elected as the leading master! I0828 21:21:46.226349 27646 master.cpp:1036] Recovering from registrar I0828 21:21:46.226637 27646 registrar.cpp:313] Recovering registrar I0828 21:21:46.224473 27641 master.cpp:120] No whitelist given. 
Advertising offers for all slaves I0828 21:21:46.224431 27645 hierarchical_allocator_process.hpp:299] Initializing hierarchical allocator process with master : master@127.0.1.1:45613 I0828 21:21:46.240932 27642 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 15.182422ms I0828 21:21:46.241453 27642 replica.cpp:320] Persisted replica status to VOTING I0828 21:21:46.241926 27643 recover.cpp:556] Successfully joined the Paxos group I0828 21:21:46.242228 27642 recover.cpp:440] Recover process terminated I0828 21:21:46.242501 27645 log.cpp:656] Attempting to start the writer I0828 21:21:46.243247 27645 replica.cpp:474] Replica received implicit promise request with proposal 1 I0828 21:21:46.253456 27645 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 9.95472ms I0828 21:21:46.253955 27645 replica.cpp:342] Persisted promised to 1 I0828 21:21:46.254518 27645 coordinator.cpp:230] Coordinator attemping to fill missing position I0828 21:21:46.255234 27641 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 I0828 21:21:46.263128 27641 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 7.484042ms I0828 21:21:46.263536 27641 replica.cpp:676] Persisted action at 0 I0828 21:21:46.263806 27641 replica.cpp:508] Replica received write request for position 0 I0828 21:21:46.263834 27641 leveldb.cpp:438] Reading position from leveldb took 14063ns I0828 21:21:46.276149 27641 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 12.295476ms I0828 21:21:46.276178 27641 replica.cpp:676] Persisted action at 0 I0828 21:21:46.276319 27641 replica.cpp:655] Replica received learned notice for position 0 I0828 21:21:46.285523 27641 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 9.185244ms I0828 21:21:46.285552 27641 replica.cpp:676] Persisted action at 0 I0828 21:21:46.285560 27641 replica.cpp:661] Replica learned NOP action at position 0 I0828 21:21:46.289685 27642 log.cpp:672] Writer started with 
ending position 0 I0828 21:21:46.290166 27642 leveldb.cpp:438] Reading position from leveldb took 14463ns I0828 21:21:46.297260 27642 registrar.cpp:346] Successfully fetched the registry (0B) I0828 21:21:46.297622 27642 registrar.cpp:422] Attempting to update the 'registry' I0828 21:21:46.298893 27645 log.cpp:680] Attempting to append 118 bytes to the log I0828 21:21:46.299190 27645 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1 I0828 21:21:46.299643 27645 replica.cpp:508] Replica received write request for position 1 I0828 21:21:46.310351 27645 leveldb.cpp:343] Persisting action (135 bytes) to leveldb took 10.349409ms I0828 21:21:46.310577 27645 replica.cpp:676] Persisted action at 1 I0828 21:21:46.311039 27645 replica.cpp:655] Replica received learned notice for position 1 I0828 21:21:46.322127 27645 leveldb.cpp:343] Persisting action (137 bytes) to leveldb took 10.858061ms I0828 21:21:46.322614 27645 replica.cpp:676] Persisted action at 1 I0828 21:21:46.322875 27645 replica.cpp:661] Replica learned APPEND action at position 1 I0828 21:21:46.323480 27645 registrar.cpp:479] Successfully updated 'registry' I0828 21:21:46.323874 27645 registrar.cpp:372] Successfully recovered registrar I0828 21:21:46.323649 27639 log.cpp:699] Attempting to truncate the log to 1 I0828 21:21:46.324465 27644 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2 I0828 21:21:46.324988 27644 replica.cpp:508] Replica received write request for position 2 I0828 21:21:46.325335 27643 master.cpp:1063] Recovered 0 slaves from the Registry (82B) ; allowing 10mins for slaves to re-register I0828 21:21:46.335847 27644 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 10.651398ms I0828 21:21:46.336320 27644 replica.cpp:676] Persisted action at 2 I0828 21:21:46.336896 27644 replica.cpp:655] Replica received learned notice for position 2 I0828 21:21:46.345854 27644 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 
8.540555ms I0828 21:21:46.346261 27644 leveldb.cpp:401] Deleting ~1 keys from leveldb took 30183ns I0828 21:21:46.346282 27644 replica.cpp:676] Persisted action at 2 I0828 21:21:46.346315 27644 replica.cpp:661] Replica learned TRUNCATE action at position 2 I0828 21:21:46.356840 27625 containerizer.cpp:89] Using isolation: posix/cpu,posix/mem I0828 21:21:46.361413 27644 slave.cpp:167] Slave started on 48)@127.0.1.1:45613 I0828 21:21:46.361753 27644 credentials.hpp:84] Loading credential for authentication from '/tmp/SlaveRecoveryTest_0_ShutdownSlave_umhraW/credential' I0828 21:21:46.362046 27644 slave.cpp:274] Slave using credential for: test-principal I0828 21:21:46.362810 27644 slave.cpp:287] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0828 21:21:46.363088 27644 slave.cpp:315] Slave hostname: saucy I0828 21:21:46.363301 27644 slave.cpp:316] Slave checkpoint: true I0828 21:21:46.363986 27644 state.cpp:33] Recovering state from '/tmp/SlaveRecoveryTest_0_ShutdownSlave_umhraW/meta' I0828 21:21:46.364308 27644 status_update_manager.cpp:193] Recovering status update manager I0828 21:21:46.364600 27644 containerizer.cpp:252] Recovering containerizer I0828 21:21:46.365325 27646 slave.cpp:3204] Finished recovery I0828 21:21:46.365839 27646 slave.cpp:598] New master detected at master@127.0.1.1:45613 I0828 21:21:46.366041 27646 slave.cpp:672] Authenticating with master master@127.0.1.1:45613 I0828 21:21:46.366317 27646 slave.cpp:645] Detecting new master I0828 21:21:46.366569 27646 status_update_manager.cpp:167] New master detected at master@127.0.1.1:45613 I0828 21:21:46.366827 27646 authenticatee.hpp:128] Creating new client SASL connection I0828 21:21:46.367204 27646 master.cpp:3637] Authenticating slave(48)@127.0.1.1:45613 I0828 21:21:46.367553 27646 authenticator.hpp:156] Creating new server SASL connection I0828 21:21:46.367857 27646 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5 I0828 21:21:46.368031 27646 
authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5' I0828 21:21:46.368228 27646 authenticator.hpp:262] Received SASL authentication start I0828 21:21:46.368444 27646 authenticator.hpp:384] Authentication requires more steps I0828 21:21:46.368648 27646 authenticatee.hpp:265] Received SASL authentication step I0828 21:21:46.368924 27646 authenticator.hpp:290] Received SASL authentication step I0828 21:21:46.369120 27646 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'saucy' server FQDN: 'saucy' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0828 21:21:46.369350 27646 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I0828 21:21:46.369544 27646 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0828 21:21:46.369730 27646 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'saucy' server FQDN: 'saucy' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0828 21:21:46.369958 27646 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0828 21:21:46.370131 27646 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0828 21:21:46.370311 27646 authenticator.hpp:376] Authentication success I0828 21:21:46.370518 27646 authenticatee.hpp:305] Authentication success I0828 21:21:46.370637 27642 master.cpp:3677] Successfully authenticated principal 'test-principal' at slave(48)@127.0.1.1:45613 I0828 21:21:46.371772 27641 slave.cpp:729] Successfully authenticated with master master@127.0.1.1:45613 I0828 21:21:46.371984 27641 slave.cpp:980] Will retry registration in 15.311045ms if necessary I0828 21:21:46.372643 27641 master.cpp:2836] Registering slave at slave(48)@127.0.1.1:45613 (saucy) with id 20140828-212146-16842879-45613-27625-0 I0828 21:21:46.373016 27641 
registrar.cpp:422] Attempting to update the 'registry' I0828 21:21:46.374539 27641 log.cpp:680] Attempting to append 289 bytes to the log I0828 21:21:46.374876 27641 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 3 I0828 21:21:46.375296 27641 replica.cpp:508] Replica received write request for position 3 I0828 21:21:46.376046 27625 sched.cpp:137] Version: 0.21.0 I0828 21:21:46.376374 27646 sched.cpp:233] New master detected at master@127.0.1.1:45613 I0828 21:21:46.376595 27646 sched.cpp:283] Authenticating with master master@127.0.1.1:45613 I0828 21:21:46.376857 27646 authenticatee.hpp:128] Creating new client SASL connection I0828 21:21:46.377234 27646 master.cpp:3637] Authenticating scheduler-cb5a0264-23cc-45d0-bc4c-a92fa5308158@127.0.1.1:45613 I0828 21:21:46.377496 27646 authenticator.hpp:156] Creating new server SASL connection I0828 21:21:46.377771 27646 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5 I0828 21:21:46.377961 27646 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5' I0828 21:21:46.378170 27646 authenticator.hpp:262] Received SASL authentication start I0828 21:21:46.378360 27646 authenticator.hpp:384] Authentication requires more steps I0828 21:21:46.378588 27639 authenticatee.hpp:265] Received SASL authentication step I0828 21:21:46.378789 27646 authenticator.hpp:290] Received SASL authentication step I0828 21:21:46.378942 27646 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'saucy' server FQDN: 'saucy' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0828 21:21:46.379091 27646 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I0828 21:21:46.379298 27646 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0828 21:21:46.379539 27646 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'saucy' server FQDN: 'saucy' 
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0828 21:21:46.379720 27646 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0828 21:21:46.379935 27646 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0828 21:21:46.380089 27646 authenticator.hpp:376] Authentication success I0828 21:21:46.380306 27642 authenticatee.hpp:305] Authentication success I0828 21:21:46.382625 27642 sched.cpp:357] Successfully authenticated with master master@127.0.1.1:45613 I0828 21:21:46.383031 27642 sched.cpp:476] Sending registration request to master@127.0.1.1:45613 I0828 21:21:46.382928 27640 master.cpp:3677] Successfully authenticated principal 'test-principal' at scheduler-cb5a0264-23cc-45d0-bc4c-a92fa5308158@127.0.1.1:45613 I0828 21:21:46.383651 27640 master.cpp:1324] Received registration request from scheduler-cb5a0264-23cc-45d0-bc4c-a92fa5308158@127.0.1.1:45613 I0828 21:21:46.383846 27640 master.cpp:1284] Authorizing framework principal 'test-principal' to receive offers for role '*' I0828 21:21:46.384184 27640 master.cpp:1383] Registering framework 20140828-212146-16842879-45613-27625-0000 at scheduler-cb5a0264-23cc-45d0-bc4c-a92fa5308158@127.0.1.1:45613 I0828 21:21:46.384464 27640 sched.cpp:407] Framework registered with 20140828-212146-16842879-45613-27625-0000 I0828 21:21:46.384764 27640 sched.cpp:421] Scheduler::registered took 18266ns I0828 21:21:46.384600 27644 hierarchical_allocator_process.hpp:329] Added framework 20140828-212146-16842879-45613-27625-0000 I0828 21:21:46.385171 27644 hierarchical_allocator_process.hpp:691] No resources available to allocate! 
I0828 21:21:46.385330 27644 hierarchical_allocator_process.hpp:653] Performed allocation for 0 slaves in 160171ns I0828 21:21:46.386292 27641 leveldb.cpp:343] Persisting action (308 bytes) to leveldb took 10.815384ms I0828 21:21:46.386492 27641 replica.cpp:676] Persisted action at 3 I0828 21:21:46.386844 27641 replica.cpp:655] Replica received learned notice for position 3 I0828 21:21:46.387980 27643 slave.cpp:980] Will retry registration in 19.851524ms if necessary I0828 21:21:46.388140 27639 master.cpp:2824] Ignoring register slave message from slave(48)@127.0.1.1:45613 (saucy) as admission is already in progress I0828 21:21:46.396355 27641 leveldb.cpp:343] Persisting action (310 bytes) to leveldb took 9.275034ms I0828 21:21:46.396641 27641 replica.cpp:676] Persisted action at 3 I0828 21:21:46.396837 27641 replica.cpp:661] Replica learned APPEND action at position 3 I0828 21:21:46.397405 27641 registrar.cpp:479] Successfully updated 'registry' I0828 21:21:46.397528 27645 log.cpp:699] Attempting to truncate the log to 3 I0828 21:21:46.397878 27645 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 4 I0828 21:21:46.398239 27645 replica.cpp:508] Replica received write request for position 4 I0828 21:21:46.398597 27641 master.cpp:2876] Registered slave 20140828-212146-16842879-45613-27625-0 at slave(48)@127.0.1.1:45613 (saucy) I0828 21:21:46.398870 27641 master.cpp:4110] Adding slave 20140828-212146-16842879-45613-27625-0 at slave(48)@127.0.1.1:45613 (saucy) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0828 21:21:46.399178 27639 slave.cpp:763] Registered with master master@127.0.1.1:45613; given slave ID 20140828-212146-16842879-45613-27625-0 I0828 21:21:46.399521 27639 slave.cpp:776] Checkpointing SlaveInfo to '/tmp/SlaveRecoveryTest_0_ShutdownSlave_umhraW/meta/slaves/20140828-212146-16842879-45613-27625-0/slave.info' I0828 21:21:46.399961 27641 hierarchical_allocator_process.hpp:442] Added slave 
20140828-212146-16842879-45613-27625-0 (saucy) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] available) I0828 21:21:46.400316 27641 hierarchical_allocator_process.hpp:728] Offering cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140828-212146-16842879-45613-27625-0 to framework 20140828-212146-16842879-45613-27625-0000 I0828 21:21:46.400158 27644 slave.cpp:2333] Received ping from slave-observer(45)@127.0.1.1:45613 I0828 21:21:46.400872 27639 master.hpp:857] Adding offer 20140828-212146-16842879-45613-27625-0 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140828-212146-16842879-45613-27625-0 (saucy) I0828 21:21:46.401105 27639 master.cpp:3584] Sending 1 offers to framework 20140828-212146-16842879-45613-27625-0000 I0828 21:21:46.401448 27639 sched.cpp:544] Scheduler::resourceOffers took 19056ns I0828 21:21:46.401700 27641 hierarchical_allocator_process.hpp:673] Performed allocation for slave 20140828-212146-16842879-45613-27625-0 in 1.430159ms I0828 21:21:46.403659 27644 master.hpp:867] Removing offer 20140828-212146-16842879-45613-27625-0 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140828-212146-16842879-45613-27625-0 (saucy) I0828 21:21:46.403903 27644 master.cpp:2194] Processing reply for offers: [ 20140828-212146-16842879-45613-27625-0 ] on slave 20140828-212146-16842879-45613-27625-0 at slave(48)@127.0.1.1:45613 (saucy) for framework 20140828-212146-16842879-45613-27625-0000 I0828 21:21:46.404116 27644 master.cpp:2277] Authorizing framework principal 'test-principal' to launch task cf5afc1b-c007-435b-8c36-be8aa3659d3a as user 'jenkins' I0828 21:21:46.404578 27644 master.hpp:829] Adding task cf5afc1b-c007-435b-8c36-be8aa3659d3a with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140828-212146-16842879-45613-27625-0 (saucy) I0828 
21:21:46.404824 27644 master.cpp:2343] Launching task cf5afc1b-c007-435b-8c36-be8aa3659d3a of framework 20140828-212146-16842879-45613-27625-0000 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140828-212146-16842879-45613-27625-0 at slave(48)@127.0.1.1:45613 (saucy) I0828 21:21:46.405206 27644 slave.cpp:1011] Got assigned task cf5afc1b-c007-435b-8c36-be8aa3659d3a for framework 20140828-212146-16842879-45613-27625-0000 I0828 21:21:46.405462 27644 slave.cpp:3542] Checkpointing FrameworkInfo to '/tmp/SlaveRecoveryTest_0_ShutdownSlave_umhraW/meta/slaves/20140828-212146-16842879-45613-27625-0/frameworks/20140828-212146-16842879-45613-27625-0000/framework.info' I0828 21:21:46.405840 27644 slave.cpp:3549] Checkpointing framework pid 'scheduler-cb5a0264-23cc-45d0-bc4c-a92fa5308158@127.0.1.1:45613' to '/tmp/SlaveRecoveryTest_0_ShutdownSlave_umhraW/meta/slaves/20140828-212146-16842879-45613-27625-0/frameworks/20140828-212146-16842879-45613-27625-0000/framework.pid' I0828 21:21:46.406122 27645 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 7.684731ms I0828 21:21:46.406288 27645 replica.cpp:676] Persisted action at 4 I0828 21:21:46.406618 27645 replica.cpp:655] Replica received learned notice for position 4 I0828 21:21:46.407562 27644 slave.cpp:1121] Launching task cf5afc1b-c007-435b-8c36-be8aa3659d3a for framework 20140828-212146-16842879-45613-27625-0000 I0828 21:21:46.409296 27644 slave.cpp:3858] Checkpointing ExecutorInfo to '/tmp/SlaveRecoveryTest_0_ShutdownSlave_umhraW/meta/slaves/20...",2 MESOS-1751,"Request for ""stats.json"" cannot be fulfilled after stopping the framework ","Request for ""stats.json"" to master from a test case doesn't work after calling frameworks' {{driver.stop()}}. However, it works for ""state.json"". I think the problem is related to {{stats()}} continuation {{_stats()}}. 
The following test illustrates the issue: {code:title=TestCase.cpp|borderStyle=solid} TEST_F(MasterTest, RequestAfterDriverStop) { Try<PID<Master> > master = StartMaster(); ASSERT_SOME(master); Try<PID<Slave> > slave = StartSlave(); ASSERT_SOME(slave); MockScheduler sched; MesosSchedulerDriver driver( &sched, DEFAULT_FRAMEWORK_INFO, master.get(), DEFAULT_CREDENTIAL); driver.start(); Future<process::http::Response> response_before = process::http::get(master.get(), ""stats.json""); AWAIT_READY(response_before); driver.stop(); Future<process::http::Response> response_after = process::http::get(master.get(), ""stats.json""); AWAIT_READY(response_after); driver.join(); Shutdown(); // Must shutdown before 'containerizer' gets deallocated. } {code}",5 MESOS-1752,"Allow variadic templates","Add variadic templates to the C++11 configure check. Once there, we can start using them in the codebase.",1 MESOS-1758,"Freezer failure leads to lost task during container destruction.","In the past we've seen numerous issues around the freezer. Lately, on the 2.6.44 kernel, we've seen issues where we're unable to freeze the cgroup: (1) An OOM occurs. (2) No indication of OOM in the kernel logs. (3) The slave is unable to freeze the cgroup. (4) The task is marked as lost. 
{noformat} I0903 16:46:24.956040 25469 mem.cpp:575] Memory limit exceeded: Requested: 15488MB Maximum Used: 15488MB MEMORY STATISTICS: cache 7958691840 rss 8281653248 mapped_file 9474048 pgpgin 4487861 pgpgout 522933 pgfault 2533780 pgmajfault 11 inactive_anon 0 active_anon 8281653248 inactive_file 7631708160 active_file 326852608 unevictable 0 hierarchical_memory_limit 16240345088 total_cache 7958691840 total_rss 8281653248 total_mapped_file 9474048 total_pgpgin 4487861 total_pgpgout 522933 total_pgfault 2533780 total_pgmajfault 11 total_inactive_anon 0 total_active_anon 8281653248 total_inactive_file 7631728640 total_active_file 326852608 total_unevictable 0 I0903 16:46:24.956848 25469 containerizer.cpp:1041] Container bbb9732a-d600-4c1b-b326-846338c608c3 has reached its limit for resource mem(*):1.62403e+10 and will be terminated I0903 16:46:24.957427 25469 containerizer.cpp:909] Destroying container 'bbb9732a-d600-4c1b-b326-846338c608c3' I0903 16:46:24.958664 25481 cgroups.cpp:2192] Freezing cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 I0903 16:46:34.959529 25488 cgroups.cpp:2209] Thawing cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 I0903 16:46:34.962070 25482 cgroups.cpp:1404] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 1.710848ms I0903 16:46:34.962658 25479 cgroups.cpp:2192] Freezing cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 I0903 16:46:44.963349 25488 cgroups.cpp:2209] Thawing cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 I0903 16:46:44.965631 25472 cgroups.cpp:1404] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 1.588224ms I0903 16:46:44.966356 25472 cgroups.cpp:2192] Freezing cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 I0903 16:46:54.967254 25488 cgroups.cpp:2209] Thawing cgroup 
/sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 I0903 16:46:56.008447 25475 cgroups.cpp:1404] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 2.15296ms I0903 16:46:56.009071 25466 cgroups.cpp:2192] Freezing cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 I0903 16:47:06.010329 25488 cgroups.cpp:2209] Thawing cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 I0903 16:47:06.012538 25467 cgroups.cpp:1404] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 1.643008ms I0903 16:47:06.013216 25467 cgroups.cpp:2192] Freezing cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 I0903 16:47:12.516348 25480 slave.cpp:3030] Current usage 9.57%. Max allowed age: 5.630238827780799days I0903 16:47:16.015192 25488 cgroups.cpp:2209] Thawing cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 I0903 16:47:16.017043 25486 cgroups.cpp:1404] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 after 1.511168ms I0903 16:47:16.017555 25480 cgroups.cpp:2192] Freezing cgroup /sys/fs/cgroup/freezer/mesos/bbb9732a-d600-4c1b-b326-846338c608c3 I0903 16:47:19.862746 25483 http.cpp:245] HTTP request for '/slave(1)/stats.json' E0903 16:47:24.960055 25472 slave.cpp:2557] Termination of executor 'E' of framework '201104070004-0000002563-0000' failed: Failed to destroy container: discarded future I0903 16:47:24.962054 25472 slave.cpp:2087] Handling status update TASK_LOST (UUID: c0c1633b-7221-40dc-90a2-660ef639f747) for task T of framework 201104070004-0000002563-0000 from @0.0.0.0:0 I0903 16:47:24.963470 25469 mem.cpp:293] Updated 'memory.soft_limit_in_bytes' to 128MB for container bbb9732a-d600-4c1b-b326-846338c608c3 I0903 16:47:24.963541 25471 cpushare.cpp:338] Updated 'cpu.shares' to 256 (cpus 0.25) for container bbb9732a-d600-4c1b-b326-846338c608c3 
I0903 16:47:24.964756 25471 cpushare.cpp:359] Updated 'cpu.cfs_period_us' to 100ms and 'cpu.cfs_quota_us' to 25ms (cpus 0.25) for container bbb9732a-d600-4c1b-b326-846338c608c3 I0903 16:47:43.406610 25476 status_update_manager.cpp:320] Received status update TASK_LOST (UUID: c0c1633b-7221-40dc-90a2-660ef639f747) for task T of framework 201104070004-0000002563-0000 I0903 16:47:43.406991 25476 status_update_manager.hpp:342] Checkpointing UPDATE for status update TASK_LOST (UUID: c0c1633b-7221-40dc-90a2-660ef639f747) for task T of framework 201104070004-0000002563-0000 I0903 16:47:43.410475 25476 status_update_manager.cpp:373] Forwarding status update TASK_LOST (UUID: c0c1633b-7221-40dc-90a2-660ef639f747) for task T of framework 201104070004-0000002563-0000 to master@:5050 I0903 16:47:43.439923 25480 status_update_manager.cpp:398] Received status update acknowledgement (UUID: c0c1633b-7221-40dc-90a2-660ef639f747) for task T of framework 201104070004-0000002563-0000 I0903 16:47:43.440115 25480 status_update_manager.hpp:342] Checkpointing ACK for status update TASK_LOST (UUID: c0c1633b-7221-40dc-90a2-660ef639f747) for task T of framework 201104070004-0000002563-0000 I0903 16:47:43.443595 25480 slave.cpp:2709] Cleaning up executor 'E' of framework 201104070004-0000002563-0000 {noformat} We should consider avoiding the freezer entirely in favor of a kill(2) loop. We don't have to wait for pid namespaces to remove the freezer dependency. 
At the very least, when the freezer fails, we should proceed with a kill(2) loop to ensure that we destroy the cgroup.",2 MESOS-1760,"MasterAuthorizationTest.FrameworkRemovedBeforeReregistration is flaky","Observed this on Apache CI: https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2355/changes {code} [ RUN] MasterAuthorizationTest.FrameworkRemovedBeforeReregistration Using temporary directory '/tmp/MasterAuthorizationTest_FrameworkRemovedBeforeReregistration_0tw16Z' I0903 22:04:33.520237 25565 leveldb.cpp:176] Opened db in 49.073821ms I0903 22:04:33.538331 25565 leveldb.cpp:183] Compacted db in 18.065051ms I0903 22:04:33.538363 25565 leveldb.cpp:198] Created db iterator in 4826ns I0903 22:04:33.538377 25565 leveldb.cpp:204] Seeked to beginning of db in 682ns I0903 22:04:33.538385 25565 leveldb.cpp:273] Iterated through 0 keys in the db in 312ns I0903 22:04:33.538399 25565 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0903 22:04:33.538624 25593 recover.cpp:425] Starting replica recovery I0903 22:04:33.538707 25598 recover.cpp:451] Replica is in EMPTY status I0903 22:04:33.540909 25590 master.cpp:286] Master 20140903-220433-453759884-44122-25565 (hemera.apache.org) started on 140.211.11.27:44122 I0903 22:04:33.540932 25590 master.cpp:332] Master only allowing authenticated frameworks to register I0903 22:04:33.540936 25590 master.cpp:337] Master only allowing authenticated slaves to register I0903 22:04:33.540941 25590 credentials.hpp:36] Loading credentials for authentication from '/tmp/MasterAuthorizationTest_FrameworkRemovedBeforeReregistration_0tw16Z/credentials' I0903 22:04:33.541337 25590 master.cpp:366] Authorization enabled I0903 22:04:33.541508 25597 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I0903 22:04:33.542343 25582 hierarchical_allocator_process.hpp:299] Initializing hierarchical allocator process with master : 
master@140.211.11.27:44122 I0903 22:04:33.542445 25592 master.cpp:120] No whitelist given. Advertising offers for all slaves I0903 22:04:33.543175 25602 recover.cpp:188] Received a recover response from a replica in EMPTY status I0903 22:04:33.543637 25587 recover.cpp:542] Updating replica status to STARTING I0903 22:04:33.544256 25579 master.cpp:1205] The newly elected leader is master@140.211.11.27:44122 with id 20140903-220433-453759884-44122-25565 I0903 22:04:33.544275 25579 master.cpp:1218] Elected as the leading master! I0903 22:04:33.544282 25579 master.cpp:1036] Recovering from registrar I0903 22:04:33.544401 25579 registrar.cpp:313] Recovering registrar I0903 22:04:33.558487 25593 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 14.678563ms I0903 22:04:33.558531 25593 replica.cpp:320] Persisted replica status to STARTING I0903 22:04:33.558653 25593 recover.cpp:451] Replica is in STARTING status I0903 22:04:33.559867 25588 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I0903 22:04:33.560057 25602 recover.cpp:188] Received a recover response from a replica in STARTING status I0903 22:04:33.561280 25584 recover.cpp:542] Updating replica status to VOTING I0903 22:04:33.576900 25581 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 14.712427ms I0903 22:04:33.576942 25581 replica.cpp:320] Persisted replica status to VOTING I0903 22:04:33.577018 25581 recover.cpp:556] Successfully joined the Paxos group I0903 22:04:33.577108 25581 recover.cpp:440] Recover process terminated I0903 22:04:33.577401 25581 log.cpp:656] Attempting to start the writer I0903 22:04:33.578559 25589 replica.cpp:474] Replica received implicit promise request with proposal 1 I0903 22:04:33.594611 25589 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 16.029152ms I0903 22:04:33.594640 25589 replica.cpp:342] Persisted promised to 1 I0903 22:04:33.595391 25584 coordinator.cpp:230] Coordinator attemping to fill missing 
position I0903 22:04:33.597512 25588 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 I0903 22:04:33.613037 25588 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 15.502568ms I0903 22:04:33.613065 25588 replica.cpp:676] Persisted action at 0 I0903 22:04:33.615435 25585 replica.cpp:508] Replica received write request for position 0 I0903 22:04:33.615463 25585 leveldb.cpp:438] Reading position from leveldb took 10743ns I0903 22:04:33.630801 25585 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 15.320225ms I0903 22:04:33.630852 25585 replica.cpp:676] Persisted action at 0 I0903 22:04:33.631126 25585 replica.cpp:655] Replica received learned notice for position 0 I0903 22:04:33.647801 25585 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 16.652951ms I0903 22:04:33.647830 25585 replica.cpp:676] Persisted action at 0 I0903 22:04:33.647842 25585 replica.cpp:661] Replica learned NOP action at position 0 I0903 22:04:33.648548 25583 log.cpp:672] Writer started with ending position 0 I0903 22:04:33.649235 25583 leveldb.cpp:438] Reading position from leveldb took 25209ns I0903 22:04:33.650897 25591 registrar.cpp:346] Successfully fetched the registry (0B) I0903 22:04:33.650930 25591 registrar.cpp:422] Attempting to update the 'registry' I0903 22:04:33.652861 25601 log.cpp:680] Attempting to append 138 bytes to the log I0903 22:04:33.653097 25586 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1 I0903 22:04:33.655225 25590 replica.cpp:508] Replica received write request for position 1 I0903 22:04:33.669618 25590 leveldb.cpp:343] Persisting action (157 bytes) to leveldb took 14.337486ms I0903 22:04:33.669663 25590 replica.cpp:676] Persisted action at 1 I0903 22:04:33.670045 25584 replica.cpp:655] Replica received learned notice for position 1 I0903 22:04:34.414243 25584 leveldb.cpp:343] Persisting action (159 bytes) to leveldb took 15.401247ms I0903 22:04:34.414300 
25584 replica.cpp:676] Persisted action at 1 I0903 22:04:34.414316 25584 replica.cpp:661] Replica learned APPEND action at position 1 I0903 22:04:34.414937 25589 registrar.cpp:479] Successfully updated 'registry' I0903 22:04:34.415069 25585 log.cpp:699] Attempting to truncate the log to 1 I0903 22:04:34.415194 25589 registrar.cpp:372] Successfully recovered registrar I0903 22:04:34.415284 25589 master.cpp:1063] Recovered 0 slaves from the Registry (100B) ; allowing 10mins for slaves to re-register I0903 22:04:34.415362 25587 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2 I0903 22:04:34.418926 25597 replica.cpp:508] Replica received write request for position 2 I0903 22:04:34.434321 25597 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 15.368147ms I0903 22:04:34.434352 25597 replica.cpp:676] Persisted action at 2 I0903 22:04:34.435022 25582 replica.cpp:655] Replica received learned notice for position 2 I0903 22:04:34.450331 25582 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 15.284486ms I0903 22:04:34.450387 25582 leveldb.cpp:401] Deleting ~1 keys from leveldb took 25774ns I0903 22:04:34.450402 25582 replica.cpp:676] Persisted action at 2 I0903 22:04:34.450412 25582 replica.cpp:661] Replica learned TRUNCATE action at position 2 I0903 22:04:34.460691 25565 sched.cpp:137] Version: 0.21.0 I0903 22:04:34.460927 25582 sched.cpp:233] New master detected at master@140.211.11.27:44122 I0903 22:04:34.460948 25582 sched.cpp:283] Authenticating with master master@140.211.11.27:44122 I0903 22:04:34.461359 25582 authenticatee.hpp:128] Creating new client SASL connection I0903 22:04:34.461647 25582 master.cpp:3637] Authenticating scheduler-04e0b571-7e0c-4ef3-bb14-c6bbfd8ac9a4@140.211.11.27:44122 I0903 22:04:34.461801 25598 authenticator.hpp:156] Creating new server SASL connection I0903 22:04:34.462172 25598 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5 I0903 22:04:34.462185 25598 
authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5' I0903 22:04:34.462257 25598 authenticator.hpp:262] Received SASL authentication start I0903 22:04:34.462323 25598 authenticator.hpp:384] Authentication requires more steps I0903 22:04:34.462345 25598 authenticatee.hpp:265] Received SASL authentication step I0903 22:04:34.462417 25598 authenticator.hpp:290] Received SASL authentication step I0903 22:04:34.462522 25598 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'hemera.apache.org' server FQDN: 'hemera.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0903 22:04:34.462529 25598 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I0903 22:04:34.462538 25598 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0903 22:04:34.462543 25598 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'hemera.apache.org' server FQDN: 'hemera.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0903 22:04:34.462548 25598 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0903 22:04:34.462550 25598 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0903 22:04:34.462558 25598 authenticator.hpp:376] Authentication success I0903 22:04:34.462635 25598 master.cpp:3677] Successfully authenticated principal 'test-principal' at scheduler-04e0b571-7e0c-4ef3-bb14-c6bbfd8ac9a4@140.211.11.27:44122 I0903 22:04:34.462687 25590 authenticatee.hpp:305] Authentication success I0903 22:04:34.463219 25588 sched.cpp:357] Successfully authenticated with master master@140.211.11.27:44122 I0903 22:04:34.463243 25588 sched.cpp:476] Sending registration request to master@140.211.11.27:44122 I0903 22:04:34.463307 25588 master.cpp:1324] Received registration request from 
scheduler-04e0b571-7e0c-4ef3-bb14-c6bbfd8ac9a4@140.211.11.27:44122 I0903 22:04:34.463330 25588 master.cpp:1284] Authorizing framework principal 'test-principal' to receive offers for role '*' I0903 22:04:34.463412 25588 master.cpp:1383] Registering framework 20140903-220433-453759884-44122-25565-0000 at scheduler-04e0b571-7e0c-4ef3-bb14-c6bbfd8ac9a4@140.211.11.27:44122 I0903 22:04:34.463577 25598 sched.cpp:407] Framework registered with 20140903-220433-453759884-44122-25565-0000 I0903 22:04:34.463728 25587 hierarchical_allocator_process.hpp:329] Added framework 20140903-220433-453759884-44122-25565-0000 I0903 22:04:34.463739 25587 hierarchical_allocator_process.hpp:697] No resources available to allocate! I0903 22:04:34.463743 25587 hierarchical_allocator_process.hpp:659] Performed allocation for 0 slaves in 5016ns I0903 22:04:34.463755 25598 sched.cpp:421] Scheduler::registered took 165035ns I0903 22:04:34.465558 25583 sched.cpp:227] Scheduler::disconnected took 6254ns I0903 22:04:34.465566 25583 sched.cpp:233] New master detected at master@140.211.11.27:44122 I0903 22:04:34.465575 25583 sched.cpp:283] Authenticating with master master@140.211.11.27:44122 I0903 22:04:34.465642 25583 authenticatee.hpp:128] Creating new client SASL connection I0903 22:04:34.465790 25583 master.cpp:1680] Deactivating framework 20140903-220433-453759884-44122-25565-0000 I0903 22:04:34.465850 25583 master.cpp:3637] Authenticating scheduler-04e0b571-7e0c-4ef3-bb14-c6bbfd8ac9a4@140.211.11.27:44122 I0903 22:04:34.465879 25601 hierarchical_allocator_process.hpp:405] Deactivated framework 20140903-220433-453759884-44122-25565-0000 I0903 22:04:34.466047 25600 authenticator.hpp:156] Creating new server SASL connection I0903 22:04:34.466315 25600 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5 I0903 22:04:34.466326 25600 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5' I0903 22:04:34.466346 25600 authenticator.hpp:262] Received SASL 
authentication start I0903 22:04:34.466418 25600 authenticator.hpp:384] Authentication requires more steps I0903 22:04:34.466436 25600 authenticatee.hpp:265] Received SASL authentication step I0903 22:04:34.466475 25600 authenticator.hpp:290] Received SASL authentication step I0903 22:04:34.466486 25600 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'hemera.apache.org' server FQDN: 'hemera.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0903 22:04:34.466491 25600 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I0903 22:04:34.466496 25600 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0903 22:04:34.466502 25600 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'hemera.apache.org' server FQDN: 'hemera.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0903 22:04:34.466506 25600 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0903 22:04:34.466509 25600 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0903 22:04:34.466516 25600 authenticator.hpp:376] Authentication success I0903 22:04:34.466596 25588 master.cpp:3677] Successfully authenticated principal 'test-principal' at scheduler-04e0b571-7e0c-4ef3-bb14-c6bbfd8ac9a4@140.211.11.27:44122 I0903 22:04:34.466629 25597 authenticatee.hpp:305] Authentication success I0903 22:04:34.467062 25594 sched.cpp:357] Successfully authenticated with master master@140.211.11.27:44122 I0903 22:04:34.467077 25594 sched.cpp:476] Sending registration request to master@140.211.11.27:44122 I0903 22:04:34.467190 25588 master.cpp:1448] Received re-registration request from framework 20140903-220433-453759884-44122-25565-0000 at scheduler-04e0b571-7e0c-4ef3-bb14-c6bbfd8ac9a4@140.211.11.27:44122 I0903 22:04:36.368134 25588 
master.cpp:1284] Authorizing framework principal 'test-principal' to receive offers for role '*' I0903 22:04:34.542999 25594 hierarchical_allocator_process.hpp:697] No resources available to allocate! I0903 22:04:35.463639 25582 sched.cpp:476] Sending registration request to master@140.211.11.27:44122 I0903 22:04:36.368185 25594 hierarchical_allocator_process.hpp:659] Performed allocation for 0 slaves in 1.825177748secs I0903 22:04:36.368302 25588 master.cpp:1448] Received re-registration request from framework 20140903-220433-453759884-44122-25565-0000 at scheduler-04e0b571-7e0c-4ef3-bb14-c6bbfd8ac9a4@140.211.11.27:44122 I0903 22:04:36.368330 25588 master.cpp:1284] Authorizing framework principal 'test-principal' to receive offers for role '*' I0903 22:04:36.368388 25582 sched.cpp:476] Sending registration request to master@140.211.11.27:44122 : Failure Mock function called more times than expected - returning default value. Function call: authorize(@0x2ba11964c1b0 40-byte object ) The mock function has no default action set, and its return type has no default value set. 
*** Aborted at 1409781876 (unix time) try ""date -d @1409781876"" if you are using GNU date *** I0903 22:04:36.368913 25598 sched.cpp:745] Stopping framework '20140903-220433-453759884-44122-25565-0000' PC: @ 0x2ba117a990d5 (unknown) *** SIGABRT (@0x3ea000063dd) received by PID 25565 (TID 0x2ba11964d700) from PID 25565; stack trace: *** @ 0x2ba117854cb0 (unknown) @ 0x2ba117a990d5 (unknown) @ 0x2ba117a9c83b (unknown) @ 0x9cba9d testing::internal::GoogleTestFailureReporter::ReportFailure() @ 0x790091 testing::internal::FunctionMockerBase<>::PerformDefaultAction() @ 0x790166 testing::internal::FunctionMockerBase<>::UntypedPerformDefaultAction() @ 0x9c3daa testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() @ 0x787279 mesos::internal::tests::MockAuthorizer::authorize() @ 0x2ba1157c133d mesos::internal::master::Master::validate() @ 0x2ba1157c2b7a mesos::internal::master::Master::reregisterFramework() @ 0x2ba1157e0038 ProtobufProcess<>::handler2<>() @ 0x2ba1157dde89 std::tr1::_Function_handler<>::_M_invoke() @ 0x2ba1157b15f7 mesos::internal::master::Master::_visit() @ 0x2ba1157bfa3e mesos::internal::master::Master::visit() @ 0x2ba115caf5e7 process::ProcessManager::resume() @ 0x2ba115cb027c process::schedule() @ 0x2ba11784ce9a start_thread @ 0x2ba117b5731d (unknown) {code}",1 MESOS-1765,"Use PID namespace to avoid freezing cgroup","There is a known kernel issue when we freeze the whole cgroup upon OOM. Mesos can probably just use a PID namespace, so that we only need to kill the ""init"" process of the PID namespace instead of freezing all the processes and killing them one by one. 
But I am not quite sure if this would break the existing code.",5 MESOS-1766,"MasterAuthorizationTest.DuplicateRegistration test is flaky","{code} [ RUN ] MasterAuthorizationTest.DuplicateRegistration Using temporary directory '/tmp/MasterAuthorizationTest_DuplicateRegistration_pVJg7m' I0905 15:53:16.398993 25769 leveldb.cpp:176] Opened db in 2.601036ms I0905 15:53:16.399566 25769 leveldb.cpp:183] Compacted db in 546216ns I0905 15:53:16.399590 25769 leveldb.cpp:198] Created db iterator in 2787ns I0905 15:53:16.399605 25769 leveldb.cpp:204] Seeked to beginning of db in 500ns I0905 15:53:16.399617 25769 leveldb.cpp:273] Iterated through 0 keys in the db in 185ns I0905 15:53:16.399633 25769 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0905 15:53:16.399817 25786 recover.cpp:425] Starting replica recovery I0905 15:53:16.399952 25793 recover.cpp:451] Replica is in EMPTY status I0905 15:53:16.400683 25795 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I0905 15:53:16.400795 25787 recover.cpp:188] Received a recover response from a replica in EMPTY status I0905 15:53:16.401005 25783 recover.cpp:542] Updating replica status to STARTING I0905 15:53:16.401470 25786 master.cpp:286] Master 20140905-155316-3125920579-49188-25769 (penates.apache.org) started on 67.195.81.186:49188 I0905 15:53:16.401521 25786 master.cpp:332] Master only allowing authenticated frameworks to register I0905 15:53:16.401533 25786 master.cpp:337] Master only allowing authenticated slaves to register I0905 15:53:16.401543 25786 credentials.hpp:36] Loading credentials for authentication from '/tmp/MasterAuthorizationTest_DuplicateRegistration_pVJg7m/credentials' I0905 15:53:16.401558 25793 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 474683ns I0905 15:53:16.401582 25793 replica.cpp:320] Persisted replica status to STARTING I0905 15:53:16.401667 25793 recover.cpp:451] Replica is in STARTING status I0905 
15:53:16.401669 25786 master.cpp:366] Authorization enabled I0905 15:53:16.401898 25795 master.cpp:120] No whitelist given. Advertising offers for all slaves I0905 15:53:16.401936 25796 hierarchical_allocator_process.hpp:299] Initializing hierarchical allocator process with master : master@67.195.81.186:49188 I0905 15:53:16.402160 25784 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I0905 15:53:16.402333 25790 master.cpp:1205] The newly elected leader is master@67.195.81.186:49188 with id 20140905-155316-3125920579-49188-25769 I0905 15:53:16.402359 25790 master.cpp:1218] Elected as the leading master! I0905 15:53:16.402371 25790 master.cpp:1036] Recovering from registrar I0905 15:53:16.402472 25798 registrar.cpp:313] Recovering registrar I0905 15:53:16.402529 25791 recover.cpp:188] Received a recover response from a replica in STARTING status I0905 15:53:16.402782 25788 recover.cpp:542] Updating replica status to VOTING I0905 15:53:16.403002 25795 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 116403ns I0905 15:53:16.403020 25795 replica.cpp:320] Persisted replica status to VOTING I0905 15:53:16.403081 25791 recover.cpp:556] Successfully joined the Paxos group I0905 15:53:16.403197 25791 recover.cpp:440] Recover process terminated I0905 15:53:16.403388 25796 log.cpp:656] Attempting to start the writer I0905 15:53:16.403993 25784 replica.cpp:474] Replica received implicit promise request with proposal 1 I0905 15:53:16.404147 25784 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 132156ns I0905 15:53:16.404167 25784 replica.cpp:342] Persisted promised to 1 I0905 15:53:16.404542 25795 coordinator.cpp:230] Coordinator attemping to fill missing position I0905 15:53:16.405498 25787 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 I0905 15:53:16.405868 25787 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 347231ns I0905 15:53:16.405886 25787 
replica.cpp:676] Persisted action at 0 I0905 15:53:16.406553 25788 replica.cpp:508] Replica received write request for position 0 I0905 15:53:16.406582 25788 leveldb.cpp:438] Reading position from leveldb took 11402ns I0905 15:53:16.529067 25788 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 535803ns I0905 15:53:16.529088 25788 replica.cpp:676] Persisted action at 0 I0905 15:53:16.529355 25784 replica.cpp:655] Replica received learned notice for position 0 I0905 15:53:16.529784 25784 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 406036ns I0905 15:53:16.529806 25784 replica.cpp:676] Persisted action at 0 I0905 15:53:16.529817 25784 replica.cpp:661] Replica learned NOP action at position 0 I0905 15:53:16.530108 25783 log.cpp:672] Writer started with ending position 0 I0905 15:53:16.530597 25792 leveldb.cpp:438] Reading position from leveldb took 14594ns I0905 15:53:16.532060 25787 registrar.cpp:346] Successfully fetched the registry (0B) I0905 15:53:16.532091 25787 registrar.cpp:422] Attempting to update the 'registry' I0905 15:53:16.533537 25785 log.cpp:680] Attempting to append 140 bytes to the log I0905 15:53:16.533596 25785 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1 I0905 15:53:16.533998 25798 replica.cpp:508] Replica received write request for position 1 I0905 15:53:16.534397 25798 leveldb.cpp:343] Persisting action (159 bytes) to leveldb took 372452ns I0905 15:53:16.534416 25798 replica.cpp:676] Persisted action at 1 I0905 15:53:16.534808 25793 replica.cpp:655] Replica received learned notice for position 1 I0905 15:53:16.534996 25793 leveldb.cpp:343] Persisting action (161 bytes) to leveldb took 164609ns I0905 15:53:16.535014 25793 replica.cpp:676] Persisted action at 1 I0905 15:53:16.535025 25793 replica.cpp:661] Replica learned APPEND action at position 1 I0905 15:53:16.535368 25784 registrar.cpp:479] Successfully updated 'registry' I0905 15:53:16.535419 25784 registrar.cpp:372] 
Successfully recovered registrar I0905 15:53:16.535452 25785 log.cpp:699] Attempting to truncate the log to 1 I0905 15:53:16.535555 25791 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2 I0905 15:53:16.535553 25792 master.cpp:1063] Recovered 0 slaves from the Registry (102B) ; allowing 10mins for slaves to re-register I0905 15:53:16.536038 25784 replica.cpp:508] Replica received write request for position 2 I0905 15:53:16.536166 25784 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 101619ns I0905 15:53:16.536185 25784 replica.cpp:676] Persisted action at 2 I0905 15:53:16.536497 25791 replica.cpp:655] Replica received learned notice for position 2 I0905 15:53:16.536633 25791 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 109281ns I0905 15:53:16.536664 25791 leveldb.cpp:401] Deleting ~1 keys from leveldb took 14164ns I0905 15:53:16.536677 25791 replica.cpp:676] Persisted action at 2 I0905 15:53:16.536689 25791 replica.cpp:661] Replica learned TRUNCATE action at position 2 I0905 15:53:16.548408 25769 sched.cpp:137] Version: 0.21.0 I0905 15:53:16.548627 25792 sched.cpp:233] New master detected at master@67.195.81.186:49188 I0905 15:53:16.548653 25792 sched.cpp:283] Authenticating with master master@67.195.81.186:49188 I0905 15:53:16.548857 25797 authenticatee.hpp:128] Creating new client SASL connection I0905 15:53:16.548950 25797 master.cpp:3637] Authenticating scheduler-33430370-6af5-4c7b-bbd8-f6a43269ecf5@67.195.81.186:49188 I0905 15:53:16.549041 25797 authenticator.hpp:156] Creating new server SASL connection I0905 15:53:16.549120 25797 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5 I0905 15:53:16.549141 25797 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5' I0905 15:53:16.549180 25797 authenticator.hpp:262] Received SASL authentication start I0905 15:53:16.549229 25797 authenticator.hpp:384] Authentication requires more steps I0905 15:53:16.549268 
25797 authenticatee.hpp:265] Received SASL authentication step I0905 15:53:16.549351 25787 authenticator.hpp:290] Received SASL authentication step I0905 15:53:16.549378 25787 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'penates.apache.org' server FQDN: 'penates.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0905 15:53:16.549391 25787 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I0905 15:53:16.549403 25787 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0905 15:53:16.549415 25787 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'penates.apache.org' server FQDN: 'penates.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0905 15:53:16.549424 25787 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0905 15:53:16.549432 25787 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0905 15:53:16.549448 25787 authenticator.hpp:376] Authentication success I0905 15:53:16.549489 25787 authenticatee.hpp:305] Authentication success I0905 15:53:16.549525 25787 master.cpp:3677] Successfully authenticated principal 'test-principal' at scheduler-33430370-6af5-4c7b-bbd8-f6a43269ecf5@67.195.81.186:49188 I0905 15:53:16.549669 25783 sched.cpp:357] Successfully authenticated with master master@67.195.81.186:49188 I0905 15:53:16.549690 25783 sched.cpp:476] Sending registration request to master@67.195.81.186:49188 I0905 15:53:16.549751 25787 master.cpp:1324] Received registration request from scheduler-33430370-6af5-4c7b-bbd8-f6a43269ecf5@67.195.81.186:49188 I0905 15:53:16.549782 25787 master.cpp:1284] Authorizing framework principal 'test-principal' to receive offers for role '*' I0905 15:53:16.551250 25791 sched.cpp:233] New master detected at 
master@67.195.81.186:49188 I0905 15:53:16.551273 25791 sched.cpp:283] Authenticating with master master@67.195.81.186:49188 I0905 15:53:16.551357 25788 authenticatee.hpp:128] Creating new client SASL connection I0905 15:53:16.551456 25791 master.cpp:3637] Authenticating scheduler-33430370-6af5-4c7b-bbd8-f6a43269ecf5@67.195.81.186:49188 I0905 15:53:16.551553 25788 authenticator.hpp:156] Creating new server SASL connection I0905 15:53:16.551673 25786 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5 I0905 15:53:16.551697 25786 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5' I0905 15:53:16.551755 25792 authenticator.hpp:262] Received SASL authentication start I0905 15:53:16.551808 25792 authenticator.hpp:384] Authentication requires more steps I0905 15:53:16.551856 25792 authenticatee.hpp:265] Received SASL authentication step I0905 15:53:16.551920 25786 authenticator.hpp:290] Received SASL authentication step I0905 15:53:16.551949 25786 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'penates.apache.org' server FQDN: 'penates.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0905 15:53:16.551966 25786 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I0905 15:53:16.551985 25786 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0905 15:53:16.551997 25786 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'penates.apache.org' server FQDN: 'penates.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0905 15:53:16.552006 25786 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0905 15:53:16.552014 25786 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0905 15:53:16.552031 25786 authenticator.hpp:376] 
Authentication success I0905 15:53:16.552081 25792 authenticatee.hpp:305] Authentication success I0905 15:53:16.552100 25786 master.cpp:3677] Successfully authenticated principal 'test-principal' at scheduler-33430370-6af5-4c7b-bbd8-f6a43269ecf5@67.195.81.186:49188 I0905 15:53:16.552249 25792 sched.cpp:357] Successfully authenticated with master master@67.195.81.186:49188 I0905 15:53:17.402861 25793 hierarchical_allocator_process.hpp:697] No resources available to allocate! I0905 15:53:18.874348 25792 sched.cpp:476] Sending registration request to master@67.195.81.186:49188 I0905 15:53:18.874364 25793 hierarchical_allocator_process.hpp:659] Performed allocation for 0 slaves in 1.471501003secs I0905 15:53:18.874420 25792 sched.cpp:476] Sending registration request to master@67.195.81.186:49188 I0905 15:53:18.874451 25793 master.cpp:1324] Received registration request from scheduler-33430370-6af5-4c7b-bbd8-f6a43269ecf5@67.195.81.186:49188 I0905 15:53:18.874480 25793 master.cpp:1284] Authorizing framework principal 'test-principal' to receive offers for role '*' I0905 15:53:18.874565 25793 master.cpp:1324] Received registration request from scheduler-33430370-6af5-4c7b-bbd8-f6a43269ecf5@67.195.81.186:49188 I0905 15:53:18.874588 25793 master.cpp:1284] Authorizing framework principal 'test-principal' to receive offers for role '*' : Failure Mock function called more times than expected - returning default value. Function call: authorize(@0x2b9ed7fe9350 40-byte object <90-BA B4-D4 9E-2B 00-00 00-00 00-00 00-00 00-00 A0-FA 06-F4 9E-2B 00-00 80-17 09-F4 9E-2B 00-00 00-00 00-00 03-00 00-00>) The mock function has no default action set, and its return type has no default value set. 
*** Aborted at 1409932398 (unix time) try ""date -d @1409932398"" if you are using GNU date *** PC: @ 0x2b9ed6233f79 (unknown) *** SIGABRT (@0x95c000064a9) received by PID 25769 (TID 0x2b9ed7fea700) from PID 25769; stack trace: *** @ 0x2b9ed5fef340 (unknown) @ 0x2b9ed6233f79 (unknown) @ 0x2b9ed6237388 (unknown) @ 0x93a5ec testing::internal::GoogleTestFailureReporter::ReportFailure() @ 0x7296c5 testing::internal::FunctionMockerBase<>::UntypedPerformDefaultAction() @ 0x933094 testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() @ 0x71fbde mesos::internal::tests::MockAuthorizer::authorize() @ 0x2b9ed4038caf mesos::internal::master::Master::validate() @ 0x2b9ed4039763 mesos::internal::master::Master::registerFramework() @ 0x2b9ed40a0c0f ProtobufProcess<>::handler1<>() @ 0x2b9ed4050c57 std::_Function_handler<>::_M_invoke() @ 0x2b9ed407d202 ProtobufProcess<>::visit() @ 0x2b9ed402af1a mesos::internal::master::Master::_visit() @ 0x2b9ed4037eb8 mesos::internal::master::Master::visit() @ 0x2b9ed44cb792 process::ProcessManager::resume() @ 0x2b9ed44cba9c process::schedule() @ 0x2b9ed5fe7182 start_thread @ 0x2b9ed62f830d (unknown) {code}",2 MESOS-1771,"introduce unique_ptr","* add unique_ptr to the configure check * document use of unique_ptr in style guide ** use when possible, use std::move when necessary * move raw pointers to Owned to establish ownership * deprecate Owned in favour of unique_ptr ",1 MESOS-1777,"Design persistent resources",NULL,13 MESOS-1778,"Provide an option to validate flag value in stout/flags. ","Currently we can provide the default value for a flag, but cannot check if the flag is set to a reasonable value and, e.g., issue a warning. 
Passing an optional lambda checker to {{FlagBase::add()}} can be a possible solution.",3 MESOS-1782,"AllocatorTest/0.FrameworkExited is flaky","{noformat:title=} [ RUN ] AllocatorTest/0.FrameworkExited Using temporary directory '/tmp/AllocatorTest_0_FrameworkExited_B6WZng' I0909 08:02:35.116555 18112 leveldb.cpp:176] Opened db in 31.64686ms I0909 08:02:35.126065 18112 leveldb.cpp:183] Compacted db in 9.449823ms I0909 08:02:35.126118 18112 leveldb.cpp:198] Created db iterator in 5858ns I0909 08:02:35.126137 18112 leveldb.cpp:204] Seeked to beginning of db in 1136ns I0909 08:02:35.126150 18112 leveldb.cpp:273] Iterated through 0 keys in the db in 560ns I0909 08:02:35.126178 18112 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0909 08:02:35.126502 18133 recover.cpp:425] Starting replica recovery I0909 08:02:35.126601 18133 recover.cpp:451] Replica is in EMPTY status I0909 08:02:35.127012 18133 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I0909 08:02:35.127094 18133 recover.cpp:188] Received a recover response from a replica in EMPTY status I0909 08:02:35.127223 18133 recover.cpp:542] Updating replica status to STARTING I0909 08:02:35.226631 18133 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 99.308134ms I0909 08:02:35.226690 18133 replica.cpp:320] Persisted replica status to STARTING I0909 08:02:35.226812 18131 recover.cpp:451] Replica is in STARTING status I0909 08:02:35.227246 18131 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I0909 08:02:35.227308 18131 recover.cpp:188] Received a recover response from a replica in STARTING status I0909 08:02:35.227409 18131 recover.cpp:542] Updating replica status to VOTING I0909 08:02:35.228540 18129 master.cpp:286] Master 20140909-080235-16842879-44005-18112 (precise) started on 127.0.1.1:44005 I0909 08:02:35.228593 18129 master.cpp:332] Master only allowing authenticated frameworks to register 
I0909 08:02:35.228607 18129 master.cpp:337] Master only allowing authenticated slaves to register I0909 08:02:35.228620 18129 credentials.hpp:36] Loading credentials for authentication from '/tmp/AllocatorTest_0_FrameworkExited_B6WZng/credentials' I0909 08:02:35.228754 18129 master.cpp:366] Authorization enabled I0909 08:02:35.229560 18129 master.cpp:120] No whitelist given. Advertising offers for all slaves I0909 08:02:35.229933 18129 hierarchical_allocator_process.hpp:299] Initializing hierarchical allocator process with master : master@127.0.1.1:44005 I0909 08:02:35.230057 18127 master.cpp:1212] The newly elected leader is master@127.0.1.1:44005 with id 20140909-080235-16842879-44005-18112 I0909 08:02:35.230129 18127 master.cpp:1225] Elected as the leading master! I0909 08:02:35.230144 18127 master.cpp:1043] Recovering from registrar I0909 08:02:35.230257 18127 registrar.cpp:313] Recovering registrar I0909 08:02:35.232461 18131 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 4.999384ms I0909 08:02:35.232489 18131 replica.cpp:320] Persisted replica status to VOTING I0909 08:02:35.232544 18131 recover.cpp:556] Successfully joined the Paxos group I0909 08:02:35.232611 18131 recover.cpp:440] Recover process terminated I0909 08:02:35.232727 18131 log.cpp:656] Attempting to start the writer I0909 08:02:35.233012 18131 replica.cpp:474] Replica received implicit promise request with proposal 1 I0909 08:02:35.238785 18131 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 5.749504ms I0909 08:02:35.238818 18131 replica.cpp:342] Persisted promised to 1 I0909 08:02:35.244056 18131 coordinator.cpp:230] Coordinator attemping to fill missing position I0909 08:02:35.244580 18131 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 I0909 08:02:35.250143 18131 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 5.382351ms I0909 08:02:35.250319 18131 replica.cpp:676] Persisted action at 0 I0909 
08:02:35.250901 18131 replica.cpp:508] Replica received write request for position 0 I0909 08:02:35.251137 18131 leveldb.cpp:438] Reading position from leveldb took 18689ns I0909 08:02:35.256597 18131 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 5.274169ms I0909 08:02:35.256764 18131 replica.cpp:676] Persisted action at 0 I0909 08:02:35.263712 18126 replica.cpp:655] Replica received learned notice for position 0 I0909 08:02:35.269613 18126 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 5.417225ms I0909 08:02:35.351641 18126 replica.cpp:676] Persisted action at 0 I0909 08:02:35.351655 18126 replica.cpp:661] Replica learned NOP action at position 0 I0909 08:02:35.351889 18126 log.cpp:672] Writer started with ending position 0 I0909 08:02:35.352165 18126 leveldb.cpp:438] Reading position from leveldb took 25215ns I0909 08:02:35.353163 18126 registrar.cpp:346] Successfully fetched the registry (0B) I0909 08:02:35.353185 18126 registrar.cpp:422] Attempting to update the 'registry' I0909 08:02:35.354152 18126 log.cpp:680] Attempting to append 120 bytes to the log I0909 08:02:35.354195 18126 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1 I0909 08:02:35.354416 18126 replica.cpp:508] Replica received write request for position 1 I0909 08:02:35.351579 18127 hierarchical_allocator_process.hpp:697] No resources available to allocate! 
I0909 08:02:35.354558 18127 hierarchical_allocator_process.hpp:659] Performed allocation for 0 slaves in 2.984795ms I0909 08:02:35.360254 18126 leveldb.cpp:343] Persisting action (137 bytes) to leveldb took 5.811986ms I0909 08:02:35.360285 18126 replica.cpp:676] Persisted action at 1 I0909 08:02:35.364126 18132 replica.cpp:655] Replica received learned notice for position 1 I0909 08:02:35.369856 18132 leveldb.cpp:343] Persisting action (139 bytes) to leveldb took 5.702756ms I0909 08:02:35.369899 18132 replica.cpp:676] Persisted action at 1 I0909 08:02:35.369910 18132 replica.cpp:661] Replica learned APPEND action at position 1 I0909 08:02:35.370209 18132 registrar.cpp:479] Successfully updated 'registry' I0909 08:02:35.370311 18132 registrar.cpp:372] Successfully recovered registrar I0909 08:02:35.370477 18132 log.cpp:699] Attempting to truncate the log to 1 I0909 08:02:35.370553 18132 master.cpp:1070] Recovered 0 slaves from the Registry (84B) ; allowing 10mins for slaves to re-register I0909 08:02:35.370594 18132 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2 I0909 08:02:35.371201 18127 replica.cpp:508] Replica received write request for position 2 I0909 08:02:35.376760 18127 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 5.264501ms I0909 08:02:35.377105 18127 replica.cpp:676] Persisted action at 2 I0909 08:02:35.377770 18127 replica.cpp:655] Replica received learned notice for position 2 I0909 08:02:35.383363 18127 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 5.272769ms I0909 08:02:35.383818 18127 leveldb.cpp:401] Deleting ~1 keys from leveldb took 28148ns I0909 08:02:35.384137 18127 replica.cpp:676] Persisted action at 2 I0909 08:02:35.384399 18127 replica.cpp:661] Replica learned TRUNCATE action at position 2 I0909 08:02:35.396512 18127 slave.cpp:167] Slave started on 64)@127.0.1.1:44005 I0909 08:02:35.654770 18131 hierarchical_allocator_process.hpp:697] No resources available to allocate! 
I0909 08:02:35.654847 18131 hierarchical_allocator_process.hpp:659] Performed allocation for 0 slaves in 104933ns I0909 08:02:35.654974 18127 credentials.hpp:84] Loading credential for authentication from '/tmp/AllocatorTest_0_FrameworkExited_xV9Mk4/credential' I0909 08:02:35.655097 18127 slave.cpp:274] Slave using credential for: test-principal I0909 08:02:35.655203 18127 slave.cpp:287] Slave resources: cpus(*):3; mem(*):1024; disk(*):25116; ports(*):[31000-32000] I0909 08:02:35.655274 18127 slave.cpp:315] Slave hostname: precise I0909 08:02:35.655285 18127 slave.cpp:316] Slave checkpoint: false I0909 08:02:35.655804 18127 state.cpp:33] Recovering state from '/tmp/AllocatorTest_0_FrameworkExited_xV9Mk4/meta' I0909 08:02:35.655913 18127 status_update_manager.cpp:193] Recovering status update manager I0909 08:02:35.656005 18127 slave.cpp:3202] Finished recovery I0909 08:02:35.656251 18127 slave.cpp:598] New master detected at master@127.0.1.1:44005 I0909 08:02:35.656285 18127 slave.cpp:672] Authenticating with master master@127.0.1.1:44005 I0909 08:02:35.656325 18127 slave.cpp:645] Detecting new master I0909 08:02:35.656358 18127 status_update_manager.cpp:167] New master detected at master@127.0.1.1:44005 I0909 08:02:35.656389 18127 authenticatee.hpp:128] Creating new client SASL connection I0909 08:02:35.656563 18127 master.cpp:3653] Authenticating slave(64)@127.0.1.1:44005 I0909 08:02:35.656651 18127 authenticator.hpp:156] Creating new server SASL connection I0909 08:02:35.656770 18127 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5 I0909 08:02:35.656796 18127 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5' I0909 08:02:35.656822 18127 authenticator.hpp:262] Received SASL authentication start I0909 08:02:35.656858 18127 authenticator.hpp:384] Authentication requires more steps I0909 08:02:35.656883 18127 authenticatee.hpp:265] Received SASL authentication step I0909 08:02:35.656924 18127 authenticator.hpp:290] 
Received SASL authentication step I0909 08:02:35.656960 18127 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'precise' server FQDN: 'precise' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0909 08:02:35.656971 18127 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I0909 08:02:35.656982 18127 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0909 08:02:35.656997 18127 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'precise' server FQDN: 'precise' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0909 08:02:35.657004 18127 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0909 08:02:35.657008 18127 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0909 08:02:35.657019 18127 authenticator.hpp:376] Authentication success I0909 08:02:35.657047 18127 authenticatee.hpp:305] Authentication success I0909 08:02:35.657073 18127 master.cpp:3693] Successfully authenticated principal 'test-principal' at slave(64)@127.0.1.1:44005 I0909 08:02:35.657145 18127 slave.cpp:729] Successfully authenticated with master master@127.0.1.1:44005 I0909 08:02:35.657183 18127 slave.cpp:980] Will retry registration in 19.238717ms if necessary I0909 08:02:35.657276 18128 master.cpp:2843] Registering slave at slave(64)@127.0.1.1:44005 (precise) with id 20140909-080235-16842879-44005-18112-0 I0909 08:02:35.657389 18128 registrar.cpp:422] Attempting to update the 'registry' I0909 08:02:35.658382 18130 log.cpp:680] Attempting to append 295 bytes to the log I0909 08:02:35.658432 18130 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 3 I0909 08:02:35.658635 18130 replica.cpp:508] Replica received write request for position 3 I0909 08:02:35.660959 18112 sched.cpp:137] Version: 
0.21.0 I0909 08:02:35.661093 18126 sched.cpp:233] New master detected at master@127.0.1.1:44005 I0909 08:02:35.661111 18126 sched.cpp:283] Authenticating with master master@127.0.1.1:44005 I0909 08:02:35.661175 18126 authenticatee.hpp:128] Creating new client SASL connection I0909 08:02:35.661306 18126 master.cpp:3653] Authenticating scheduler-fd929918-7057-4fef-923a-ed9d6fd355be@127.0.1.1:44005 I0909 08:02:35.661376 18126 authenticator.hpp:156] Creating new server SASL connection I0909 08:02:35.661466 18126 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5 I0909 08:02:35.661483 18126 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5' I0909 08:02:35.661504 18126 authenticator.hpp:262] Received SASL authentication start I0909 08:02:35.661530 18126 authenticator.hpp:384] Authentication requires more steps I0909 08:02:35.661552 18126 authenticatee.hpp:265] Received SASL authentication step I0909 08:02:35.661579 18126 authenticator.hpp:290] Received SASL authentication step I0909 08:02:35.661592 18126 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'precise' server FQDN: 'precise' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0909 08:02:35.661598 18126 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I0909 08:02:35.661607 18126 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0909 08:02:35.661613 18126 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'precise' server FQDN: 'precise' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0909 08:02:35.661619 18126 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0909 08:02:35.661623 18126 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0909 08:02:35.661633 18126 
authenticator.hpp:376] Authentication success I0909 08:02:35.661653 18126 authenticatee.hpp:305] Authentication success I0909 08:02:35.661672 18126 master.cpp:3693] Successfully authenticated principal 'test-principal' at scheduler-fd929918-7057-4fef-923a-ed9d6fd355be@127.0.1.1:44005 I0909 08:02:35.661730 18126 sched.cpp:357] Successfully authenticated with master master@127.0.1.1:44005 I0909 08:02:35.661741 18126 sched.cpp:476] Sending registration request to master@127.0.1.1:44005 I0909 08:02:35.661782 18126 master.cpp:1331] Received registration request from scheduler-fd929918-7057-4fef-923a-ed9d6fd355be@127.0.1.1:44005 I0909 08:02:35.661798 18126 master.cpp:1291] Authorizing framework principal 'test-principal' to receive offers for role '*' I0909 08:02:35.661917 18126 master.cpp:1390] Registering framework 20140909-080235-16842879-44005-18112-0000 at scheduler-fd929918-7057-4fef-923a-ed9d6fd355be@127.0.1.1:44005 I0909 08:02:35.662017 18126 sched.cpp:407] Framework registered with 20140909-080235-16842879-44005-18112-0000 I0909 08:02:35.662039 18126 sched.cpp:421] Scheduler::registered took 9070ns I0909 08:02:35.662119 18126 hierarchical_allocator_process.hpp:329] Added framework 20140909-080235-16842879-44005-18112-0000 I0909 08:02:35.662130 18126 hierarchical_allocator_process.hpp:697] No resources available to allocate! 
I0909 08:02:35.662135 18126 hierarchical_allocator_process.hpp:659] Performed allocation for 0 slaves in 5558ns I0909 08:02:35.672230 18130 leveldb.cpp:343] Persisting action (314 bytes) to leveldb took 13.567526ms I0909 08:02:35.672268 18130 replica.cpp:676] Persisted action at 3 I0909 08:02:35.672483 18130 replica.cpp:655] Replica received learned notice for position 3 I0909 08:02:35.677322 18132 slave.cpp:980] Will retry registration in 14.890338ms if necessary I0909 08:02:35.677399 18132 master.cpp:2831] Ignoring register slave message from slave(64)@127.0.1.1:44005 (precise) as admission is already in progress I0909 08:02:35.680881 18130 leveldb.cpp:343] Persisting action (316 bytes) to leveldb took 8.376798ms I0909 08:02:35.680908 18130 replica.cpp:676] Persisted action at 3 I0909 08:02:35.680917 18130 replica.cpp:661] Replica learned APPEND action at position 3 I0909 08:02:35.681252 18130 registrar.cpp:479] Successfully updated 'registry' I0909 08:02:35.681330 18130 log.cpp:699] Attempting to truncate the log to 3 I0909 08:02:35.681385 18130 master.cpp:2883] Registered slave 20140909-080235-16842879-44005-18112-0 at slave(64)@127.0.1.1:44005 (precise) I0909 08:02:35.681399 18130 master.cpp:4126] Adding slave 20140909-080235-16842879-44005-18112-0 at slave(64)@127.0.1.1:44005 (precise) with cpus(*):3; mem(*):1024; disk(*):25116; ports(*):[31000-32000] I0909 08:02:35.681504 18130 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 4 I0909 08:02:35.681570 18130 slave.cpp:763] Registered with master master@127.0.1.1:44005; given slave ID 20140909-080235-16842879-44005-18112-0 I0909 08:02:35.681689 18130 slave.cpp:2329] Received ping from slave-observer(50)@127.0.1.1:44005 I0909 08:02:35.681753 18130 hierarchical_allocator_process.hpp:442] Added slave 20140909-080235-16842879-44005-18112-0 (precise) with cpus(*):3; mem(*):1024; disk(*):25116; ports(*):[31000-32000] (and cpus(*):3; mem(*):1024; disk(*):25116; ports(*):[31000-32000] 
available) I0909 08:02:35.681808 18130 hierarchical_allocator_process.hpp:734] Offering cpus(*):3; mem(*):1024; disk(*):25116; ports(*):[31000-32000] on slave 20140909-080235-16842879-44005-18112-0 to framework 20140909-080235-16842879-44005-18112-0000 I0909 08:02:35.681892 18130 hierarchical_allocator_process.hpp:679] Performed allocation for slave 20140909-080235-16842879-44005-18112-0 in 109580ns I0909 08:02:35.681968 18130 master.hpp:861] Adding offer 20140909-080235-16842879-44005-18112-0 with resources cpus(*):3; mem(*):1024; disk(*):25116; ports(*):[31000-32000] on slave 20140909-080235-16842879-44005-18112-0 (precise) I0909 08:02:35.682014 18130 master.cpp:3600] Sending 1 offers to framework 20140909-080235-16842879-44005-18112-0000 I0909 08:02:35.682443 18130 sched.cpp:544] Scheduler::resourceOffers took 254258ns I0909 08:02:35.682633 18130 master.hpp:871] Removing offer 20140909-080235-16842879-44005-18112-0 with resources cpus(*):3; mem(*):1024; disk(*):25116; ports(*):[31000-32000] on slave 20140909-080235-16842879-44005-18112-0 (precise) I0909 08:02:35.682684 18130 master.cpp:2201] Processing reply for offers: [ 20140909-080235-16842879-44005-18112-0 ] on slave 20140909-080235-16842879-44005-18112-0 at slave(64)@127.0.1.1:44005 (precise) for framework 20140909-080235-16842879-44005-18112-0000 I0909 08:02:35.682708 18130 master.cpp:2284] Authorizing framework principal 'test-principal' to launch task 0 as user 'jenkins' I0909 08:02:35.682971 18130 replica.cpp:508] Replica received write request for position 4 I0909 08:02:35.683132 18132 master.hpp:833] Adding task 0 with resources cpus(*):2; mem(*):512 on slave 20140909-080235-16842879-44005-18112-0 (precise) I0909 08:02:35.683159 18132 master.cpp:2350] Launching task 0 of framework 20140909-080235-16842879-44005-18112-0000 with resources cpus(*):2; mem(*):512 on slave 20140909-080235-16842879-44005-18112-0 at slave(64)@127.0.1.1:44005 (precise) I0909 08:02:35.683363 18132 slave.cpp:1011] Got assigned 
task 0 for framework 20140909-080235-16842879-44005-18112-0000 I0909 08:02:35.683580 18132 slave.cpp:1121] Launching task 0 for framework 20140909-080235-16842879-44005-18112-0000 I0909 08:02:35.684833 18133 hierarchical_allocator_process.hpp:563] Recovered cpus(*):1; mem(*):512; disk(*):25116; ports(*):[31000-32000] (total allocatable: cpus(*):1; mem(*):512; disk(*):25116; ports(*):[31000-32000]) on slave 20140909-080235-16842879-44005-18112-0 from framework 20140909-080235-16842879-44005-18112-0000 I0909 08:02:35.684864 18133 hierarchical_allocator_process.hpp:599] Framework 20140909-080235-16842879-44005-18112-0000 filtered slave 20140909-080235-16842879-44005-18112-0 for 5secs I0909 08:02:35.686401 18132 exec.cpp:132] Version: 0.21.0 I0909 08:02:35.686848 18128 exec.cpp:182] Executor started at: executor(8)@127.0.1.1:44005 with pid 18112 I0909 08:02:35.687095 18132 slave.cpp:1231] Queuing task '0' for executor executor-1 of framework '20140909-080235-16842879-44005-18112-0000 I0909 08:02:35.687302 18132 slave.cpp:552] Successfully attached file '/tmp/AllocatorTest_0_FrameworkExited_xV9Mk4/slaves/20140909-080235-16842879-44005-18112-0/frameworks/20140909-080235-16842879-44005-18112-0000/executors/executor-1/runs/c4458e43...",1 MESOS-1784,"Design the semantics for updating FrameworkInfo","Currently, there is no easy way for frameworks to update their FrameworkInfo., resulting in issues like MESOS-703 and MESOS-1218. This ticket captures the design for doing FrameworkInfo update without having to roll masters/slaves/tasks/executors.",3 MESOS-1790,"Add ""chown"" option to CommandInfo.URI","Mesos fetcher always chown()s the extracted executor URIs as the executor user but sometimes this is not desirable, e.g., ""setuid"" bit gets lost during chown() if slave/fetcher is running as root. 
It would be nice to give frameworks the ability to skip the chown.",2 MESOS-1799,"Reconciliation can send out-of-order updates.","When a slave re-registers with the master, it currently sends the latest task state for all tasks that are not both terminal and acknowledged. However, reconciliation assumes that we always have the latest unacknowledged state of the task represented in the master. As a result, out-of-order updates are possible, e.g. (1) Slave has task T in TASK_FINISHED, with unacknowledged updates: [TASK_RUNNING, TASK_FINISHED]. (2) Master fails over. (3) New master re-registers the slave with T in TASK_FINISHED. (4) Reconciliation request arrives, master sends TASK_FINISHED. (5) Slave sends TASK_RUNNING to master, master sends TASK_RUNNING. I think the fix here is to preserve the task state invariants in the master, namely, that the master has the latest unacknowledged state of the task. This means when the slave re-registers, it should instead send the latest acknowledged state of each task.",3 MESOS-1807,"Disallow executors with cpu only or memory only resources","Currently the master allows executors to be launched with either only cpus or only memory but we shouldn't allow that. This is because an executor is an actual Unix process that is launched by the slave. If an executor doesn't specify cpus, what should the cpu limits be for that executor when there are no tasks running on it? If no cpu limits are set then it might starve other executors/tasks on the slave, violating isolation guarantees. The same goes for memory. Moreover, the current containerizer/isolator code will throw failures when using such an executor, e.g., when the last task on the executor finishes and Containerizer::update() is called with 0 cpus or 0 mem.",3 MESOS-1808,"Expose RTT in container stats","Just as we expose the bandwidth, we should expose the RTT as a measure of the latency each container is experiencing. 
We can use {{ss}} to get the per-socket statistics and filter and aggregate accordingly to get a measure of RTT.",3 MESOS-1811,"Reconcile disconnected/deactivated semantics in the master code","Currently the master code treats a deactivated and a disconnected slave similarly, by setting the 'disconnected' variable in the slave struct. This causes us to disconnect() a slave in cases where we really only want to deactivate() the slave (e.g., authentication). It would be nice to differentiate these semantics by adding a new variable ""active"" in the Slave struct. We might want to do the same with the Framework struct for consistency.",3 MESOS-1813,"Fail fast in example frameworks if task goes into unexpected state","Most of the example frameworks launch a bunch of tasks and exit if *all* of them reach FINISHED state. But if there is a bug in the code resulting in TASK_LOST, the framework waits forever. Instead, the framework should abort if an unexpected task state is encountered.",1 MESOS-1814,"Task attempted to use more offers than requested in example Java and Python frameworks","{code} [ RUN ] ExamplesTest.JavaFramework Using temporary directory '/tmp/ExamplesTest_JavaFramework_2PcFCh' Enabling authentication for the framework WARNING: Logging before InitGoogleLogging() is written to STDERR I0917 23:14:35.199069 31510 process.cpp:1771] libprocess is initialized on 127.0.1.1:34609 for 8 cpus I0917 23:14:35.199794 31510 logging.cpp:177] Logging to STDERR I0917 23:14:35.225342 31510 leveldb.cpp:176] Opened db in 22.197149ms I0917 23:14:35.231133 31510 leveldb.cpp:183] Compacted db in 5.601897ms I0917 23:14:35.231498 31510 leveldb.cpp:198] Created db iterator in 215441ns I0917 23:14:35.231608 31510 leveldb.cpp:204] Seeked to beginning of db in 11488ns I0917 23:14:35.231722 31510 leveldb.cpp:273] Iterated through 0 keys in the db in 14016ns I0917 23:14:35.231917 31510 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0917 
23:14:35.233129 31526 recover.cpp:425] Starting replica recovery I0917 23:14:35.233614 31526 recover.cpp:451] Replica is in EMPTY status I0917 23:14:35.234994 31526 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I0917 23:14:35.240116 31519 recover.cpp:188] Received a recover response from a replica in EMPTY status I0917 23:14:35.240782 31519 recover.cpp:542] Updating replica status to STARTING I0917 23:14:35.242846 31524 master.cpp:286] Master 20140917-231435-16842879-34609-31503 (saucy) started on 127.0.1.1:34609 I0917 23:14:35.243191 31524 master.cpp:332] Master only allowing authenticated frameworks to register I0917 23:14:35.243288 31524 master.cpp:339] Master allowing unauthenticated slaves to register I0917 23:14:35.243399 31524 credentials.hpp:36] Loading credentials for authentication from '/tmp/ExamplesTest_JavaFramework_2PcFCh/credentials' W0917 23:14:35.243588 31524 credentials.hpp:51] Permissions on credentials file '/tmp/ExamplesTest_JavaFramework_2PcFCh/credentials' are too open. It is recommended that your credentials file is NOT accessible by others. I0917 23:14:35.243846 31524 master.cpp:366] Authorization enabled I0917 23:14:35.244882 31520 hierarchical_allocator_process.hpp:299] Initializing hierarchical allocator process with master : master@127.0.1.1:34609 I0917 23:14:35.245224 31520 master.cpp:120] No whitelist given. Advertising offers for all slaves I0917 23:14:35.246934 31524 master.cpp:1211] The newly elected leader is master@127.0.1.1:34609 with id 20140917-231435-16842879-34609-31503 I0917 23:14:35.247234 31524 master.cpp:1224] Elected as the leading master! 
I0917 23:14:35.247336 31524 master.cpp:1042] Recovering from registrar I0917 23:14:35.247542 31526 registrar.cpp:313] Recovering registrar I0917 23:14:35.250555 31510 containerizer.cpp:89] Using isolation: posix/cpu,posix/mem I0917 23:14:35.252326 31510 containerizer.cpp:89] Using isolation: posix/cpu,posix/mem I0917 23:14:35.252821 31520 slave.cpp:169] Slave started on 1)@127.0.1.1:34609 I0917 23:14:35.253552 31520 slave.cpp:289] Slave resources: cpus(*):1; mem(*):1001; disk(*):24988; ports(*):[31000-32000] I0917 23:14:35.253906 31520 slave.cpp:317] Slave hostname: saucy I0917 23:14:35.254004 31520 slave.cpp:318] Slave checkpoint: true I0917 23:14:35.254818 31520 state.cpp:33] Recovering state from '/tmp/mesos-w8snRW/0/meta' I0917 23:14:35.255106 31519 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 13.99622ms I0917 23:14:35.255235 31519 replica.cpp:320] Persisted replica status to STARTING I0917 23:14:35.255419 31519 recover.cpp:451] Replica is in STARTING status I0917 23:14:35.255834 31519 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I0917 23:14:35.256000 31519 recover.cpp:188] Received a recover response from a replica in STARTING status I0917 23:14:35.256217 31519 recover.cpp:542] Updating replica status to VOTING I0917 23:14:35.256641 31520 status_update_manager.cpp:193] Recovering status update manager I0917 23:14:35.257064 31520 containerizer.cpp:252] Recovering containerizer I0917 23:14:35.257725 31520 slave.cpp:3220] Finished recovery I0917 23:14:35.258463 31520 slave.cpp:600] New master detected at master@127.0.1.1:34609 I0917 23:14:35.258769 31524 status_update_manager.cpp:167] New master detected at master@127.0.1.1:34609 I0917 23:14:35.258885 31520 slave.cpp:636] No credentials provided. 
Attempting to register without authentication I0917 23:14:35.259024 31520 slave.cpp:647] Detecting new master I0917 23:14:35.259863 31520 slave.cpp:169] Slave started on 2)@127.0.1.1:34609 I0917 23:14:35.260288 31520 slave.cpp:289] Slave resources: cpus(*):1; mem(*):1001; disk(*):24988; ports(*):[31000-32000] I0917 23:14:35.260493 31520 slave.cpp:317] Slave hostname: saucy I0917 23:14:35.260588 31520 slave.cpp:318] Slave checkpoint: true I0917 23:14:35.265127 31510 containerizer.cpp:89] Using isolation: posix/cpu,posix/mem I0917 23:14:35.265877 31519 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 9.536278ms I0917 23:14:35.265983 31519 replica.cpp:320] Persisted replica status to VOTING I0917 23:14:35.266324 31519 recover.cpp:556] Successfully joined the Paxos group I0917 23:14:35.266511 31519 recover.cpp:440] Recover process terminated I0917 23:14:35.266978 31519 log.cpp:656] Attempting to start the writer I0917 23:14:35.268165 31523 replica.cpp:474] Replica received implicit promise request with proposal 1 I0917 23:14:35.269850 31525 slave.cpp:169] Slave started on 3)@127.0.1.1:34609 I0917 23:14:35.270365 31525 slave.cpp:289] Slave resources: cpus(*):1; mem(*):1001; disk(*):24988; ports(*):[31000-32000] I0917 23:14:35.270658 31525 slave.cpp:317] Slave hostname: saucy I0917 23:14:35.270781 31525 slave.cpp:318] Slave checkpoint: true I0917 23:14:35.271332 31525 state.cpp:33] Recovering state from '/tmp/mesos-w8snRW/2/meta' I0917 23:14:35.271580 31522 status_update_manager.cpp:193] Recovering status update manager I0917 23:14:35.271838 31522 containerizer.cpp:252] Recovering containerizer I0917 23:14:35.272238 31525 slave.cpp:3220] Finished recovery I0917 23:14:35.273002 31525 slave.cpp:600] New master detected at master@127.0.1.1:34609 I0917 23:14:35.273252 31521 status_update_manager.cpp:167] New master detected at master@127.0.1.1:34609 I0917 23:14:35.273360 31525 slave.cpp:636] No credentials provided. 
Attempting to register without authentication I0917 23:14:35.273507 31525 slave.cpp:647] Detecting new master I0917 23:14:35.275413 31525 state.cpp:33] Recovering state from '/tmp/mesos-w8snRW/1/meta' I0917 23:14:35.278506 31523 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 10.232514ms I0917 23:14:35.278712 31523 replica.cpp:342] Persisted promised to 1 I0917 23:14:35.279585 31523 coordinator.cpp:230] Coordinator attemping to fill missing position I0917 23:14:35.280400 31523 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 I0917 23:14:35.280900 31526 status_update_manager.cpp:193] Recovering status update manager I0917 23:14:35.281282 31519 containerizer.cpp:252] Recovering containerizer I0917 23:14:35.281615 31520 slave.cpp:3220] Finished recovery I0917 23:14:35.281891 31510 sched.cpp:137] Version: 0.21.0 I0917 23:14:35.282306 31526 sched.cpp:233] New master detected at master@127.0.1.1:34609 I0917 23:14:35.282464 31526 sched.cpp:283] Authenticating with master master@127.0.1.1:34609 I0917 23:14:35.282891 31526 authenticatee.hpp:104] Initializing client SASL I0917 23:14:35.284816 31526 authenticatee.hpp:128] Creating new client SASL connection I0917 23:14:35.285428 31519 master.cpp:873] Dropping 'mesos.internal.AuthenticateMessage' message since not recovered yet I0917 23:14:35.288007 31521 slave.cpp:600] New master detected at master@127.0.1.1:34609 I0917 23:14:35.288399 31521 slave.cpp:636] No credentials provided. 
Attempting to register without authentication I0917 23:14:35.288535 31521 slave.cpp:647] Detecting new master I0917 23:14:35.288501 31519 status_update_manager.cpp:167] New master detected at master@127.0.1.1:34609 I0917 23:14:35.289625 31523 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 8.997343ms I0917 23:14:35.289784 31523 replica.cpp:676] Persisted action at 0 I0917 23:14:35.292667 31521 replica.cpp:508] Replica received write request for position 0 I0917 23:14:35.293112 31521 leveldb.cpp:438] Reading position from leveldb took 325638ns I0917 23:14:35.301774 31521 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 8.576338ms I0917 23:14:35.301916 31521 replica.cpp:676] Persisted action at 0 I0917 23:14:35.302289 31521 replica.cpp:655] Replica received learned notice for position 0 I0917 23:14:35.310542 31521 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 8.087789ms I0917 23:14:35.310675 31521 replica.cpp:676] Persisted action at 0 I0917 23:14:35.310946 31521 replica.cpp:661] Replica learned NOP action at position 0 I0917 23:14:35.311254 31521 log.cpp:672] Writer started with ending position 0 I0917 23:14:35.311957 31521 leveldb.cpp:438] Reading position from leveldb took 35110ns I0917 23:14:35.320283 31521 registrar.cpp:346] Successfully fetched the registry (0B) I0917 23:14:35.320513 31521 registrar.cpp:422] Attempting to update the 'registry' I0917 23:14:35.322226 31525 log.cpp:680] Attempting to append 118 bytes to the log I0917 23:14:35.322549 31525 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1 I0917 23:14:35.322931 31525 replica.cpp:508] Replica received write request for position 1 I0917 23:14:35.330169 31525 leveldb.cpp:343] Persisting action (135 bytes) to leveldb took 7.133053ms I0917 23:14:35.330340 31525 replica.cpp:676] Persisted action at 1 I0917 23:14:35.330890 31525 replica.cpp:655] Replica received learned notice for position 1 I0917 23:14:35.339218 31525 
leveldb.cpp:343] Persisting action (137 bytes) to leveldb took 8.192024ms I0917 23:14:35.339380 31525 replica.cpp:676] Persisted action at 1 I0917 23:14:35.339715 31525 replica.cpp:661] Replica learned APPEND action at position 1 I0917 23:14:35.340615 31525 registrar.cpp:479] Successfully updated 'registry' I0917 23:14:35.340802 31525 registrar.cpp:372] Successfully recovered registrar I0917 23:14:35.341104 31525 log.cpp:699] Attempting to truncate the log to 1 I0917 23:14:35.341351 31525 master.cpp:1069] Recovered 0 slaves from the Registry (82B) ; allowing 10mins for slaves to re-register I0917 23:14:35.341527 31525 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2 I0917 23:14:35.341964 31525 replica.cpp:508] Replica received write request for position 2 I0917 23:14:35.352336 31525 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 10.213086ms I0917 23:14:35.352494 31525 replica.cpp:676] Persisted action at 2 I0917 23:14:35.356258 31523 replica.cpp:655] Replica received learned notice for position 2 I0917 23:14:35.364992 31523 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 8.606522ms I0917 23:14:35.365166 31523 leveldb.cpp:401] Deleting ~1 keys from leveldb took 48378ns I0917 23:14:35.365404 31523 replica.cpp:676] Persisted action at 2 I0917 23:14:35.365537 31523 replica.cpp:661] Replica learned TRUNCATE action at position 2 I0917 23:14:35.568366 31523 slave.cpp:994] Will retry registration in 423.208575ms if necessary I0917 23:14:35.568840 31522 master.cpp:2870] Registering slave at slave(3)@127.0.1.1:34609 (saucy) with id 20140917-231435-16842879-34609-31503-0 I0917 23:14:35.569422 31522 registrar.cpp:422] Attempting to update the 'registry' I0917 23:14:35.572013 31522 log.cpp:680] Attempting to append 289 bytes to the log I0917 23:14:35.572273 31519 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 3 I0917 23:14:35.572816 31519 replica.cpp:508] Replica received write 
request for position 3 I0917 23:14:35.579784 31519 leveldb.cpp:343] Persisting action (308 bytes) to leveldb took 6.809365ms I0917 23:14:35.579907 31519 replica.cpp:676] Persisted action at 3 I0917 23:14:35.580512 31519 replica.cpp:655] Replica received learned notice for position 3 I0917 23:14:35.588748 31519 leveldb.cpp:343] Persisting action (310 bytes) to leveldb took 8.112519ms I0917 23:14:35.588888 31519 replica.cpp:676] Persisted action at 3 I0917 23:14:35.588985 31519 replica.cpp:661] Replica learned APPEND action at position 3 I0917 23:14:35.589754 31519 registrar.cpp:479] Successfully updated 'registry' I0917 23:14:35.590070 31519 master.cpp:2910] Registered slave 20140917-231435-16842879-34609-31503-0 at slave(3)@127.0.1.1:34609 (saucy) I0917 23:14:35.590255 31519 master.cpp:4118] Adding slave 20140917-231435-16842879-34609-31503-0 at slave(3)@127.0.1.1:34609 (saucy) with cpus(*):1; mem(*):1001; disk(*):24988; ports(*):[31000-32000] I0917 23:14:35.590831 31519 slave.cpp:765] Registered with master master@127.0.1.1:34609; given slave ID 20140917-231435-16842879-34609-31503-0 I0917 23:14:35.589913 31523 log.cpp:699] Attempting to truncate the log to 3 I0917 23:14:35.591414 31523 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 4 I0917 23:14:35.591815 31523 replica.cpp:508] Replica received write request for position 4 I0917 23:14:35.591117 31521 hierarchical_allocator_process.hpp:442] Added slave 20140917-231435-16842879-34609-31503-0 (saucy) with cpus(*):1; mem(*):1001; disk(*):24988; ports(*):[31000-32000] (and cpus(*):1; mem(*):1001; disk(*):24988; ports(*):[31000-32000] available) I0917 23:14:35.592293 31521 hierarchical_allocator_process.hpp:679] Performed allocation for slave 20140917-231435-16842879-34609-31503-0 in 64364ns I0917 23:14:35.592953 31519 slave.cpp:778] Checkpointing SlaveInfo to '/tmp/mesos-w8snRW/2/meta/slaves/20140917-231435-16842879-34609-31503-0/slave.info' I0917 23:14:35.593475 31519 slave.cpp:2347] 
Received ping from slave-observer(1)@127.0.1.1:34609 I0917 23:14:35.601356 31523 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 9.420461ms I0917 23:14:35.601539 31523 replica.cpp:676] Persisted action at 4 I0917 23:14:35.602325 31523 replica.cpp:655] Replica received learned notice for position 4 I0917 23:14:35.610779 31523 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 8.34398ms I0917 23:14:35.611114 31523 leveldb.cpp:401] Deleting ~2 keys from leveldb took 66521ns I0917 23:14:35.611554 31523 replica.cpp:676] Persisted action at 4 I0917 23:14:35.611690 31523 replica.cpp:661] Replica learned TRUNCATE action at position 4 I0917 23:14:36.033941 31523 slave.cpp:994] Will retry registration in 322.705631ms if necessary I0917 23:14:36.034276 31521 master.cpp:2870] Registering slave at slave(1)@127.0.1.1:34609 (saucy) with id 20140917-231435-16842879-34609-31503-1 I0917 23:14:36.034536 31521 registrar.cpp:422] Attempting to update the 'registry' I0917 23:14:36.035889 31521 log.cpp:680] Attempting to append 454 bytes to the log I0917 23:14:36.036099 31524 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 5 I0917 23:14:36.036416 31524 replica.cpp:508] Replica received write request for position 5 I0917 23:14:36.046672 31524 leveldb.cpp:343] Persisting action (473 bytes) to leveldb took 10.160627ms I0917 23:14:36.047035 31524 replica.cpp:676] Persisted action at 5 I0917 23:14:36.047613 31524 replica.cpp:655] Replica received learned notice for position 5 I0917 23:14:36.053006 31524 leveldb.cpp:343] Persisting action (475 bytes) to leveldb took 5.180742ms I0917 23:14:36.053246 31524 replica.cpp:676] Persisted action at 5 I0917 23:14:36.053678 31524 replica.cpp:661] Replica learned APPEND action at position 5 I0917 23:14:36.060384 31524 registrar.cpp:479] Successfully updated 'registry' I0917 23:14:36.061328 31524 master.cpp:2910] Registered slave 20140917-231435-16842879-34609-31503-1 at slave(1)@127.0.1.1:34609 
(saucy) I0917 23:14:36.061537 31524 master.cpp:4118] Adding slave 20140917-231435-16842879-34609-31503-1 at slave(1)@127.0.1.1:34609 (saucy) with cpus(*):1; mem(*):1001; disk(*):24988; ports(*):[31000-32000] I0917 23:14:36.061982 31524 slave.cpp:765] Registered with master master@127.0.1.1:34609; given slave ID 20140917-231435-16842879-34609-31503-1 I0917 23:14:36.062891 31524 slave.cpp:778] Checkpointing SlaveInfo to '/tmp/mesos-w8snRW/0/meta/slaves/20140917-231435-16842879-34609-31503-1/slave.info' I0917 23:14:36.061050 31525 log.cpp:699] Attempting to truncate the log to 5 I0917 23:14:36.063244 31525 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 6 I0917 23:14:36.063746 31525 replica.cpp:508] Replica received write request for position 6 I0917 23:14:36.062386 31520 hierarchical_allocator_process.hpp:442] Added slave 20140917-231435-16842879-34609-31503-1 (saucy) with cpus(*):1; mem(*):1001; disk(*):24988; ports(*):[31000-32000] (and cpus(*):1; mem(*):1001; disk(*):24988; ports(*):[31000-32000] available) I0917 23:14:36.064352 31520 hierarchical_allocator_process.hpp:679] Performed allocation for slave 20140917-231435-16842879-34609-31503-1 in 35730ns I0917 23:14:36.065166 31524 slave.cpp:2347] Received ping from slave-observer(2)@127.0.1.1:34609 I0917 23:14:36.070137 31525 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 6.242192ms I0917 23:14:36.070355 31525 replica.cpp:676] Persisted action at 6 I0917 23:14:36.071005 31525 replica.cpp:655] Replica received learned notice for position 6 I0917 23:14:36.076560 31525 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 5.368532ms I0917 23:14:36.077137 31525 leveldb.cpp:401] Deleting ~2 keys from leveldb took 371245ns I0917 23:14:36.077241 31525 replica.cpp:676] Persisted action at 6 I0917 23:14:36.077345 31525 replica.cpp:661] Replica learned TRUNCATE action at position 6 I0917 23:14:36.141270 31522 slave.cpp:994] Will retry registration in 1.857205901secs if 
necessary I0917 23:14:36.141644 31522 master.cpp:2870] Registering slave at slave(2)@127.0.1.1:34609 (saucy) with id 20140917-231435-16842879-34609-31503-2 I0917 23:14:36.141930 31522 registrar.cpp:422] Attempting to update the 'registry' I0917 23:14:36.143316 31521 log.cpp:680] Attempting to append 619 bytes to the log I0917 23:14:36.143646 31521 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 7 I0917 23:14:36.143954 31521 replica.cpp:508] Replica received write request for position 7 I0917 23:14:36.148875 31521 leveldb.cpp:343] Persisting action (638 bytes) to leveldb took 4.787834ms I0917 23:14:36.149085 31521 replica.cpp:676] Persisted action at 7 I0917 23:14:36.149673 31521 replica.cpp:655] Replica received learned notice for position 7 I0917 23:14:36.155232 31521 leveldb.cpp:343] Persisting action (640 bytes) to leveldb took 5.472209ms I0917 23:14:36.155522 31521 replica.cpp:676] Persisted action at 7 I0917 23:14:36.155936 31521 replica.cpp:661] Replica learned APPEND action at position 7 I0917 23:14:36.156481 31521 registrar.cpp:479] Successfully updated 'registry' I0917 23:14:36.156663 31526 log.cpp:699] Attempting to truncate the log to 7 I0917 23:14:36.156813 31526 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 8 I0917 23:14:36.157155 31526 replica.cpp:508] Replica received write request for position 8 I0917 23:14:36.157510 31520 master.cpp:2910] Registered slave 20140917-231435-16842879-34609-31503-2 at slave(2)@127.0.1.1:34609 (saucy) I0917 23:14:36.157645 31520 master.cpp:4118] Adding slave 20140917-231435-16842879-34609-31503-2 at slave(2)@127.0.1.1:34609 (saucy) with cpus(*):1; mem(*):1001; disk(*):24988; ports(*):[31000-32000] I0917 23:14:36.157928 31520 slave.cpp:765] Registered with master master@127.0.1.1:34609; given slave ID 20140917-231435-16842879-34609-31503-2 I0917 23:14:36.158304 31520 slave.cpp:778] Checkpointing SlaveInfo to 
'/tmp/mesos-w8snRW/1/meta/slaves/20140917-231435-16842879-34609-31503-2/slave.info' I0...",2 MESOS-1815,"Create a guide to becoming a committer","We have a committer's guide, but the process by which one becomes a committer is unclear. We should set some guidelines and a process by which we can grow contributors into committers.",3 MESOS-1817,"Completed tasks remain in TASK_RUNNING when framework is disconnected","We have run into a problem that causes tasks which complete while a framework is disconnected and has a failover timeout to remain in a running state even though the tasks have actually finished. This hogs the cluster and gives users an inconsistent view of the cluster state. Going to the slave, the task is finished. Going to the master, the task is still in a non-terminal state. When the scheduler reattaches or the failover timeout expires, the tasks finish correctly. The current workflow of this scheduler has a long failover timeout, but it may, on the other hand, never reattach. Here is a test framework we have been able to reproduce the issue with: https://gist.github.com/nqn/9b9b1de9123a6e836f54 It launches many short-lived tasks (1 second sleep) and, when the framework instance is killed, the master reports the tasks as running even after several minutes: http://cl.ly/image/2R3719461e0t/Screen%20Shot%202014-09-10%20at%203.19.39%20PM.png When clicking on one of the slaves where, for example, task 49 runs, the slave knows that it completed: http://cl.ly/image/2P410L3m1O1N/Screen%20Shot%202014-09-10%20at%203.21.29%20PM.png Here is the log of a mesos-local instance where I reproduced it: https://gist.github.com/nqn/f7ee20601199d70787c0 (Here tasks 10 to 19 are stuck in the running state). 
There is a lot of output, so here is a filtered log for task 10: https://gist.github.com/nqn/a53e5ea05c5e41cd5a7d The problem turns out to be an issue with the ack-cycle of status updates: if the framework disconnects (with a failover timeout set), the status update manager on the slaves will keep trying to send the front of the status update stream to the master (which in turn forwards it to the framework). If the first status update after the disconnect is terminal, things work out fine; the master picks the terminal state up, removes the task and releases the resources. If, on the other hand, one non-terminal status is in the stream, the master will never know that the task finished (or failed) before the framework reconnects. During a discussion on the dev mailing list (http://mail-archives.apache.org/mod_mbox/mesos-dev/201409.mbox/%3cCADKthhAVR5mrq1s9HXw1BB_XFALXWWxjutp7MV4y3wP-Bh=aWg@mail.gmail.com%3e) we enumerated a couple of options to solve this problem. First off, having two ack-cycles, one between masters and slaves and one between masters and frameworks, would be ideal. We would be able to replay the statuses in order while keeping the master state current. However, this requires us to persist the master state in replicated storage. As a first pass, we can make sure that tasks caught in a running state don't hog the cluster when they have completed and the framework is disconnected. 
Here is a proof-of-concept to work out of: https://github.com/nqn/mesos/tree/niklas/status-update-disconnect/ A new (optional) field has been added to the internal status update message: https://github.com/nqn/mesos/blob/niklas/status-update-disconnect/src/messages/messages.proto#L68 This makes it possible for the status update manager to set the field if the latest status was terminal: https://github.com/nqn/mesos/blob/niklas/status-update-disconnect/src/slave/status_update_manager.cpp#L501 I added a test which should highlight the issue as well: https://github.com/nqn/mesos/blob/niklas/status-update-disconnect/src/tests/fault_tolerance_tests.cpp#L2478 I would love some input on the approach before moving on. There are rough edges in the PoC which (of course) should be addressed before bringing it up for review.",2 MESOS-1830,"Expose master stats differentiating between master-generated and slave-generated LOST tasks","The master exports a monotonically-increasing counter of tasks transitioned to TASK_LOST. This loses fidelity of the source of the lost task. 
A first step in exposing the source of lost tasks might be to just differentiate between TASK_LOST transitions initiated by the master vs the slave (and maybe bad input from the scheduler).",5 MESOS-1844,"AllocatorTest/0.SlaveLost is flaky","{code} [ RUN ] AllocatorTest/0.SlaveLost Using temporary directory '/tmp/AllocatorTest_0_SlaveLost_Z2oazw' I0929 16:58:29.484141 3486 leveldb.cpp:176] Opened db in 604109ns I0929 16:58:29.484629 3486 leveldb.cpp:183] Compacted db in 172697ns I0929 16:58:29.484912 3486 leveldb.cpp:198] Created db iterator in 6429ns I0929 16:58:29.485133 3486 leveldb.cpp:204] Seeked to beginning of db in 1618ns I0929 16:58:29.485337 3486 leveldb.cpp:273] Iterated through 0 keys in the db in 752ns I0929 16:58:29.485595 3486 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0929 16:58:29.486017 3500 recover.cpp:425] Starting replica recovery I0929 16:58:29.486304 3500 recover.cpp:451] Replica is in EMPTY status I0929 16:58:29.486793 3500 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I0929 16:58:29.487205 3500 recover.cpp:188] Received a recover response from a replica in EMPTY status I0929 16:58:29.487540 3500 recover.cpp:542] Updating replica status to STARTING I0929 16:58:29.487911 3500 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 36629ns I0929 16:58:29.488173 3500 replica.cpp:320] Persisted replica status to STARTING I0929 16:58:29.488438 3500 recover.cpp:451] Replica is in STARTING status I0929 16:58:29.488891 3500 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I0929 16:58:29.489187 3500 recover.cpp:188] Received a recover response from a replica in STARTING status I0929 16:58:29.489516 3500 recover.cpp:542] Updating replica status to VOTING I0929 16:58:29.489887 3502 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 32099ns I0929 16:58:29.490124 3502 replica.cpp:320] Persisted replica status to VOTING 
I0929 16:58:29.490381 3500 recover.cpp:556] Successfully joined the Paxos group I0929 16:58:29.490713 3500 recover.cpp:440] Recover process terminated I0929 16:58:29.493401 3506 master.cpp:312] Master 20140929-165829-2759502016-55618-3486 (fedora-20) started on 192.168.122.164:55618 I0929 16:58:29.493700 3506 master.cpp:358] Master only allowing authenticated frameworks to register I0929 16:58:29.493921 3506 master.cpp:363] Master only allowing authenticated slaves to register I0929 16:58:29.494123 3506 credentials.hpp:36] Loading credentials for authentication from '/tmp/AllocatorTest_0_SlaveLost_Z2oazw/credentials' I0929 16:58:29.494500 3506 master.cpp:392] Authorization enabled I0929 16:58:29.495249 3506 master.cpp:120] No whitelist given. Advertising offers for all slaves I0929 16:58:29.495728 3502 hierarchical_allocator_process.hpp:299] Initializing hierarchical allocator process with master : master@192.168.122.164:55618 I0929 16:58:29.496196 3506 master.cpp:1241] The newly elected leader is master@192.168.122.164:55618 with id 20140929-165829-2759502016-55618-3486 I0929 16:58:29.496469 3506 master.cpp:1254] Elected as the leading master! 
I0929 16:58:29.496713 3506 master.cpp:1072] Recovering from registrar I0929 16:58:29.497020 3506 registrar.cpp:312] Recovering registrar I0929 16:58:29.497486 3506 log.cpp:656] Attempting to start the writer I0929 16:58:29.498105 3506 replica.cpp:474] Replica received implicit promise request with proposal 1 I0929 16:58:29.498373 3506 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 27145ns I0929 16:58:29.498605 3506 replica.cpp:342] Persisted promised to 1 I0929 16:58:29.500880 3500 coordinator.cpp:230] Coordinator attemping to fill missing position I0929 16:58:29.501404 3500 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 I0929 16:58:29.501687 3500 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 57971ns I0929 16:58:29.501935 3500 replica.cpp:676] Persisted action at 0 I0929 16:58:29.504905 3507 replica.cpp:508] Replica received write request for position 0 I0929 16:58:29.505130 3507 leveldb.cpp:438] Reading position from leveldb took 18418ns I0929 16:58:29.505377 3507 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 19998ns I0929 16:58:29.505571 3507 replica.cpp:676] Persisted action at 0 I0929 16:58:29.505957 3507 replica.cpp:655] Replica received learned notice for position 0 I0929 16:58:29.506186 3507 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 21648ns I0929 16:58:29.506433 3507 replica.cpp:676] Persisted action at 0 I0929 16:58:29.506767 3507 replica.cpp:661] Replica learned NOP action at position 0 I0929 16:58:29.507199 3507 log.cpp:672] Writer started with ending position 0 I0929 16:58:29.507730 3507 leveldb.cpp:438] Reading position from leveldb took 11532ns I0929 16:58:29.508915 3507 registrar.cpp:345] Successfully fetched the registry (0B) I0929 16:58:29.509230 3507 registrar.cpp:421] Attempting to update the 'registry' I0929 16:58:29.510516 3500 log.cpp:680] Attempting to append 130 bytes to the log I0929 16:58:29.510949 3500 coordinator.cpp:340] 
Coordinator attempting to write APPEND action at position 1 I0929 16:58:29.511363 3500 replica.cpp:508] Replica received write request for position 1 I0929 16:58:29.511697 3500 leveldb.cpp:343] Persisting action (149 bytes) to leveldb took 66530ns I0929 16:58:29.512039 3500 replica.cpp:676] Persisted action at 1 I0929 16:58:29.512460 3500 replica.cpp:655] Replica received learned notice for position 1 I0929 16:58:29.512778 3500 leveldb.cpp:343] Persisting action (151 bytes) to leveldb took 24121ns I0929 16:58:29.513013 3500 replica.cpp:676] Persisted action at 1 I0929 16:58:29.513239 3500 replica.cpp:661] Replica learned APPEND action at position 1 I0929 16:58:29.513674 3500 log.cpp:699] Attempting to truncate the log to 1 I0929 16:58:29.513954 3500 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2 I0929 16:58:29.514385 3500 replica.cpp:508] Replica received write request for position 2 I0929 16:58:29.514680 3500 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 65014ns I0929 16:58:29.514991 3500 replica.cpp:676] Persisted action at 2 I0929 16:58:29.516978 3501 replica.cpp:655] Replica received learned notice for position 2 I0929 16:58:29.517319 3501 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 24103ns I0929 16:58:29.517546 3501 leveldb.cpp:401] Deleting ~1 keys from leveldb took 16533ns I0929 16:58:29.517801 3501 replica.cpp:676] Persisted action at 2 I0929 16:58:29.518039 3501 replica.cpp:661] Replica learned TRUNCATE action at position 2 I0929 16:58:29.518539 3507 registrar.cpp:478] Successfully updated 'registry' I0929 16:58:29.518885 3507 registrar.cpp:371] Successfully recovered registrar I0929 16:58:29.519201 3507 master.cpp:1099] Recovered 0 slaves from the Registry (94B) ; allowing 10mins for slaves to re-register I0929 16:58:29.533073 3505 slave.cpp:169] Slave started on 57)@192.168.122.164:55618 I0929 16:58:29.533500 3505 credentials.hpp:84] Loading credential for authentication from 
'/tmp/AllocatorTest_0_SlaveLost_xdXHfg/credential' I0929 16:58:29.533834 3505 slave.cpp:276] Slave using credential for: test-principal I0929 16:58:29.534168 3505 slave.cpp:289] Slave resources: cpus(*):2; mem(*):1024; disk(*):752; ports(*):[31000-32000] I0929 16:58:29.534751 3505 slave.cpp:317] Slave hostname: fedora-20 I0929 16:58:29.534965 3505 slave.cpp:318] Slave checkpoint: false I0929 16:58:29.535557 3505 state.cpp:33] Recovering state from '/tmp/AllocatorTest_0_SlaveLost_xdXHfg/meta' I0929 16:58:29.535951 3505 status_update_manager.cpp:193] Recovering status update manager I0929 16:58:29.536290 3505 slave.cpp:3271] Finished recovery I0929 16:58:29.536782 3505 slave.cpp:598] New master detected at master@192.168.122.164:55618 I0929 16:58:29.537122 3505 slave.cpp:672] Authenticating with master master@192.168.122.164:55618 I0929 16:58:29.537492 3505 slave.cpp:645] Detecting new master I0929 16:58:29.537294 3506 status_update_manager.cpp:167] New master detected at master@192.168.122.164:55618 I0929 16:58:29.537642 3507 authenticatee.hpp:128] Creating new client SASL connection I0929 16:58:29.538769 3502 master.cpp:3737] Authenticating slave(57)@192.168.122.164:55618 I0929 16:58:29.539091 3502 authenticator.hpp:156] Creating new server SASL connection I0929 16:58:29.539710 3503 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5 I0929 16:58:29.539943 3503 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5' I0929 16:58:29.540206 3502 authenticator.hpp:262] Received SASL authentication start I0929 16:58:29.540457 3502 authenticator.hpp:384] Authentication requires more steps I0929 16:58:29.540757 3502 authenticatee.hpp:265] Received SASL authentication step I0929 16:58:29.541121 3502 authenticator.hpp:290] Received SASL authentication step I0929 16:58:29.541368 3502 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-20' server FQDN: 'fedora-20' SASL_AUXPROP_VERIFY_AGAINST_HASH: 
false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0929 16:58:29.541599 3502 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I0929 16:58:29.541874 3502 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0929 16:58:29.542129 3502 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-20' server FQDN: 'fedora-20' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0929 16:58:29.542333 3502 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0929 16:58:29.542553 3502 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0929 16:58:29.542785 3502 authenticator.hpp:376] Authentication success I0929 16:58:29.543047 3502 authenticatee.hpp:305] Authentication success I0929 16:58:29.543381 3502 slave.cpp:729] Successfully authenticated with master master@192.168.122.164:55618 I0929 16:58:29.543707 3502 slave.cpp:992] Will retry registration in 11.795692ms if necessary I0929 16:58:29.543179 3503 master.cpp:3777] Successfully authenticated principal 'test-principal' at slave(57)@192.168.122.164:55618 I0929 16:58:29.544255 3503 master.cpp:2930] Registering slave at slave(57)@192.168.122.164:55618 (fedora-20) with id 20140929-165829-2759502016-55618-3486-0 I0929 16:58:29.544587 3503 registrar.cpp:421] Attempting to update the 'registry' I0929 16:58:29.545816 3500 log.cpp:680] Attempting to append 299 bytes to the log I0929 16:58:29.546267 3500 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 3 I0929 16:58:29.546749 3500 replica.cpp:508] Replica received write request for position 3 I0929 16:58:29.547030 3500 leveldb.cpp:343] Persisting action (318 bytes) to leveldb took 31759ns I0929 16:58:29.547236 3500 replica.cpp:676] Persisted action at 3 I0929 16:58:29.548902 3506 replica.cpp:655] Replica received learned notice for 
position 3 I0929 16:58:29.549139 3506 leveldb.cpp:343] Persisting action (320 bytes) to leveldb took 25595ns I0929 16:58:29.549343 3506 replica.cpp:676] Persisted action at 3 I0929 16:58:29.549607 3506 replica.cpp:661] Replica learned APPEND action at position 3 I0929 16:58:29.550081 3506 log.cpp:699] Attempting to truncate the log to 3 I0929 16:58:29.550497 3506 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 4 I0929 16:58:29.550943 3506 replica.cpp:508] Replica received write request for position 4 I0929 16:58:29.551198 3506 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 20852ns I0929 16:58:29.551409 3506 replica.cpp:676] Persisted action at 4 I0929 16:58:29.551795 3506 replica.cpp:655] Replica received learned notice for position 4 I0929 16:58:29.552094 3506 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 22182ns I0929 16:58:29.552320 3506 leveldb.cpp:401] Deleting ~2 keys from leveldb took 18503ns I0929 16:58:29.552525 3506 replica.cpp:676] Persisted action at 4 I0929 16:58:29.552781 3506 replica.cpp:661] Replica learned TRUNCATE action at position 4 I0929 16:58:29.550289 3503 registrar.cpp:478] Successfully updated 'registry' I0929 16:58:29.553553 3503 master.cpp:2970] Registered slave 20140929-165829-2759502016-55618-3486-0 at slave(57)@192.168.122.164:55618 (fedora-20) I0929 16:58:29.553807 3503 master.cpp:4180] Adding slave 20140929-165829-2759502016-55618-3486-0 at slave(57)@192.168.122.164:55618 (fedora-20) with cpus(*):2; mem(*):1024; disk(*):752; ports(*):[31000-32000] I0929 16:58:29.554152 3503 slave.cpp:763] Registered with master master@192.168.122.164:55618; given slave ID 20140929-165829-2759502016-55618-3486-0 I0929 16:58:29.554455 3503 slave.cpp:2345] Received ping from slave-observer(56)@192.168.122.164:55618 I0929 16:58:29.554707 3504 hierarchical_allocator_process.hpp:442] Added slave 20140929-165829-2759502016-55618-3486-0 (fedora-20) with cpus(*):2; mem(*):1024; disk(*):752; 
ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):752; ports(*):[31000-32000] available) I0929 16:58:29.555064 3504 hierarchical_allocator_process.hpp:679] Performed allocation for slave 20140929-165829-2759502016-55618-3486-0 in 13111ns I0929 16:58:29.558220 3486 sched.cpp:137] Version: 0.21.0 I0929 16:58:29.558821 3501 sched.cpp:233] New master detected at master@192.168.122.164:55618 I0929 16:58:29.559054 3501 sched.cpp:283] Authenticating with master master@192.168.122.164:55618 I0929 16:58:29.559360 3501 authenticatee.hpp:128] Creating new client SASL connection I0929 16:58:29.560096 3501 master.cpp:3737] Authenticating scheduler-c8df3f3b-2552-476f-9daf-9aa2f012ad28@192.168.122.164:55618 I0929 16:58:29.560430 3501 authenticator.hpp:156] Creating new server SASL connection I0929 16:58:29.561141 3501 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5 I0929 16:58:29.561465 3501 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5' I0929 16:58:29.561743 3501 authenticator.hpp:262] Received SASL authentication start I0929 16:58:29.562098 3501 authenticator.hpp:384] Authentication requires more steps I0929 16:58:29.562353 3501 authenticatee.hpp:265] Received SASL authentication step I0929 16:58:29.562721 3507 authenticator.hpp:290] Received SASL authentication step I0929 16:58:29.563022 3507 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-20' server FQDN: 'fedora-20' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0929 16:58:29.563254 3507 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I0929 16:58:29.563484 3507 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0929 16:58:29.563736 3507 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-20' server FQDN: 'fedora-20' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
SASL_AUXPROP_AUTHZID: true I0929 16:58:29.563976 3507 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0929 16:58:29.564188 3507 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0929 16:58:29.564415 3507 authenticator.hpp:376] Authentication success I0929 16:58:29.564673 3507 master.cpp:3777] Successfully authenticated principal 'test-principal' at scheduler-c8df3f3b-2552-476f-9daf-9aa2f012ad28@192.168.122.164:55618 I0929 16:58:29.568681 3501 authenticatee.hpp:305] Authentication success I0929 16:58:29.569046 3501 sched.cpp:357] Successfully authenticated with master master@192.168.122.164:55618 I0929 16:58:29.569286 3501 sched.cpp:476] Sending registration request to master@192.168.122.164:55618 I0929 16:58:29.569581 3507 master.cpp:1360] Received registration request from scheduler-c8df3f3b-2552-476f-9daf-9aa2f012ad28@192.168.122.164:55618 I0929 16:58:29.569846 3507 master.cpp:1320] Authorizing framework principal 'test-principal' to receive offers for role '*' I0929 16:58:29.570219 3507 master.cpp:1419] Registering framework 20140929-165829-2759502016-55618-3486-0000 at scheduler-c8df3f3b-2552-476f-9daf-9aa2f012ad28@192.168.122.164:55618 I0929 16:58:29.570543 3506 sched.cpp:407] Framework registered with 20140929-165829-2759502016-55618-3486-0000 I0929 16:58:29.570811 3506 sched.cpp:421] Scheduler::registered took 13811ns I0929 16:58:29.571135 3502 hierarchical_allocator_process.hpp:329] Added framework 20140929-165829-2759502016-55618-3486-0000 I0929 16:58:29.571393 3502 hierarchical_allocator_process.hpp:734] Offering cpus(*):2; mem(*):1024; disk(*):752; ports(*):[31000-32000] on slave 20140929-165829-2759502016-55618-3486-0 to framework 20140929-165829-2759502016-55618-3486-0000 I0929 16:58:29.571723 3502 hierarchical_allocator_process.hpp:659] Performed allocation for 1 slaves in 368547ns I0929 16:58:29.572125 3507 master.hpp:868] Adding offer 
20140929-165829-2759502016-55618-3486-0 with resources cpus(*):2; mem(*):1024; disk(*):752; ports(*):[31000-32000] on slave 20140929-165829-2759502016-55618-3486-0 (fedora-20) I0929 16:58:29.572374 3507 master.cpp:3679] Sending 1 offers to framework 20140929-165829-2759502016-55618-3486-0000 I0929 16:58:29.572841 3503 sched.cpp:544] Scheduler::resourceOffers took 114306ns I0929 16:58:29.573197 3507 master.hpp:877] Removing offer 20140929-165829-2759502016-55618-3486-0 with resources cpus(*):2; mem(*):1024; disk(*):752; ports(*):[31000-32000] on slave 20140929-165829-2759502016-55618-3486-0 (fedora-20) I0929 16:58:29.573457 3507 master.cpp:2274] Processing reply for offers: [ 20140929-165829-2759502016-55618-3486-0 ] on slave 20140929-165829-2759502016-55618-3486-0 at slave(57)@192.168.122.164:55618 (fedora-20) for framework 20140929-165829-2759502016-55618-3486-0000 W0929 16:58:29.573717 3507 master.cpp:1944] Executor default for task 0 uses less CPUs (None) than the minimum required (0.01). Please update your executor, as this will be mandatory in future releases. W0929 16:58:29.573953 3507 master.cpp:1955] Executor default for task 0 uses less memory (None) than the minimum required (32MB). Please update your executor, as this will be mandatory in future releases. 
I0929 16:58:29.574177 3507 master.cpp:2357] Authorizing framework principal 'test-principal' to launch task 0 as user 'jenkins' I0929 16:58:29.574745 3507 master.hpp:845] Adding task 0 with resources cpus(*):2; mem(*):512 on slave 20140929-165829-2759502016-55618-3486-0 (fedora-20) I0929 16:58:29.574992 3507 master.cpp:2423] Launching task 0 of framework 20140929-165829-2759502016-55618-3486-0000 with resources cpus(*):2; mem(*):512 on slave 20140929-165829-2759502016-55618-3486-0 at slave(57)@192.168.122.164:55618 (fedora-20) I0929 16:58:29.575315 3503 slave.cpp:1023] Got assigned task 0 for framework 20140929-165829-2759502016-55618-3486-0000 I0929 16:58:29.575724 3503 slave.cpp:1133] Launching task 0 for framework 20140929-165829-2759502016-55618-3486-0000 I0929 16:58:29.578129 3503 exec.cpp:132] Version: 0.21.0 I0929 16:58:29.578505 3504 exec.cpp:182] Executor started at: executor(30)@192.168.122.164:55618 with pid 3486 I0929 16:58:29.578867 3503 slave.cpp:1246] Queuing task '0' for executor default of framework '20140929-165829-2759502016-55618-3486-0000 I0929 16:58:29.579144 3503 slave.cpp:554] Successfully attached file '/tmp/AllocatorTest_0_SlaveLost_xdXHfg/slaves/20140929-165829-2759502016-55618-3486-0/frameworks/20140929-165829-2759502016-55618-3486-0000/executors/default/runs/b0de9759-7054-4763-90f4-889ddc3a8524' I0929 16:58:29.579401 3503 slave.cpp:1756] Got registration for executor 'default' of framework 20140929-165829-2759502016-55618-3486-0000 fro...",1 MESOS-1853,"Remove /proc and /sys remounts from port_mapping isolator","/proc/net reflects a new network namespace regardless and remount doesn't actually do what we expected anyway, i.e., it's not sufficient for a new pid namespace and a new mount is required.",3 MESOS-1855,"Mesos 0.20.1 doesn't compile","The compilation of Mesos 0.20.1 fails on Ubuntu Trusty with the following error - slave/containerizer/mesos/containerizer.cpp -fPIC -DPIC -o 
slave/containerizer/mesos/.libs/libmesos_no_3rdparty_la-containerizer.o In file included from ./linux/routing/filter/ip.hpp:36:0, from ./slave/containerizer/isolators/network/port_mapping.hpp:42, from slave/containerizer/mesos/containerizer.cpp:44: ./linux/routing/filter/filter.hpp:29:43: fatal error: linux/routing/filter/handle.hpp: No such file or directory #include ""linux/routing/filter/handle.hpp"" ^",1 MESOS-1856,"Support specifying libnl3 install location.","LIBNL_CFLAGS uses a hard-coded path in the configure script, instead of detecting the location.",2 MESOS-1858,"Leaked file descriptors in StatusUpdateStream.","https://github.com/apache/mesos/blob/master/src/slave/status_update_manager.hpp#L180 We should set cloexec for 'fd'.",1 MESOS-1862,"Performance regression in the Master's http metrics.","As part of the change to hold on to terminal unacknowledged tasks in the master, we introduced a performance regression during the following patch: https://github.com/apache/mesos/commit/0760b007ad65bc91e8cea377339978c78d36d247 {noformat} commit 0760b007ad65bc91e8cea377339978c78d36d247 Author: Benjamin Mahler Date: Thu Sep 11 10:48:20 2014 -0700 Minor cleanups to the Master code. Review: https://reviews.apache.org/r/25566 {noformat} Rather than keeping a running count of allocated resources, we now compute resources on-demand. This was done in order to ignore terminal task's resources. As a result of this change, the /stats.json and /metrics/snapshot endpoints on the master have slowed down substantially on large clusters. {noformat} $ time curl localhost:5050/health real 0m0.004s user 0m0.001s sys 0m0.002s $ time curl localhost:5050/stats.json > /dev/null real 0m15.402s user 0m0.001s sys 0m0.003s $ time curl localhost:5050/metrics/snapshot > /dev/null real 0m6.059s user 0m0.002s sys 0m0.002s {noformat} {{perf top}} reveals some of the resource computation during a request to stats.json: {noformat: perf top} Events: 36K cycles 10.53% libc-2.5.so [.] 
_int_free 9.90% libc-2.5.so [.] malloc 8.56% libmesos-0.21.0.so [.] std::_Rb_tree, std::less, std::allocator >:: 8.23% libc-2.5.so [.] _int_malloc 5.80% libstdc++.so.6.0.8 [.] std::_Rb_tree_increment(std::_Rb_tree_node_base*) 5.33% [kernel] [k] _raw_spin_lock 3.13% libstdc++.so.6.0.8 [.] std::string::assign(std::string const&) 2.95% libmesos-0.21.0.so [.] process::SocketManager::exited(process::ProcessBase*) 2.43% libmesos-0.21.0.so [.] mesos::Resource::MergeFrom(mesos::Resource const&) 1.88% libmesos-0.21.0.so [.] mesos::internal::master::Slave::used() const 1.48% libstdc++.so.6.0.8 [.] __gnu_cxx::__atomic_add(int volatile*, int) 1.45% [kernel] [k] find_busiest_group 1.41% libc-2.5.so [.] free 1.38% libmesos-0.21.0.so [.] mesos::Value_Range::MergeFrom(mesos::Value_Range const&) 1.13% libmesos-0.21.0.so [.] mesos::Value_Scalar::MergeFrom(mesos::Value_Scalar const&) 1.12% libmesos-0.21.0.so [.] mesos::Resource::SharedDtor() 1.07% libstdc++.so.6.0.8 [.] __gnu_cxx::__exchange_and_add(int volatile*, int) 0.94% libmesos-0.21.0.so [.] google::protobuf::UnknownFieldSet::MergeFrom(google::protobuf::UnknownFieldSet const&) 0.92% libstdc++.so.6.0.8 [.] operator new(unsigned long) 0.88% libmesos-0.21.0.so [.] mesos::Value_Ranges::MergeFrom(mesos::Value_Ranges const&) 0.75% libmesos-0.21.0.so [.] mesos::matches(mesos::Resource const&, mesos::Resource const&) {noformat}",3 MESOS-1863,"Split launch tasks and decline offers metrics","Both launchTasks() and declineOffers() scheduler driver calls end up in ""messages_launch_tasks"" metric on the master. It would be nice to split them for differentiating these two calls.",1 MESOS-1865,"Redirect to the leader master when current master is not a leader","Some of the API endpoints, for example /master/tasks.json, will return bogus information if you query a non-leading master: {code} [steven@Anesthetize:~]% curl http://master1.mesos-vpcqa.otenv.com:5050/master/tasks.json | jq . 
| head -n 10 { ""tasks"": [] } [steven@Anesthetize:~]% curl http://master2.mesos-vpcqa.otenv.com:5050/master/tasks.json | jq . | head -n 10 { ""tasks"": [] } [steven@Anesthetize:~]% curl http://master3.mesos-vpcqa.otenv.com:5050/master/tasks.json | jq . | head -n 10 { ""tasks"": [ { ""executor_id"": """", ""framework_id"": ""20140724-231003-419644938-5050-1707-0000"", ""id"": ""pp.guestcenterwebhealthmonitor.606cd6ee-4b50-11e4-825b-5212e05f35db"", ""name"": ""pp.guestcenterwebhealthmonitor.606cd6ee-4b50-11e4-825b-5212e05f35db"", ""resources"": { ""cpus"": 0.25, ""disk"": 0, {code} This is very hard for end-users to work around. For example if I query ""which master is leading"" followed by ""leader: which tasks are running"" it is possible that the leader fails over in between, leaving me with an incorrect answer and no way to know that this happened. In my opinion the API should return the correct response (by asking the current leader?) or an error (500 Not the leader?) but it's unacceptable to return a successful wrong answer. ",3 MESOS-1866,"Race between ~Authenticator() and Authenticator::authenticate() can lead to schedulers/slaves to never get authenticated","The master might get a duplicate authenticate() request while a previous authentication attempt is in progress. Depending on what the AuthenticatorProcess is executing at the time, there are 2 possible race conditions which will cause scheduler/slave to continuously retry authentication but never succeed. We have seen both the race conditions in a heavily loaded production cluster. Race1: ---------- --> An authenticate() event was dispatched to AuthenticatorProcess (Master::authenticate() called Authenticator::authenticate()) --> A terminate() event was then injected into the front of the AuthenticatorProcess queue (duplicate Master::authenticate() did ~Authenticator) before the above authenticate() event was executed. 
--> Due to the bug in libprocess, the future returned by Master::authenticate() was never transitioned to discarded (Master::_authenticate() was never called). --> This caused all the subsequent authentication retries to be enqueued on the master waiting for Master::_authenticate() to be executed. Fix: Transition the dispatched future to discarded if the libprocess is terminated (https://reviews.apache.org/r/25945/) Race 2: ----------- --> An authenticate() event was dispatched to AuthenticatorProcess (Master::authenticate() called Authenticator::authenticate()) --> AuthenticatorProcess::authenticate() executed and set promise.onDiscard(defer(self, Self::discarded)). NOTE: The internal promise of AuthenticatorProcess is discarded in AuthenticatorProcess::discarded() --> A terminate() event was then injected into the front of the AuthenticatorProcess queue (duplicate Master::authenticate() did ~Authenticator) before the above discarded() event was executed) --> ~AuthenticatorProcess is destructed without ever discarding the internal promise (Master::_authenticate() was never called). --> This caused all the subsequent authentication retries to be enqueued on the master waiting for Master::_authenticate() to be executed. Fix: The fix here is to discard the internal promise when the AuthenticatorProcess is destructed.",2 MESOS-1869,"UpdateFramework message might reach the slave before Reregistered message and get dropped","In reregisterSlave() we send 'SlaveReregisteredMessage' before we link the slave pid, which means a temporary socket will be created and used. Subsequently, after linking, we send the UpdateFrameworkMessage, which creates and uses a persistent socket. 
This might lead to out-of-order delivery, resulting in UpdateFrameworkMessage reaching the slave before the SlaveReregisteredMessage and getting dropped because the slave is not yet (re-)registered.",1 MESOS-1875,"os::killtree() incorrectly returns early if pid has terminated","If groups == true and/or sessions == true then os::killtree() should continue to signal all processes in the process group and/or session, even if the leading pid has terminated.",2 MESOS-1901,"Slave resources obtained from localhost:5051/state.json is not correct.","The 'resources' field in Slave is uninitialized. Also, it seems that the 'attributes' field in Slave is redundant, as we store slave info. ",2 MESOS-1903,"Add backoff to framework re-registration retries","To avoid so many duplicate framework re-registration attempts (and thus offer rescinds) we should add backoff to re-registration retries.",3 MESOS-1913,"Create libevent/SSL-backed Socket implementation",NULL,13 MESOS-1941,"Make executor's user owner of executor's cgroup directory","Currently, when cgroups are enabled and an executor is spawned, it is mounted under, for example: /sys/fs/cgroup/cpu/mesos/. In the current implementation this directory is only writable by the root user. This prevents a process launched by the executor from mounting its child processes under this cgroup, because the cgroup directory is only writable by root. To enable an executor-spawned process to mount its child processes under its cgroup directory, the cgroup directory should be made writable by the user that spawns the executor.",3 MESOS-1943,"Add event queue size metrics to scheduler driver","In the master process, we expose metrics for event queue sizes for various event types. We should do the same for the scheduler driver process.",2 MESOS-1955,"Specification for Executor and Task life cycles in Slave","We should create a precise specification of what the Mesos source code is supposed to be implementing wrt. the life cycle of executors and tasks. 
And in addition, we should document why certain design decisions have been made one way or another, to provide guidance for future code changes. With such a source code-independent specification, we could write unbiased regression and scale tests, which would be instrumental in maintaining high quality. Furthermore, this would make the source code more amenable. Why pick this particular area of the source code? Shouldn't more of Mesos have a thorough specification? Probably so. But we need to start somewhere and this area seems to be a good choice, given both its intricacy and its importance. ",5 MESOS-1964,"0.21.0 release","Mesos release 0.21.0 will include the following major feature(s): - Provide state reconciliation for frameworks. [(MESOS-1407)|https://issues.apache.org/jira/browse/MESOS-1407] Possible features to include: - Isolation of system directories (/tmp) for Mesos containers [(MESOS-1586)|https://issues.apache.org/jira/browse/MESOS-1586] - Expose reason for TASK_KILLED [(1930)|https://issues.apache.org/jira/browse/MESOS-1930] This ticket will be used to track blockers to this release. 
",5 MESOS-1967,"Test RoutingTest.INETSockets fails on some machine","{noformat} [ RUN ] RoutingTest.INETSockets ../../../mesos/src/tests/routing_tests.cpp:238: Failure infos: Input data out of range ABORT: (../../../mesos/3rdparty/libprocess/3rdparty/stout/include/stout/try.hpp:92): Try::get() but state == ERROR: Input data out of range*** Aborted at 1414000937 (unix time) try ""date -d @1414000937"" if you are using GNU date *** PC: @ 0x7f2c2d509fc5 __GI_raise *** SIGABRT (@0x1b49000040b1) received by PID 16561 (TID 0x7f2c31031720) from PID 16561; stack trace: *** @ 0x7f2c2f0d4ca0 (unknown) @ 0x7f2c2d509fc5 __GI_raise @ 0x7f2c2d50ba70 __GI_abort @ 0x4cf782 _Abort() @ 0x4cf7bc _Abort() @ 0x99459e RoutingTest_INETSockets_Test::TestBody() @ 0xa1c363 testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0xa13617 testing::Test::Run() @ 0xa136be testing::TestInfo::Run() @ 0xa137c5 testing::TestCase::Run() @ 0xa13a68 testing::internal::UnitTestImpl::RunAllTests() @ 0xa13cf7 testing::UnitTest::Run() @ 0x49bc4b main @ 0x7f2c2d4f79f4 __libc_start_main @ 0x4aad79 (unknown) make[3]: *** [check-local] Aborted {noformat}",2 MESOS-1969,"RBT only takes revision ranges as args for versions >= 0.6","the {{support/post-reviews.py}} script doesn't differentiate between RBT versions although the calling conventions for passing revision ranges are different. ",1 MESOS-1970,"slave and offer ids are indistinguishable in the logs","It is currently impossible to tell slave ids and offer ids apart when looking at logs. Adding some differentiator will make log reading a little simpler.",1 MESOS-1972,"Move TASK_LOST generations due to invalid tasks from scheduler driver to master","As we move towards pure scheduler/executor clients, it is imperative that the scheduler driver doesn't do validation of tasks and generate TASK_LOST messages itself. All that logic should live in the master. Schedulers should reconcile dropped messages via reconciliation. 
",3 MESOS-1974,"Refactor the C++ Resources abstraction for DiskInfo","As we introduce DiskInfo and reservation for Resource, we need to change the C++ Resources abstraction to properly deal with merge/split of resources with those additional fields. Also, the existing C++ 'Resources' interfaces are poorly designed. Some of them are confusing and unintuitive. Some of them are overloaded with too many functionalities. For instance, {noformat} bool operator <= (const Resource& left, const Resource& right); {noformat} This interface is non-intuitive because A <= B doesn't imply !(B <= A). {noformat} Resource operator + (const Resource& left, const Resource& right); {noformat} This one is also non-intuitive because if 'left' is not compatible with 'right', the result is 'left' (why not right???). Similar for operator '-'. {noformat} Option Resources::get(const Resource& r) const; {noformat} This one assumes Resources is flattened, but it might not be. As we start to introduce persistent disk resources (MESOS-1554), things will get more complicated. For example, one may want to get two types of 'disk()' functions: one returns the ephemeral disk bytes (with no disk info), one returns the total disk bytes (including ones that have disk info). We may wanna introduce a concept about Resource that indicates that a resource cannot be merged or split (e.g., atomic?). Since we need to change this class anyway, I wanna take this chance to refactor it.",8 MESOS-1984,"Documentation for Egress Control Limit",NULL,1 MESOS-1989,"Container network stats reported by the port mapping isolator is the reverse of the actual network stats.","Looks like the TX/RX network stats reported is the reverse of the actual network stats. The reason is because we simply get TX/RX data from veth on the host. Since veth pair is a tunnel, the ingress of veth on host is the egress of eth0 in container (and vice versa). Therefore, we need to flip the data we got from veth. {noformat} [jyu@... 
~]$ sudo ip netns exec 24926 /sbin/ip -s link show dev eth0 2: eth0: mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000 link/ether f0:4d:a2:75:74:05 brd ff:ff:ff:ff:ff:ff RX: bytes packets errors dropped overrun mcast 46030857691178 12561038581 0 0 0 0 TX: bytes packets errors dropped carrier collsns 29792886058561 15036798198 0 0 0 0 [jyu@... ~]$ ip -s link show dev mesos24926 7412: mesos24926: mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000 link/ether f0:4d:a2:75:74:05 brd ff:ff:ff:ff:ff:ff RX: bytes packets errors dropped overrun mcast 29793066979551 15036894749 0 0 0 0 TX: bytes packets errors dropped carrier collsns 46031126366116 12561113732 0 0 0 0 {noformat}",1 MESOS-2007,"AllocatorTest/0.SlaveReregistersFirst is flaky","{noformat:title=} [ RUN ] AllocatorTest/0.SlaveReregistersFirst Using temporary directory '/tmp/AllocatorTest_0_SlaveReregistersFirst_YPe61d' I1028 23:48:22.360447 31190 leveldb.cpp:176] Opened db in 2.192575ms I1028 23:48:22.361253 31190 leveldb.cpp:183] Compacted db in 760753ns I1028 23:48:22.361320 31190 leveldb.cpp:198] Created db iterator in 22188ns I1028 23:48:22.361340 31190 leveldb.cpp:204] Seeked to beginning of db in 1950ns I1028 23:48:22.361351 31190 leveldb.cpp:273] Iterated through 0 keys in the db in 345ns I1028 23:48:22.361403 31190 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I1028 23:48:22.362185 31217 recover.cpp:437] Starting replica recovery I1028 23:48:22.362764 31219 recover.cpp:463] Replica is in EMPTY status I1028 23:48:22.363955 31210 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I1028 23:48:22.364320 31217 recover.cpp:188] Received a recover response from a replica in EMPTY status I1028 23:48:22.364820 31211 recover.cpp:554] Updating replica status to STARTING I1028 23:48:22.365365 31215 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 418991ns I1028 23:48:22.365391 31215 replica.cpp:320] Persisted replica 
status to STARTING I1028 23:48:22.365617 31217 recover.cpp:463] Replica is in STARTING status I1028 23:48:22.366328 31206 master.cpp:312] Master 20141028-234822-3193029443-50043-31190 (pietas.apache.org) started on 67.195.81.190:50043 I1028 23:48:22.366377 31206 master.cpp:358] Master only allowing authenticated frameworks to register I1028 23:48:22.366391 31206 master.cpp:363] Master only allowing authenticated slaves to register I1028 23:48:22.366402 31206 credentials.hpp:36] Loading credentials for authentication from '/tmp/AllocatorTest_0_SlaveReregistersFirst_YPe61d/credentials' I1028 23:48:22.366708 31206 master.cpp:392] Authorization enabled I1028 23:48:22.366886 31209 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I1028 23:48:22.367311 31208 master.cpp:120] No whitelist given. Advertising offers for all slaves I1028 23:48:22.367312 31207 recover.cpp:188] Received a recover response from a replica in STARTING status I1028 23:48:22.367686 31211 hierarchical_allocator_process.hpp:299] Initializing hierarchical allocator process with master : master@67.195.81.190:50043 I1028 23:48:22.367863 31212 recover.cpp:554] Updating replica status to VOTING I1028 23:48:22.368477 31218 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 375527ns I1028 23:48:22.368505 31218 replica.cpp:320] Persisted replica status to VOTING I1028 23:48:22.368517 31204 master.cpp:1242] The newly elected leader is master@67.195.81.190:50043 with id 20141028-234822-3193029443-50043-31190 I1028 23:48:22.368549 31204 master.cpp:1255] Elected as the leading master! 
I1028 23:48:22.368567 31204 master.cpp:1073] Recovering from registrar I1028 23:48:22.368621 31215 recover.cpp:568] Successfully joined the Paxos group I1028 23:48:22.368716 31219 registrar.cpp:313] Recovering registrar I1028 23:48:22.369000 31215 recover.cpp:452] Recover process terminated I1028 23:48:22.369523 31208 log.cpp:656] Attempting to start the writer I1028 23:48:22.370909 31205 replica.cpp:474] Replica received implicit promise request with proposal 1 I1028 23:48:22.371266 31205 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 325016ns I1028 23:48:22.371290 31205 replica.cpp:342] Persisted promised to 1 I1028 23:48:22.371979 31218 coordinator.cpp:230] Coordinator attemping to fill missing position I1028 23:48:22.373378 31210 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 I1028 23:48:22.373746 31210 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 329018ns I1028 23:48:22.373772 31210 replica.cpp:676] Persisted action at 0 I1028 23:48:22.374897 31214 replica.cpp:508] Replica received write request for position 0 I1028 23:48:22.374951 31214 leveldb.cpp:438] Reading position from leveldb took 26002ns I1028 23:48:22.375272 31214 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 289094ns I1028 23:48:22.375298 31214 replica.cpp:676] Persisted action at 0 I1028 23:48:22.375886 31204 replica.cpp:655] Replica received learned notice for position 0 I1028 23:48:22.376258 31204 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 346650ns I1028 23:48:22.376277 31204 replica.cpp:676] Persisted action at 0 I1028 23:48:22.376298 31204 replica.cpp:661] Replica learned NOP action at position 0 I1028 23:48:22.376843 31215 log.cpp:672] Writer started with ending position 0 I1028 23:48:22.378056 31205 leveldb.cpp:438] Reading position from leveldb took 28265ns I1028 23:48:22.380323 31217 registrar.cpp:346] Successfully fetched the registry (0B) in 11.55584ms I1028 23:48:22.380466 31217 
registrar.cpp:445] Applied 1 operations in 50632ns; attempting to update the 'registry' I1028 23:48:22.382472 31217 log.cpp:680] Attempting to append 139 bytes to the log I1028 23:48:22.382715 31210 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1 I1028 23:48:22.383463 31210 replica.cpp:508] Replica received write request for position 1 I1028 23:48:22.383857 31210 leveldb.cpp:343] Persisting action (158 bytes) to leveldb took 363758ns I1028 23:48:22.383875 31210 replica.cpp:676] Persisted action at 1 I1028 23:48:22.384397 31218 replica.cpp:655] Replica received learned notice for position 1 I1028 23:48:22.384840 31218 leveldb.cpp:343] Persisting action (160 bytes) to leveldb took 420161ns I1028 23:48:22.384862 31218 replica.cpp:676] Persisted action at 1 I1028 23:48:22.384882 31218 replica.cpp:661] Replica learned APPEND action at position 1 I1028 23:48:22.385684 31211 registrar.cpp:490] Successfully updated the 'registry' in 5.158144ms I1028 23:48:22.385818 31211 registrar.cpp:376] Successfully recovered registrar I1028 23:48:22.385912 31214 log.cpp:699] Attempting to truncate the log to 1 I1028 23:48:22.386101 31218 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2 I1028 23:48:22.386124 31211 master.cpp:1100] Recovered 0 slaves from the Registry (101B) ; allowing 10mins for slaves to re-register I1028 23:48:22.387398 31209 replica.cpp:508] Replica received write request for position 2 I1028 23:48:22.387758 31209 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 334969ns I1028 23:48:22.387776 31209 replica.cpp:676] Persisted action at 2 I1028 23:48:22.388272 31204 replica.cpp:655] Replica received learned notice for position 2 I1028 23:48:22.388453 31204 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 159390ns I1028 23:48:22.388501 31204 leveldb.cpp:401] Deleting ~1 keys from leveldb took 30409ns I1028 23:48:22.388516 31204 replica.cpp:676] Persisted action at 2 I1028 
23:48:22.388531 31204 replica.cpp:661] Replica learned TRUNCATE action at position 2 I1028 23:48:22.400737 31207 slave.cpp:169] Slave started on 34)@67.195.81.190:50043 I1028 23:48:22.400786 31207 credentials.hpp:84] Loading credential for authentication from '/tmp/AllocatorTest_0_SlaveReregistersFirst_QPPV21/credential' I1028 23:48:22.400996 31207 slave.cpp:276] Slave using credential for: test-principal I1028 23:48:22.401304 31207 slave.cpp:289] Slave resources: cpus(*):2; mem(*):1024; disk(*):3.70122e+06; ports(*):[31000-32000] I1028 23:48:22.401413 31207 slave.cpp:318] Slave hostname: pietas.apache.org I1028 23:48:22.401520 31207 slave.cpp:319] Slave checkpoint: false W1028 23:48:22.401535 31207 slave.cpp:321] Disabling checkpointing is deprecated and the --checkpoint flag will be removed in a future release. Please avoid using this flag I1028 23:48:22.402349 31207 state.cpp:33] Recovering state from '/tmp/AllocatorTest_0_SlaveReregistersFirst_QPPV21/meta' I1028 23:48:22.402678 31207 status_update_manager.cpp:197] Recovering status update manager I1028 23:48:22.403048 31211 slave.cpp:3456] Finished recovery I1028 23:48:22.403815 31215 slave.cpp:602] New master detected at master@67.195.81.190:50043 I1028 23:48:22.403852 31215 slave.cpp:665] Authenticating with master master@67.195.81.190:50043 I1028 23:48:22.403875 31206 status_update_manager.cpp:171] Pausing sending status updates I1028 23:48:22.403961 31215 slave.cpp:638] Detecting new master I1028 23:48:22.404016 31211 authenticatee.hpp:133] Creating new client SASL connection I1028 23:48:22.404230 31204 master.cpp:3853] Authenticating slave(34)@67.195.81.190:50043 I1028 23:48:22.404464 31205 authenticator.hpp:161] Creating new server SASL connection I1028 23:48:22.404613 31211 authenticatee.hpp:224] Received SASL authentication mechanisms: CRAM-MD5 I1028 23:48:22.404649 31211 authenticatee.hpp:250] Attempting to authenticate with mechanism 'CRAM-MD5' I1028 23:48:22.404734 31211 authenticator.hpp:267] 
Received SASL authentication start I1028 23:48:22.404783 31211 authenticator.hpp:389] Authentication requires more steps I1028 23:48:22.404898 31215 authenticatee.hpp:270] Received SASL authentication step I1028 23:48:22.404999 31215 authenticator.hpp:295] Received SASL authentication step I1028 23:48:22.405030 31215 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'pietas.apache.org' server FQDN: 'pietas.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I1028 23:48:22.405047 31215 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I1028 23:48:22.405086 31215 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I1028 23:48:22.405109 31215 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'pietas.apache.org' server FQDN: 'pietas.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I1028 23:48:22.405122 31215 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I1028 23:48:22.405129 31215 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I1028 23:48:22.405146 31215 authenticator.hpp:381] Authentication success I1028 23:48:22.405243 31213 authenticatee.hpp:310] Authentication success I1028 23:48:22.405253 31214 master.cpp:3893] Successfully authenticated principal 'test-principal' at slave(34)@67.195.81.190:50043 I1028 23:48:22.405505 31213 slave.cpp:722] Successfully authenticated with master master@67.195.81.190:50043 I1028 23:48:22.405619 31213 slave.cpp:1050] Will retry registration in 17.050994ms if necessary I1028 23:48:22.405819 31215 master.cpp:3032] Registering slave at slave(34)@67.195.81.190:50043 (pietas.apache.org) with id 20141028-234822-3193029443-50043-31190-S0 I1028 23:48:22.406262 31216 registrar.cpp:445] Applied 1 operations in 52647ns; attempting to 
update the 'registry' I1028 23:48:22.406697 31190 sched.cpp:137] Version: 0.21.0 I1028 23:48:22.407083 31211 sched.cpp:233] New master detected at master@67.195.81.190:50043 I1028 23:48:22.407114 31211 sched.cpp:283] Authenticating with master master@67.195.81.190:50043 I1028 23:48:22.407290 31214 authenticatee.hpp:133] Creating new client SASL connection I1028 23:48:22.407424 31214 master.cpp:3853] Authenticating scheduler-0aa33fc7-0d29-487c-80eb-f933681f9c95@67.195.81.190:50043 I1028 23:48:22.407659 31207 authenticator.hpp:161] Creating new server SASL connection I1028 23:48:22.407757 31207 authenticatee.hpp:224] Received SASL authentication mechanisms: CRAM-MD5 I1028 23:48:22.407774 31207 authenticatee.hpp:250] Attempting to authenticate with mechanism 'CRAM-MD5' I1028 23:48:22.407830 31207 authenticator.hpp:267] Received SASL authentication start I1028 23:48:22.407868 31207 authenticator.hpp:389] Authentication requires more steps I1028 23:48:22.407927 31207 authenticatee.hpp:270] Received SASL authentication step I1028 23:48:22.408015 31212 authenticator.hpp:295] Received SASL authentication step I1028 23:48:22.408037 31212 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'pietas.apache.org' server FQDN: 'pietas.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I1028 23:48:22.408046 31212 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I1028 23:48:22.408072 31212 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I1028 23:48:22.408092 31212 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'pietas.apache.org' server FQDN: 'pietas.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I1028 23:48:22.408100 31212 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I1028 23:48:22.408105 31212 auxprop.cpp:103] 
Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I1028 23:48:22.408116 31212 authenticator.hpp:381] Authentication success I1028 23:48:22.408192 31210 authenticatee.hpp:310] Authentication success I1028 23:48:22.408210 31217 master.cpp:3893] Successfully authenticated principal 'test-principal' at scheduler-0aa33fc7-0d29-487c-80eb-f933681f9c95@67.195.81.190:50043 I1028 23:48:22.408419 31210 sched.cpp:357] Successfully authenticated with master master@67.195.81.190:50043 I1028 23:48:22.408460 31210 sched.cpp:476] Sending registration request to master@67.195.81.190:50043 I1028 23:48:22.408568 31217 master.cpp:1362] Received registration request for framework 'default' at scheduler-0aa33fc7-0d29-487c-80eb-f933681f9c95@67.195.81.190:50043 I1028 23:48:22.408617 31217 master.cpp:1321] Authorizing framework principal 'test-principal' to receive offers for role '*' I1028 23:48:22.408937 31214 master.cpp:1426] Registering framework 20141028-234822-3193029443-50043-31190-0000 (default) at scheduler-0aa33fc7-0d29-487c-80eb-f933681f9c95@67.195.81.190:50043 I1028 23:48:22.409265 31213 sched.cpp:407] Framework registered with 20141028-234822-3193029443-50043-31190-0000 I1028 23:48:22.409267 31212 hierarchical_allocator_process.hpp:329] Added framework 20141028-234822-3193029443-50043-31190-0000 I1028 23:48:22.409312 31212 hierarchical_allocator_process.hpp:697] No resources available to allocate! 
I1028 23:48:22.409324 31215 log.cpp:680] Attempting to append 316 bytes to the log I1028 23:48:22.409333 31213 sched.cpp:421] Scheduler::registered took 38591ns I1028 23:48:22.409327 31212 hierarchical_allocator_process.hpp:659] Performed allocation for 0 slaves in 24107ns I1028 23:48:22.409518 31205 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 3 I1028 23:48:22.410127 31206 replica.cpp:508] Replica received write request for position 3 I1028 23:48:22.410706 31206 leveldb.cpp:343] Persisting action (335 bytes) to leveldb took 554098ns I1028 23:48:22.410725 31206 replica.cpp:676] Persisted action at 3 I1028 23:48:22.411151 31217 replica.cpp:655] Replica received learned notice for position 3 I1028 23:48:22.411499 31217 leveldb.cpp:343] Persisting action (337 bytes) to leveldb took 326572ns I1028 23:48:22.411519 31217 replica.cpp:676] Persisted action at 3 I1028 23:48:22.411533 31217 replica.cpp:661] Replica learned APPEND action at position 3 I1028 23:48:22.412292 31219 registrar.cpp:490] Successfully updated the 'registry' in 5.972992ms I1028 23:48:22.412518 31218 log.cpp:699] Attempting to truncate the log to 3 I1028 23:48:22.412621 31213 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 4 I1028 23:48:22.412734 31219 slave.cpp:2522] Received ping from slave-observer(38)@67.195.81.190:50043 I1028 23:48:22.412787 31206 master.cpp:3086] Registered slave 20141028-234822-3193029443-50043-31190-S0 at slave(34)@67.195.81.190:50043 (pietas.apache.org) with cpus(*):2; mem(*):1024; disk(*):3.70122e+06; ports(*):[31000-32000] I1028 23:48:22.412858 31219 slave.cpp:756] Registered with master master@67.195.81.190:50043; given slave ID 20141028-234822-3193029443-50043-31190-S0 I1028 23:48:22.412994 31210 status_update_manager.cpp:178] Resuming sending status updates I1028 23:48:22.413014 31211 hierarchical_allocator_process.hpp:442] Added slave 20141028-234822-3193029443-50043-31190-S0 (pietas.apache.org) with 
cpus(*):2; mem(*):1024; disk(*):3.70122e+06; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):3.70122e+06; ports(*):[31000-32000] available) I1028 23:48:22.413159 31211 hierarchical_allocator_process.hpp:734] Offering cpus(*):2; mem(*):1024; disk(*):3.70122e+06; ports(*):[31000-32000] on slave 20141028-234822-3193029443-50043-31190-S0 to framework 20141028-234822-3193029443-50043-31190-0000 I1028 23:48:22.413290 31208 replica.cpp:508] Replica received write request for position 4 I1028 23:48:22.413421 31211 hierarchical_allocator_process.hpp:679] Performed allocation for slave 20141028-234822-3193029443-50043-31190-S0 in 346658ns I1028 23:48:22.413650 31208 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 336067ns I1028 23:48:22.413668 31208 replica.cpp:676] Persisted action at 4 I1028 23:48:22.413797 31216 master.cpp:3795] Sending 1 offers to framework 20141028-234822-3193029443-50043-31190-0000 (default) at scheduler-0aa33fc7-0d29-487c-80eb-f933681f9c95@67.195.81.190:50043 I1028 23:48:22.414077 31212 replica.cpp:655] Replica received learned notice for position 4 I1028 23:48:22.414356 31212 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 260401ns I1028 23:48:22.414403 31212 leveldb.cpp:401] Deleting ~2 keys from leveldb took 28541ns I1028 23:48:22.414417 31212 replica.cpp:676] Persisted action at 4 I1028 23:48:22.414446 31212 replica.cpp:661] Replica learned TRUNCATE action at position 4 I1028 23:48:22.414422 31207 sched.cpp:544] Scheduler::resourceOffers took 310278ns I1028 23:48:22.415086 31214 master.cpp:2321] Processing reply for offers: [ 20141028-234822-3193029443-50043-31190-O0 ] on slave 20141028-234822-3193029443-50043-31190-S0 at slave(34)@67.195.81.190:50043 (pietas.apache.org) for framework 20141028-234822-3193029443-50043-31190-0000 (default) at scheduler-0aa33fc7-0d29-487c-80eb-f933681f9c95@67.195.81.190:50043 W1028 23:48:22.415163 31214 master.cpp:1969] Executor default for task 0 uses less CPUs (None) than the 
minimum required (0.01). Please update your executor, as this will be mandatory in future releases. W1028 23:48:22.415186 31214 master.cpp:1980] Executor default for task 0 uses less memory (None) than the minimum required (32MB). Please update your executor, as this will be mandatory in future releases. I1028 23:48:22.415256 31214 master.cpp:2417] Authorizing framework principal 'test-principal' to launch task 0 as user 'jenkins' I1028 23:48:22.416033 31219 master.hpp:877] Adding task 0 with resources cpus(*):1; mem(*):500 on slave 20141028-234822-3193029443-50043-31190-S0 (pietas.apache.org) I1028 23:48:22.416084 31219 master.cpp:2480] Launching task 0 of framework 20141028-234822-3193029443-50043-31190-0000 (default) at scheduler-0aa33fc7-0d29-487c-80eb-f933681f9c95@67.195.81.190:50043 with resources cpus(*):1; mem(*):500 on slave 20141028-234822-3193029443-50043-31190-S0 at slave(34)@67.195.81.190:50043 (pietas.apache.org) I1028 23:48:22.416317 31214 slave.cpp:1081] Got assigned task 0 for framework 20141028-234822-3193029443-50043-31190-0000 I1028 23:48:22.416679 31215 hierarchical_allocator_process.hpp:563] Recovered cpus(*):1; mem(*):524; disk(*):3.70122e+06; ports(*):[31000-32000] (total allocatable: cpus(*):1; mem(*):524; disk(*):3.70122e+06; ports(*):[31000-32000]) on slave 20141028-234822-3193029443-50043-31190-S0 from framework 20141028-234822-3193029443-50043-31190-0000 I1028 23:48:22.416721 31215 hierarchical_allocator_process.hpp:599] Framework 20141028-234822-3193029443-50043-31190-0000 filtered slave 2...",2 MESOS-2008,"MasterAuthorizationTest.DuplicateReregistration is flaky","{noformat:title=} [ RUN ] MasterAuthorizationTest.DuplicateReregistration Using temporary directory '/tmp/MasterAuthorizationTest_DuplicateReregistration_DLOmYX' I1029 08:25:26.021766 32232 leveldb.cpp:176] Opened db in 3.066621ms I1029 08:25:26.022734 32232 leveldb.cpp:183] Compacted db in 935019ns I1029 08:25:26.022766 32232 leveldb.cpp:198] Created db iterator in 4350ns 
I1029 08:25:26.022785 32232 leveldb.cpp:204] Seeked to beginning of db in 902ns I1029 08:25:26.022799 32232 leveldb.cpp:273] Iterated through 0 keys in the db in 387ns I1029 08:25:26.022831 32232 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I1029 08:25:26.023305 32248 recover.cpp:437] Starting replica recovery I1029 08:25:26.023598 32248 recover.cpp:463] Replica is in EMPTY status I1029 08:25:26.025059 32260 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I1029 08:25:26.025320 32247 recover.cpp:188] Received a recover response from a replica in EMPTY status I1029 08:25:26.025585 32256 recover.cpp:554] Updating replica status to STARTING I1029 08:25:26.026546 32249 master.cpp:312] Master 20141029-082526-3142697795-40696-32232 (pomona.apache.org) started on 67.195.81.187:40696 I1029 08:25:26.026561 32261 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 694444ns I1029 08:25:26.026592 32249 master.cpp:358] Master only allowing authenticated frameworks to register I1029 08:25:26.026592 32261 replica.cpp:320] Persisted replica status to STARTING I1029 08:25:26.026605 32249 master.cpp:363] Master only allowing authenticated slaves to register I1029 08:25:26.026639 32249 credentials.hpp:36] Loading credentials for authentication from '/tmp/MasterAuthorizationTest_DuplicateReregistration_DLOmYX/credentials' I1029 08:25:26.026877 32249 master.cpp:392] Authorization enabled I1029 08:25:26.026901 32260 recover.cpp:463] Replica is in STARTING status I1029 08:25:26.027498 32261 master.cpp:120] No whitelist given. 
Advertising offers for all slaves I1029 08:25:26.027541 32248 hierarchical_allocator_process.hpp:299] Initializing hierarchical allocator process with master : master@67.195.81.187:40696 I1029 08:25:26.028055 32252 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I1029 08:25:26.028451 32247 recover.cpp:188] Received a recover response from a replica in STARTING status I1029 08:25:26.028733 32249 master.cpp:1242] The newly elected leader is master@67.195.81.187:40696 with id 20141029-082526-3142697795-40696-32232 I1029 08:25:26.028764 32249 master.cpp:1255] Elected as the leading master! I1029 08:25:26.028781 32249 master.cpp:1073] Recovering from registrar I1029 08:25:26.028904 32246 recover.cpp:554] Updating replica status to VOTING I1029 08:25:26.029163 32257 registrar.cpp:313] Recovering registrar I1029 08:25:26.029556 32251 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 485711ns I1029 08:25:26.029588 32251 replica.cpp:320] Persisted replica status to VOTING I1029 08:25:26.029726 32253 recover.cpp:568] Successfully joined the Paxos group I1029 08:25:26.029932 32253 recover.cpp:452] Recover process terminated I1029 08:25:26.030436 32250 log.cpp:656] Attempting to start the writer I1029 08:25:26.032152 32248 replica.cpp:474] Replica received implicit promise request with proposal 1 I1029 08:25:26.032778 32248 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 597030ns I1029 08:25:26.032807 32248 replica.cpp:342] Persisted promised to 1 I1029 08:25:26.033481 32254 coordinator.cpp:230] Coordinator attemping to fill missing position I1029 08:25:26.035429 32247 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 I1029 08:25:26.036154 32247 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 690208ns I1029 08:25:26.036181 32247 replica.cpp:676] Persisted action at 0 I1029 08:25:26.037344 32249 replica.cpp:508] Replica received write request for position 0 I1029 
08:25:26.037395 32249 leveldb.cpp:438] Reading position from leveldb took 22607ns I1029 08:25:26.038074 32249 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 647429ns I1029 08:25:26.038105 32249 replica.cpp:676] Persisted action at 0 I1029 08:25:26.038683 32247 replica.cpp:655] Replica received learned notice for position 0 I1029 08:25:26.039378 32247 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 664911ns I1029 08:25:26.039407 32247 replica.cpp:676] Persisted action at 0 I1029 08:25:26.039433 32247 replica.cpp:661] Replica learned NOP action at position 0 I1029 08:25:26.040045 32252 log.cpp:672] Writer started with ending position 0 I1029 08:25:26.041378 32251 leveldb.cpp:438] Reading position from leveldb took 25625ns I1029 08:25:26.044642 32246 registrar.cpp:346] Successfully fetched the registry (0B) in 15.433984ms I1029 08:25:26.044742 32246 registrar.cpp:445] Applied 1 operations in 16444ns; attempting to update the 'registry' I1029 08:25:26.047538 32256 log.cpp:680] Attempting to append 139 bytes to the log I1029 08:25:26.156330 32247 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1 I1029 08:25:26.158460 32261 replica.cpp:508] Replica received write request for position 1 I1029 08:25:26.159277 32261 leveldb.cpp:343] Persisting action (158 bytes) to leveldb took 782308ns I1029 08:25:26.159328 32261 replica.cpp:676] Persisted action at 1 I1029 08:25:26.160267 32255 replica.cpp:655] Replica received learned notice for position 1 I1029 08:25:26.161070 32255 leveldb.cpp:343] Persisting action (160 bytes) to leveldb took 750259ns I1029 08:25:26.161100 32255 replica.cpp:676] Persisted action at 1 I1029 08:25:26.161125 32255 replica.cpp:661] Replica learned APPEND action at position 1 I1029 08:25:26.162199 32253 registrar.cpp:490] Successfully updated the 'registry' in 117.40416ms I1029 08:25:26.162400 32253 registrar.cpp:376] Successfully recovered registrar I1029 08:25:26.162724 32249 master.cpp:1100] 
Recovered 0 slaves from the Registry (101B) ; allowing 10mins for slaves to re-register I1029 08:25:26.162757 32253 log.cpp:699] Attempting to truncate the log to 1 I1029 08:25:26.162919 32256 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2 I1029 08:25:26.163949 32250 replica.cpp:508] Replica received write request for position 2 I1029 08:25:26.164589 32250 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 603175ns I1029 08:25:26.164618 32250 replica.cpp:676] Persisted action at 2 I1029 08:25:26.165385 32251 replica.cpp:655] Replica received learned notice for position 2 I1029 08:25:26.166007 32251 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 594003ns I1029 08:25:26.166056 32251 leveldb.cpp:401] Deleting ~1 keys from leveldb took 23309ns I1029 08:25:26.166077 32251 replica.cpp:676] Persisted action at 2 I1029 08:25:26.166100 32251 replica.cpp:661] Replica learned TRUNCATE action at position 2 I1029 08:25:26.178493 32232 sched.cpp:137] Version: 0.21.0 I1029 08:25:26.179029 32256 sched.cpp:233] New master detected at master@67.195.81.187:40696 I1029 08:25:26.179078 32256 sched.cpp:283] Authenticating with master master@67.195.81.187:40696 I1029 08:25:26.179424 32246 authenticatee.hpp:133] Creating new client SASL connection I1029 08:25:26.179678 32259 master.cpp:3853] Authenticating scheduler-9ba6b803-40b4-48b9-bcef-45a329f6b2a4@67.195.81.187:40696 I1029 08:25:26.179970 32250 authenticator.hpp:161] Creating new server SASL connection I1029 08:25:26.180165 32250 authenticatee.hpp:224] Received SASL authentication mechanisms: CRAM-MD5 I1029 08:25:26.180191 32250 authenticatee.hpp:250] Attempting to authenticate with mechanism 'CRAM-MD5' I1029 08:25:26.180272 32250 authenticator.hpp:267] Received SASL authentication start I1029 08:25:26.180378 32250 authenticator.hpp:389] Authentication requires more steps I1029 08:25:26.180557 32260 authenticatee.hpp:270] Received SASL authentication step I1029 
08:25:26.180704 32254 authenticator.hpp:295] Received SASL authentication step I1029 08:25:26.180737 32254 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'pomona.apache.org' server FQDN: 'pomona.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I1029 08:25:26.180748 32254 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I1029 08:25:26.180780 32254 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I1029 08:25:26.180804 32254 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'pomona.apache.org' server FQDN: 'pomona.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I1029 08:25:26.180816 32254 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I1029 08:25:26.180824 32254 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I1029 08:25:26.180841 32254 authenticator.hpp:381] Authentication success I1029 08:25:26.180937 32259 authenticatee.hpp:310] Authentication success I1029 08:25:26.180991 32260 master.cpp:3893] Successfully authenticated principal 'test-principal' at scheduler-9ba6b803-40b4-48b9-bcef-45a329f6b2a4@67.195.81.187:40696 I1029 08:25:26.181422 32259 sched.cpp:357] Successfully authenticated with master master@67.195.81.187:40696 I1029 08:25:26.181449 32259 sched.cpp:476] Sending registration request to master@67.195.81.187:40696 I1029 08:25:26.181697 32260 master.cpp:1362] Received registration request for framework 'default' at scheduler-9ba6b803-40b4-48b9-bcef-45a329f6b2a4@67.195.81.187:40696 I1029 08:25:26.181758 32260 master.cpp:1321] Authorizing framework principal 'test-principal' to receive offers for role '*' I1029 08:25:26.182063 32260 master.cpp:1426] Registering framework 20141029-082526-3142697795-40696-32232-0000 (default) at 
scheduler-9ba6b803-40b4-48b9-bcef-45a329f6b2a4@67.195.81.187:40696 I1029 08:25:26.182430 32248 hierarchical_allocator_process.hpp:329] Added framework 20141029-082526-3142697795-40696-32232-0000 I1029 08:25:26.182462 32248 hierarchical_allocator_process.hpp:697] No resources available to allocate! I1029 08:25:26.182462 32261 sched.cpp:407] Framework registered with 20141029-082526-3142697795-40696-32232-0000 I1029 08:25:26.182473 32248 hierarchical_allocator_process.hpp:659] Performed allocation for 0 slaves in 15372ns I1029 08:25:26.182554 32261 sched.cpp:421] Scheduler::registered took 60059ns I1029 08:25:26.185515 32260 sched.cpp:227] Scheduler::disconnected took 16607ns I1029 08:25:26.185538 32260 sched.cpp:233] New master detected at master@67.195.81.187:40696 I1029 08:25:26.185567 32260 sched.cpp:283] Authenticating with master master@67.195.81.187:40696 I1029 08:25:26.185783 32246 authenticatee.hpp:133] Creating new client SASL connection I1029 08:25:26.186218 32250 master.cpp:3853] Authenticating scheduler-9ba6b803-40b4-48b9-bcef-45a329f6b2a4@67.195.81.187:40696 I1029 08:25:26.186456 32247 authenticator.hpp:161] Creating new server SASL connection I1029 08:25:26.186594 32250 authenticatee.hpp:224] Received SASL authentication mechanisms: CRAM-MD5 I1029 08:25:26.186621 32250 authenticatee.hpp:250] Attempting to authenticate with mechanism 'CRAM-MD5' I1029 08:25:26.186745 32259 authenticator.hpp:267] Received SASL authentication start I1029 08:25:26.186800 32259 authenticator.hpp:389] Authentication requires more steps I1029 08:25:26.186936 32260 authenticatee.hpp:270] Received SASL authentication step I1029 08:25:26.187062 32249 authenticator.hpp:295] Received SASL authentication step I1029 08:25:26.187095 32249 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'pomona.apache.org' server FQDN: 'pomona.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I1029 08:25:26.187108 
32249 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I1029 08:25:26.187137 32249 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I1029 08:25:26.187162 32249 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'pomona.apache.org' server FQDN: 'pomona.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I1029 08:25:26.187175 32249 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I1029 08:25:26.187182 32249 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I1029 08:25:26.187199 32249 authenticator.hpp:381] Authentication success I1029 08:25:26.187327 32249 authenticatee.hpp:310] Authentication success I1029 08:25:26.187366 32260 master.cpp:3893] Successfully authenticated principal 'test-principal' at scheduler-9ba6b803-40b4-48b9-bcef-45a329f6b2a4@67.195.81.187:40696 I1029 08:25:26.187631 32249 sched.cpp:357] Successfully authenticated with master master@67.195.81.187:40696 I1029 08:25:26.187659 32249 sched.cpp:476] Sending registration request to master@67.195.81.187:40696 I1029 08:25:27.028445 32251 hierarchical_allocator_process.hpp:697] No resources available to allocate! 
I1029 08:25:28.045682 32251 hierarchical_allocator_process.hpp:659] Performed allocation for 0 slaves in 1.017231941secs I1029 08:25:28.045760 32249 sched.cpp:476] Sending registration request to master@67.195.81.187:40696 I1029 08:25:28.045900 32253 master.cpp:1499] Received re-registration request from framework 20141029-082526-3142697795-40696-32232-0000 (default) at scheduler-9ba6b803-40b4-48b9-bcef-45a329f6b2a4@67.195.81.187:40696 I1029 08:25:28.045989 32253 master.cpp:1321] Authorizing framework principal 'test-principal' to receive offers for role '*' I1029 08:25:28.046455 32253 master.cpp:1499] Received re-registration request from framework 20141029-082526-3142697795-40696-32232-0000 (default) at scheduler-9ba6b803-40b4-48b9-bcef-45a329f6b2a4@67.195.81.187:40696 I1029 08:25:28.046529 32253 master.cpp:1321] Authorizing framework principal 'test-principal' to receive offers for role '*' I1029 08:25:28.050155 32247 sched.cpp:233] New master detected at master@67.195.81.187:40696 I1029 08:25:28.050217 32247 sched.cpp:283] Authenticating with master master@67.195.81.187:40696 I1029 08:25:28.050405 32252 master.cpp:1552] Re-registering framework 20141029-082526-3142697795-40696-32232-0000 (default) at scheduler-9ba6b803-40b4-48b9-bcef-45a329f6b2a4@67.195.81.187:40696 I1029 08:25:28.050509 32253 authenticatee.hpp:133] Creating new client SASL connection I1029 08:25:28.050566 32252 master.cpp:1592] Allowing framework 20141029-082526-3142697795-40696-32232-0000 (default) at scheduler-9ba6b803-40b4-48b9-bcef-45a329f6b2a4@67.195.81.187:40696 to re-register with an already used id I1029 08:25:28.051084 32257 sched.cpp:449] Framework re-registered with 20141029-082526-3142697795-40696-32232-0000 I1029 08:25:28.051151 32252 master.cpp:3853] Authenticating scheduler-9ba6b803-40b4-48b9-bcef-45a329f6b2a4@67.195.81.187:40696 I1029 08:25:28.051167 32257 sched.cpp:463] Scheduler::reregistered took 52801ns I1029 08:25:28.051723 32261 authenticator.hpp:161] Creating new server 
SASL connection I1029 08:25:28.052042 32249 authenticatee.hpp:224] Received SASL authentication mechanisms: CRAM-MD5 I1029 08:25:28.052077 32249 authenticatee.hpp:250] Attempting to authenticate with mechanism 'CRAM-MD5' I1029 08:25:28.052170 32249 master.cpp:1534] Dropping re-registration request of framework 20141029-082526-3142697795-40696-32232-0000 (default) at scheduler-9ba6b803-40b4-48b9-bcef-45a329f6b2a4@67.195.81.187:40696 because new authentication attempt is in progress I1029 08:25:28.052218 32257 authenticator.hpp:267] Received SASL authentication start I1029 08:25:28.052325 32257 authenticator.hpp:389] Authentication requires more steps I1029 08:25:28.052428 32257 authenticatee.hpp:270] Received SASL authentication step I1029 08:25:28.052641 32246 authenticator.hpp:295] Received SASL authentication step I1029 08:25:28.052685 32246 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'pomona.apache.org' server FQDN: 'pomona.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I1029 08:25:28.052701 32246 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I1029 08:25:28.052739 32246 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I1029 08:25:28.052767 32246 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'pomona.apache.org' server FQDN: 'pomona.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I1029 08:25:28.052779 32246 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I1029 08:25:28.052788 32246 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I1029 08:25:28.052804 32246 authenticator.hpp:381] Authentication success I1029 08:25:28.052947 32252 authenticatee.hpp:310] Authentication success I1029 08:25:28.053020 32246 master.cpp:3893] Successfully 
authenticated principal 'test-principal' at scheduler-9ba6b803-40b4-48b9-bcef-45a329f6b2a4@67.195.81.187:40696 I1029 08:25:28.053462 32247 sched.cpp:357] Successfully authenticated with master master@67.195.81.187:40696 I1029 08:25:29.046855 32261 hierarchical_allocator_process.hpp:697] No resources available to allocate! I1029 08:25:29.046880 32261 hierarchical_allocator_process.hpp:659] Performed allocation for 0 slaves in 35632ns I1029 08:25:30.047458 32253 hierarchical_allocator_process.hpp:697] No resources available to allocate! I1029 08:25:30.047487 32253 hierarchical_allocator_process.hpp:659] Performed allocation for 0 slaves in 43031ns I1029 08:25:31.028373 32261 master.cpp:120] No whitelist given. Advertising offers for all slaves I1029 08:25:31.048673 32249 hierarchical_allocator_process.hpp:697] No resources available to allocate! I1029 08:25:31.048702 32249 hierarchical_allocator_process.hpp:659] Performed allocation for 0 slaves in 44769ns I1029 08:25:32.049576 32259 hierarchical_allocator_process.hpp:697] No resources available to allocate! I1029 08:25:32.049604 32259 hierarchical_allocator_process.hpp:659] Performed allocation for 0 slaves in 51919ns I1029 08:25:33.050864 32249 hierarchical_allocator_process.hpp:697] No resources available to allocate! I1029 08:25:33.050896 32249 hierarchical_allocator_process.hpp:659] Performed allocation for 0 slaves in 38019ns I1029 08:25:34.051961 32251 hierarchical_allocator_process.hpp:697] No resources available to allocate! I1029 08:25:34.051993 32251 hierarchical_allocator_process.hpp:659] Performed allocation for 0 slaves in 64619ns I1029 08:25:35.052196 32249 hierarchical_allocator_process.hpp:697] No resources available to allocate! I1029 08:25:35.052223 32249 hierarchical_allocator_process.hpp:659] Performed allocation for 0 slaves in 34475ns I1029 08:25:36.029101 32259 master.cpp:120] No whitelist given. 
Advertising offers for all slaves I1029 08:25:36.053067 32249 hierarchical_allocator_process.hpp:697] No resources available to allocate! I1029 08:25:36.053095 32249 hierarchical_allocator_process.hpp:659] Performed allocation for 0 slaves in 38354ns I1029 08:25:37.053506 32259 hierarchical_allocator_process.hpp:697] No resources available to allocate! I1029 08:25:37.053536 32259 hierarchical_allocator_process.hpp:659] Performed allocation for 0 slaves in 38249ns tests/master_authorization_tests.cpp:877: Failure Failed to wait 10secs for frameworkReregisteredMessage I1029 08:25:38.053241 32259 master.cpp:768] Framework 2014...",2 MESOS-2017,"Segfault with ""Pure virtual method called"" when tests fail","The most recent one: {noformat:title=DRFAllocatorTest.DRFAllocatorProcess} [ RUN ] DRFAllocatorTest.DRFAllocatorProcess Using temporary directory '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j' I1030 05:55:06.934813 24459 leveldb.cpp:176] Opened db in 3.175202ms I1030 05:55:06.935925 24459 leveldb.cpp:183] Compacted db in 1.077924ms I1030 05:55:06.935976 24459 leveldb.cpp:198] Created db iterator in 16460ns I1030 05:55:06.935995 24459 leveldb.cpp:204] Seeked to beginning of db in 2018ns I1030 05:55:06.936005 24459 leveldb.cpp:273] Iterated through 0 keys in the db in 335ns I1030 05:55:06.936039 24459 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I1030 05:55:06.936705 24480 recover.cpp:437] Starting replica recovery I1030 05:55:06.937023 24480 recover.cpp:463] Replica is in EMPTY status I1030 05:55:06.938158 24475 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I1030 05:55:06.938859 24482 recover.cpp:188] Received a recover response from a replica in EMPTY status I1030 05:55:06.939486 24474 recover.cpp:554] Updating replica status to STARTING I1030 05:55:06.940249 24489 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 591981ns I1030 05:55:06.940274 24489 replica.cpp:320] 
Persisted replica status to STARTING I1030 05:55:06.940752 24481 recover.cpp:463] Replica is in STARTING status I1030 05:55:06.940820 24489 master.cpp:312] Master 20141030-055506-3142697795-40429-24459 (pomona.apache.org) started on 67.195.81.187:40429 I1030 05:55:06.940871 24489 master.cpp:358] Master only allowing authenticated frameworks to register I1030 05:55:06.940891 24489 master.cpp:363] Master only allowing authenticated slaves to register I1030 05:55:06.940908 24489 credentials.hpp:36] Loading credentials for authentication from '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j/credentials' I1030 05:55:06.941215 24489 master.cpp:392] Authorization enabled I1030 05:55:06.941751 24475 master.cpp:120] No whitelist given. Advertising offers for all slaves I1030 05:55:06.942227 24474 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I1030 05:55:06.942401 24476 hierarchical_allocator_process.hpp:299] Initializing hierarchical allocator process with master : master@67.195.81.187:40429 I1030 05:55:06.942895 24483 recover.cpp:188] Received a recover response from a replica in STARTING status I1030 05:55:06.943035 24474 master.cpp:1242] The newly elected leader is master@67.195.81.187:40429 with id 20141030-055506-3142697795-40429-24459 I1030 05:55:06.943063 24474 master.cpp:1255] Elected as the leading master! 
I1030 05:55:06.943079 24474 master.cpp:1073] Recovering from registrar I1030 05:55:06.943313 24480 registrar.cpp:313] Recovering registrar I1030 05:55:06.943455 24475 recover.cpp:554] Updating replica status to VOTING I1030 05:55:06.944144 24474 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 536365ns I1030 05:55:06.944172 24474 replica.cpp:320] Persisted replica status to VOTING I1030 05:55:06.944355 24489 recover.cpp:568] Successfully joined the Paxos group I1030 05:55:06.944576 24489 recover.cpp:452] Recover process terminated I1030 05:55:06.945155 24486 log.cpp:656] Attempting to start the writer I1030 05:55:06.947013 24473 replica.cpp:474] Replica received implicit promise request with proposal 1 I1030 05:55:06.947854 24473 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 806463ns I1030 05:55:06.947883 24473 replica.cpp:342] Persisted promised to 1 I1030 05:55:06.948547 24481 coordinator.cpp:230] Coordinator attemping to fill missing position I1030 05:55:06.950269 24479 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 I1030 05:55:06.950933 24479 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 603843ns I1030 05:55:06.950961 24479 replica.cpp:676] Persisted action at 0 I1030 05:55:06.952180 24476 replica.cpp:508] Replica received write request for position 0 I1030 05:55:06.952239 24476 leveldb.cpp:438] Reading position from leveldb took 28437ns I1030 05:55:06.952896 24476 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 623980ns I1030 05:55:06.952926 24476 replica.cpp:676] Persisted action at 0 I1030 05:55:06.953543 24485 replica.cpp:655] Replica received learned notice for position 0 I1030 05:55:06.954082 24485 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 511807ns I1030 05:55:06.954107 24485 replica.cpp:676] Persisted action at 0 I1030 05:55:06.954128 24485 replica.cpp:661] Replica learned NOP action at position 0 I1030 05:55:06.954710 24473 
log.cpp:672] Writer started with ending position 0 I1030 05:55:06.956215 24478 leveldb.cpp:438] Reading position from leveldb took 33085ns I1030 05:55:06.959481 24475 registrar.cpp:346] Successfully fetched the registry (0B) in 16.11904ms I1030 05:55:06.959616 24475 registrar.cpp:445] Applied 1 operations in 28239ns; attempting to update the 'registry' I1030 05:55:06.962514 24487 log.cpp:680] Attempting to append 139 bytes to the log I1030 05:55:06.962646 24474 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1 I1030 05:55:06.964146 24486 replica.cpp:508] Replica received write request for position 1 I1030 05:55:06.964962 24486 leveldb.cpp:343] Persisting action (158 bytes) to leveldb took 743389ns I1030 05:55:06.964993 24486 replica.cpp:676] Persisted action at 1 I1030 05:55:06.965895 24473 replica.cpp:655] Replica received learned notice for position 1 I1030 05:55:06.966531 24473 leveldb.cpp:343] Persisting action (160 bytes) to leveldb took 607242ns I1030 05:55:06.966555 24473 replica.cpp:676] Persisted action at 1 I1030 05:55:06.966578 24473 replica.cpp:661] Replica learned APPEND action at position 1 I1030 05:55:06.967706 24481 registrar.cpp:490] Successfully updated the 'registry' in 8.036096ms I1030 05:55:06.967895 24481 registrar.cpp:376] Successfully recovered registrar I1030 05:55:06.967993 24482 log.cpp:699] Attempting to truncate the log to 1 I1030 05:55:06.968258 24479 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2 I1030 05:55:06.968268 24475 master.cpp:1100] Recovered 0 slaves from the Registry (101B) ; allowing 10mins for slaves to re-register I1030 05:55:06.969156 24476 replica.cpp:508] Replica received write request for position 2 I1030 05:55:06.969678 24476 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 491913ns I1030 05:55:06.969703 24476 replica.cpp:676] Persisted action at 2 I1030 05:55:06.970459 24478 replica.cpp:655] Replica received learned notice for position 2 
I1030 05:55:06.971060 24478 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 573076ns I1030 05:55:06.971124 24478 leveldb.cpp:401] Deleting ~1 keys from leveldb took 35339ns I1030 05:55:06.971145 24478 replica.cpp:676] Persisted action at 2 I1030 05:55:06.971168 24478 replica.cpp:661] Replica learned TRUNCATE action at position 2 I1030 05:55:06.980211 24459 containerizer.cpp:100] Using isolation: posix/cpu,posix/mem I1030 05:55:06.984153 24473 slave.cpp:169] Slave started on 203)@67.195.81.187:40429 I1030 05:55:07.055308 24473 credentials.hpp:84] Loading credential for authentication from '/tmp/DRFAllocatorTest_DRFAllocatorProcess_wULx31/credential' I1030 05:55:06.988750 24459 sched.cpp:137] Version: 0.21.0 I1030 05:55:07.055521 24473 slave.cpp:276] Slave using credential for: test-principal I1030 05:55:07.055726 24473 slave.cpp:289] Slave resources: cpus(*):2; mem(*):1024; disk(*):0; ports(*):[31000-32000] I1030 05:55:07.055865 24473 slave.cpp:318] Slave hostname: pomona.apache.org I1030 05:55:07.055881 24473 slave.cpp:319] Slave checkpoint: false W1030 05:55:07.055889 24473 slave.cpp:321] Disabling checkpointing is deprecated and the --checkpoint flag will be removed in a future release. 
Please avoid using this flag I1030 05:55:07.056172 24485 sched.cpp:233] New master detected at master@67.195.81.187:40429 I1030 05:55:07.056222 24485 sched.cpp:283] Authenticating with master master@67.195.81.187:40429 I1030 05:55:07.056717 24485 state.cpp:33] Recovering state from '/tmp/DRFAllocatorTest_DRFAllocatorProcess_wULx31/meta' I1030 05:55:07.056851 24475 authenticatee.hpp:133] Creating new client SASL connection I1030 05:55:07.057003 24473 status_update_manager.cpp:197] Recovering status update manager I1030 05:55:07.057252 24488 master.cpp:3853] Authenticating scheduler-c98e7aac-d03f-464a-aa75-61208600e196@67.195.81.187:40429 I1030 05:55:07.057502 24489 containerizer.cpp:281] Recovering containerizer I1030 05:55:07.057524 24475 authenticator.hpp:161] Creating new server SASL connection I1030 05:55:07.057688 24475 authenticatee.hpp:224] Received SASL authentication mechanisms: CRAM-MD5 I1030 05:55:07.057719 24475 authenticatee.hpp:250] Attempting to authenticate with mechanism 'CRAM-MD5' I1030 05:55:07.057919 24481 authenticator.hpp:267] Received SASL authentication start I1030 05:55:07.057968 24481 authenticator.hpp:389] Authentication requires more steps I1030 05:55:07.058070 24473 authenticatee.hpp:270] Received SASL authentication step I1030 05:55:07.058199 24485 authenticator.hpp:295] Received SASL authentication step I1030 05:55:07.058223 24485 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'pomona.apache.org' server FQDN: 'pomona.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I1030 05:55:07.058233 24485 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I1030 05:55:07.058259 24485 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I1030 05:55:07.058290 24485 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'pomona.apache.org' server FQDN: 'pomona.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: 
false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I1030 05:55:07.058302 24485 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I1030 05:55:07.058307 24485 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I1030 05:55:07.058320 24485 authenticator.hpp:381] Authentication success I1030 05:55:07.058467 24480 master.cpp:3893] Successfully authenticated principal 'test-principal' at scheduler-c98e7aac-d03f-464a-aa75-61208600e196@67.195.81.187:40429 I1030 05:55:07.058493 24485 slave.cpp:3456] Finished recovery I1030 05:55:07.058593 24478 authenticatee.hpp:310] Authentication success I1030 05:55:07.058838 24478 sched.cpp:357] Successfully authenticated with master master@67.195.81.187:40429 I1030 05:55:07.058861 24478 sched.cpp:476] Sending registration request to master@67.195.81.187:40429 I1030 05:55:07.058969 24475 slave.cpp:602] New master detected at master@67.195.81.187:40429 I1030 05:55:07.058969 24487 status_update_manager.cpp:171] Pausing sending status updates I1030 05:55:07.059026 24475 slave.cpp:665] Authenticating with master master@67.195.81.187:40429 I1030 05:55:07.059061 24481 master.cpp:1362] Received registration request for framework 'framework1' at scheduler-c98e7aac-d03f-464a-aa75-61208600e196@67.195.81.187:40429 I1030 05:55:07.059131 24481 master.cpp:1321] Authorizing framework principal 'test-principal' to receive offers for role 'role1' I1030 05:55:07.059171 24475 slave.cpp:638] Detecting new master I1030 05:55:07.059214 24482 authenticatee.hpp:133] Creating new client SASL connection I1030 05:55:07.059550 24481 master.cpp:3853] Authenticating slave(203)@67.195.81.187:40429 I1030 05:55:07.059787 24487 authenticator.hpp:161] Creating new server SASL connection I1030 05:55:07.059922 24481 master.cpp:1426] Registering framework 20141030-055506-3142697795-40429-24459-0000 (framework1) at 
scheduler-c98e7aac-d03f-464a-aa75-61208600e196@67.195.81.187:40429 I1030 05:55:07.059996 24474 authenticatee.hpp:224] Received SASL authentication mechanisms: CRAM-MD5 I1030 05:55:07.060034 24474 authenticatee.hpp:250] Attempting to authenticate with mechanism 'CRAM-MD5' I1030 05:55:07.060117 24474 authenticator.hpp:267] Received SASL authentication start I1030 05:55:07.060165 24474 authenticator.hpp:389] Authentication requires more steps I1030 05:55:07.060377 24476 hierarchical_allocator_process.hpp:329] Added framework 20141030-055506-3142697795-40429-24459-0000 I1030 05:55:07.060394 24488 sched.cpp:407] Framework registered with 20141030-055506-3142697795-40429-24459-0000 I1030 05:55:07.060403 24476 hierarchical_allocator_process.hpp:697] No resources available to allocate! I1030 05:55:07.060431 24476 hierarchical_allocator_process.hpp:659] Performed allocation for 0 slaves in 29857ns I1030 05:55:07.060443 24488 sched.cpp:421] Scheduler::registered took 19407ns I1030 05:55:07.060545 24478 authenticatee.hpp:270] Received SASL authentication step I1030 05:55:07.060645 24478 authenticator.hpp:295] Received SASL authentication step I1030 05:55:07.060673 24478 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'pomona.apache.org' server FQDN: 'pomona.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I1030 05:55:07.060685 24478 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I1030 05:55:07.060714 24478 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I1030 05:55:07.060740 24478 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'pomona.apache.org' server FQDN: 'pomona.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I1030 05:55:07.060760 24478 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I1030 05:55:07.060770 
24478 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I1030 05:55:07.060788 24478 authenticator.hpp:381] Authentication success I1030 05:55:07.060920 24474 authenticatee.hpp:310] Authentication success I1030 05:55:07.060945 24485 master.cpp:3893] Successfully authenticated principal 'test-principal' at slave(203)@67.195.81.187:40429 I1030 05:55:07.061388 24489 slave.cpp:722] Successfully authenticated with master master@67.195.81.187:40429 I1030 05:55:07.061504 24489 slave.cpp:1050] Will retry registration in 4.778336ms if necessary I1030 05:55:07.061718 24480 master.cpp:3032] Registering slave at slave(203)@67.195.81.187:40429 (pomona.apache.org) with id 20141030-055506-3142697795-40429-24459-S0 I1030 05:55:07.062119 24489 registrar.cpp:445] Applied 1 operations in 53691ns; attempting to update the 'registry' I1030 05:55:07.065182 24479 log.cpp:680] Attempting to append 316 bytes to the log I1030 05:55:07.065337 24487 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 3 I1030 05:55:07.066359 24474 replica.cpp:508] Replica received write request for position 3 I1030 05:55:07.066643 24474 leveldb.cpp:343] Persisting action (335 bytes) to leveldb took 249579ns I1030 05:55:07.066671 24474 replica.cpp:676] Persisted action at 3 I../../src/tests/allocator_tests.cpp:120: Failure Failed to wait 10secs for offers1 1030 05:55:07.067101 24477 slave.cpp:1050] Will retry registration in 24.08243ms if necessary I1030 05:55:07.067140 24473 master.cpp:3020] Ignoring register slave message from slave(203)@67.195.81.187:40429 (pomona.apache.org) as admission is already in progress I1030 05:55:07.067395 24488 replica.cpp:655] Replica received learned notice for position 3 I1030 05:55:07.943416 24478 hierarchical_allocator_process.hpp:697] No resources available to allocate! 
I1030 05:55:19.804687 24478 hierarchical_allocator_process.hpp:659] Performed allocation for 0 slaves in 11.861261123secs I1030 05:55:11.942713 24474 master.cpp:120] No whitelist given. Advertising offers for all slaves I1030 05:55:19.805850 24488 leveldb.cpp:343] Persisting action (337 bytes) to leveldb took 1.067224ms I1030 05:55:19.806012 24488 replica.cpp:676] Persisted action at 3 ../../src/tests/allocator_tests.cpp:115: Failure Actual function call count doesn't match EXPECT_CALL(sched1, resourceOffers(_, _))... Expected: to be called once Actual: never called - unsatisfied and active I1030 05:55:19.806144 24488 replica.cpp:661] Replica learned APPEND action at position 3 I1030 05:55:19.806695 24473 master.cpp:768] Framework 20141030-055506-3142697795-40429-24459-0000 (framework1) at scheduler-c98e7aac-d03f-464a-aa75-61208600e196@67.195.81.187:40429 disconnected I1030 05:55:19.806726 24473 master.cpp:1731] Disconnecting framework 20141030-055506-3142697795-40429-24459-0000 (framework1) at scheduler-c98e7aac-d03f-464a-aa75-61208600e196@67.195.81.187:40429 I1030 05:55:19.806751 24473 master.cpp:1747] Deactivating framework 20141030-055506-3142697795-40429-24459-0000 (framework1) at scheduler-c98e7aac-d03f-464a-aa75-61208600e196@67.195.81.187:40429 I1030 05:55:19.806967 24473 master.cpp:790] Giving framework 20141030-055506-3142697795-40429-24459-0000 (framework1) at scheduler-c98e7aac-d03f-464a-aa75-61208600e196@67.195.81.187:40429 0ns to failover ../../src/tests/allocator_tests.cpp:94: Failure Actual function call count doesn't match EXPECT_CALL(allocator, slaveAdded(_, _, _))... 
Expected: to be called once Actual: never called - unsatisfied and active F1030 05:55:19.806967 24480 logging.cpp:57] RAW: Pure virtual method called I1030 05:55:19.807348 24488 master.cpp:3665] Framework failover timeout, removing framework 20141030-055506-3142697795-40429-24459-0000 (framework1) at scheduler-c98e7aac-d03f-464a-aa75-61208600e196@67.195.81.187:40429 I1030 05:55:19.807370 24488 master.cpp:4201] Removing framework 20141030-055506-3142697795-40429-24459-0000 (framework1) at scheduler-c98e7aac-d03f-464a-aa75-61208600e196@67.195.81.187:40429 *** Aborted at 1414648519 (unix time) try ""date -d @1414648519"" if you are using GNU date *** PC: @ 0x91bc86 process::PID<>::PID() *** SIGSEGV (@0x0) received by PID 24459 (TID 0x2b86c919a700) from PID 0; stack trace: *** I1030 05:55:19.808631 24489 registrar.cpp:490] Successfully updated the 'registry' in 12.746377984secs @ 0x2b86c55fc340 (unknown) I1030 05:55:19.808938 24473 log.cpp:699] Attempting to truncate the log to 3 @ 0x2b86c3327174 google::LogMessage::Fail() I1030 05:55:19.809084 24481 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 4 @ 0x91bc86 process::PID<>::PID() @ 0x2b86c332c868 google::RawLog__() I1030 05:55:19.810191 24479 replica.cpp:508] Replica received write request for position 4 I1030 05:55:19.810899 24479 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 678090ns I1030 05:55:19.810919 24479 replica.cpp:676] Persisted action at 4 @ 0x91bf24 process::Process<>::self() I1030 05:55:19.811635 24485 replica.cpp:655] Replica received learned notice for position 4 I1030 05:55:19.812180 24485 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 523927ns I1030 05:55:19.812228 24485 leveldb.cpp:401] Deleting ~2 keys from leveldb took 29523ns I1030 05:55:19.812242 24485 replica.cpp:676] Persisted action at 4 I @ 0x2b86c29d2a36 __cxa_pure_virtual 1030 05:55:19.812258 24485 replica.cpp:661] Replica learned TRUNCATE action at position 4 @ 0x1046936 
testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() I1030 05:55:19.829655 24474 slave.cpp:1050] Will retry registration in 31.785967ms if necessary @ 0x9c0633 testing::internal::FunctionMockerBase<>::InvokeWith() @ 0x9b6152 testing::internal::FunctionMocker<>::Invoke() @ 0x9abdeb mesos::internal::tests::MockAllocatorPr...",5 MESOS-2029,"Allow slave to checkpoint resources.","The checkpointed resources are independent of the slave lifecycle. In other words, even if the slave host reboots, it'll still recover the checkpointed resources (unlike other checkpointed data). The slave needs to verify during startup that the checkpointed resources are compatible with the resources of the slave (specified using --resources flag).",5 MESOS-2030,"Maintain persistent disk resources in master memory.","Maintain an in-memory data structure to track persistent disk resources on each slave. Update this data structure when slaves register/re-register/disconnect, etc.",3 MESOS-2031,"Manage persistent directories on slave.","Whenever a slave sees a persistent disk resource (in ExecutorInfo or TaskInfo) that is new to it, it will create a persistent directory in which tasks can store persistent data. The slave needs to do the following after the directory is created: 1) symlink it into the executor sandbox so that tasks/executor can see it 2) garbage collect it once it is released by the framework",5 MESOS-2032,"Update Maintenance design to account for persistent resources.","With persistent resources and dynamic reservations, frameworks need to know how long the resources will be unavailable for maintenance operations. This is because for persistent resources, the framework needs to understand how long the persistent resource will be unavailable. For example, if there will be a 10-minute reboot for a kernel upgrade, the framework will not want to re-replicate all of its persistent data on the machine. 
Rather, tolerating one unavailable replica for the maintenance window would be preferred. I'd like to revisit the design to ensure it also works well for persistent resources.",13 MESOS-2033,"Documentation for isolator filesystem/shared.",NULL,1 MESOS-2034,"Documentation for isolator namespaces/pid.",NULL,1 MESOS-2035,"Add reason to containerizer proto Termination","When an isolator kills a task, the reason is unknown. As part of MESOS-1830, the reason is set to a general one but ideally we would have the termination reason to pass through to the status update.",5 MESOS-2043,"Framework auth fail with timeout error and never get authenticated","I'm facing this issue in master as of https://github.com/apache/mesos/commit/74ea59e144d131814c66972fb0cc14784d3503d4 As [~adam-mesos] mentioned in IRC, this sounds similar to MESOS-1866. I'm running 1 master and 1 scheduler (aurora). The framework authentication fails due to a timeout: error on mesos master: {code} I1104 19:37:17.741449 8329 master.cpp:3874] Authenticating scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083 I1104 19:37:17.741585 8329 master.cpp:3885] Using default CRAM-MD5 authenticator I1104 19:37:17.742106 8336 authenticator.hpp:169] Creating new server SASL connection W1104 19:37:22.742959 8329 master.cpp:3953] Authentication timed out W1104 19:37:22.743548 8329 master.cpp:3930] Failed to authenticate scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083: Authentication discarded {code} scheduler error: {code} I1104 19:38:57.885486 49012 sched.cpp:283] Authenticating with master master@MASTER_IP:PORT I1104 19:38:57.885928 49002 authenticatee.hpp:133] Creating new client SASL connection I1104 19:38:57.890581 49007 authenticatee.hpp:224] Received SASL authentication mechanisms: CRAM-MD5 I1104 19:38:57.890656 49007 authenticatee.hpp:250] Attempting to authenticate with mechanism 'CRAM-MD5' W1104 19:39:02.891196 49005 sched.cpp:378] Authentication timed out I1104 
19:39:02.891850 49018 sched.cpp:338] Failed to authenticate with master master@MASTER_IP:PORT: Authentication discarded {code} Looks like 2 instances {{scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94}} & {{scheduler-d2d4437b-d375-4467-a583-362152fe065a}} of the same framework are trying to authenticate and failing. {code} W1104 19:36:30.769420 8319 master.cpp:3930] Failed to authenticate scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94@SCHEDULER_IP:8083: Failed to communicate with authenticatee I1104 19:36:42.701441 8328 master.cpp:3860] Queuing up authentication request from scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083 because authentication is still in progress {code} Restarting master and scheduler didn't fix it. This particular issue happens with 1 master and 1 scheduler after MESOS-1866 is fixed.",5 MESOS-2044,"Use one IP address per container for network isolation","If there are enough IP addresses, either IPv4 or IPv6, we should use one IP address per container, instead of the ugly port-range-based solution. One problem with this is IP address management: addresses are usually managed by a DHCP server, so we may need to manage them in the mesos master/slave. Also, maybe use macvlan instead of veth for better isolation.",40 MESOS-2051,"Pull Metrics struct out of Master and Slave to improve readability",NULL,2 MESOS-2052,"RunState::recover should always recover 'completed'","RunState::recover() will return partial state if it cannot find or open the libprocess pid file. Specifically, it does not recover the 'completed' flag. However, if the slave has removed the executor (because launch failed or the executor failed to register) the sentinel flag will be set and this fact should be recovered. This ensures that container recovery is not attempted later. This was discovered when the LinuxLauncher failed to recover because it was asked to recover two containers with the same forkedPid. 
Investigation showed the executors both OOM'ed before registering, i.e., no libprocess pid file was present. However, the containerizer had detected the OOM, destroyed the container, and notified the slave which cleaned everything up: failing the task and calling removeExecutor (which writes the completed sentinel file.)",1 MESOS-2055,"MesosContainerizerExecuteTest.IoRedirection test is flaky","Observed this on ASF CI: {code} [ RUN ] MesosContainerizerExecuteTest.IoRedirection Using temporary directory '/tmp/MesosContainerizerExecuteTest_IoRedirection_PbBn8a' I1108 00:34:25.820514 30391 containerizer.cpp:100] Using isolation: posix/cpu,posix/mem I1108 00:34:25.821048 30411 containerizer.cpp:424] Starting container 'test_container' for executor 'executor' of framework '' I1108 00:34:25.824015 30411 launcher.cpp:137] Forked child with pid '4221' for container 'test_container' I1108 00:34:25.825438 30408 containerizer.cpp:571] Fetching URIs for container 'test_container' using command '/home/jenkins/jenkins-slave/workspace/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/build/src/mesos-fetcher' I1108 00:34:25.984254 30419 containerizer.cpp:1117] Executor for container 'test_container' has exited I1108 00:34:25.984341 30419 containerizer.cpp:946] Destroying container 'test_container' ../../src/tests/containerizer_tests.cpp:487: Failure Value of: (os::read(path::join(directory, ""stderr""))).get() Actual: ""I1108 00:34:25.872990 4224 logging.cpp:177] Logging to STDERR\nthis is stderr\n"" Expected: errMsg + ""\n"" Which is: ""this is stderr\n"" [ FAILED ] MesosContainerizerExecuteTest.IoRedirection (185 ms) [----------] 1 test from MesosContainerizerExecuteTest (185 ms total) {code}",1 MESOS-2056,"Refactor fetcher code in preparation for fetcher cache","Refactor/rearrange fetcher-related code so that cache functionality can be dropped in. One could do both together in one go. This is splitting up reviews into smaller chunks. 
It will not immediately be obvious how this change will be used later, but it will look better-factored and still do the exact same thing as before. In particular, a download routine to be reused several times in launcher/fetcher will be factored out and the remainder of fetcher-related code can be moved from the containerizer realm into fetcher.cpp.",1 MESOS-2057,"Concurrency control for fetcher cache","With the URI flag in CommandInfo messages (added in MESOS-2069) that marks files downloaded by the fetcher for caching in a repository, now ensure that when a URI is ""cached"", it is only ever downloaded once for the same user on the same slave as long as the slave keeps running. This even holds if multiple tasks request the same URI concurrently. If multiple requests for the same URI occur, perform only one of them and reuse the result. Make concurrent requests for the same URI wait for the one download. Different URIs from different CommandInfos can be downloaded concurrently. No cache eviction, cleanup or failover will be handled for now. Additional tickets will be filed for these enhancements. (So don't use this feature in production until the whole epic is complete.) Note that implementing this does not suffice for production use. This ticket contains the main part of the fetcher logic, though. See the epic MESOS-336 for the rest of the features that lead to a fully functional fetcher cache. The proposed general approach is to keep all bookkeeping about what is in which stage of being fetched and where it resides in the slave's MesosContainerizerProcess, so that all concurrent access is disambiguated and controlled by an ""actor"" (aka libprocess ""process""). Depends on MESOS-2056 and MESOS-2069. ",8 MESOS-2058,"Deprecate stats.json endpoints for Master and Slave","With the introduction of the libprocess {{/metrics/snapshot}} endpoint, metrics are now duplicated in the Master and Slave between this and {{stats.json}}. 
We should deprecate the {{stats.json}} endpoints. Manual inspection of {{stats.json}} shows that all metrics are now covered by the new endpoint for Master and Slave.",1 MESOS-2061,"Add InverseOffer protobuf message.","InverseOffer was defined as part of the maintenance work in MESOS-1474, design doc here: https://docs.google.com/document/d/16k0lVwpSGVOyxPSyXKmGC-gbNmRlisNEe4p-fAUSojk/edit?usp=sharing {code} /** * A request to return some resources occupied by a framework. */ message InverseOffer { required OfferID id = 1; required FrameworkID framework_id = 2; // A list of resources being requested back from the framework. repeated Resource resources = 3; // Specified if the resources need to be released from a particular slave. optional SlaveID slave_id = 4; // The resources in this InverseOffer are part of a planned maintenance // schedule in the specified window. Any tasks running using these // resources may be killed when the window arrives. optional Interval unavailability = 5; } {code} This ticket is to capture the addition of the InverseOffer protobuf to mesos.proto; the necessary API changes for Event/Call and the language bindings will be tracked separately.",3 MESOS-2062,"Add InverseOffer to Event/Call API.","The initial use case for InverseOffer in the framework API will be the maintenance primitives in mesos: MESOS-1474. One way to add this is to tack it on to the OFFERS Event: {code} message Offers { repeated Offer offers = 1; repeated InverseOffer inverse_offers = 2; } {code}",3 MESOS-2063,"Add InverseOffer to C++ Scheduler API.","The initial use case for InverseOffer in the framework API will be the maintenance primitives in mesos: MESOS-1474. One way to add these to the C++ Scheduler API is to add a new callback: {code} virtual void inverseResourceOffers( SchedulerDriver* driver, const std::vector<InverseOffer>& inverseOffers) = 0; {code} libmesos compatibility will need to be figured out here. 
We may want to leave the C++ binding untouched in favor of Event/Call, in order to not break API compatibility for schedulers.",5 MESOS-2064,"Add InverseOffer to Java Scheduler API.","The initial use case for InverseOffer in the framework API will be the maintenance primitives in mesos: MESOS-1474. One way to add these to the Java Scheduler API is to add a new callback: {code} void inverseResourceOffers( SchedulerDriver driver, List inverseOffers); {code} JAR / libmesos compatibility will need to be figured out here. We may want to leave the Java binding untouched in favor of Event/Call, in order to not break API compatibility for schedulers.",5 MESOS-2065,"Add InverseOffer to Python Scheduler API.","The initial use case for InverseOffer in the framework API will be the maintenance primitives in mesos: MESOS-1474. One way to add these to the Python Scheduler API is to add a new callback: {code} def inverseResourceOffers(self, driver, inverse_offers): {code} Egg / libmesos compatibility will need to be figured out here. We may want to leave the Python binding untouched in favor of Event/Call, in order to not break API compatibility for schedulers.",5 MESOS-2066,"Add optional 'Unavailability' to resource offers to provide maintenance awareness.","In order to inform frameworks about upcoming maintenance on offered resources, per MESOS-1474, we'd like to add an optional 'Unavailability' information to offers: {code} message Interval { optional double start = 1; // Time, in seconds since the Epoch. optional double duration = 2; // Time, in seconds. } message Offer { // Existing fields ... // Signifies that the resources in this Offer are part of a planned // maintenance schedule in the specified window. Any tasks launched // using these resources may be killed when the window arrives. // This field gives additional information about the maintenance. 
// The maintenance may not necessarily start exactly at this interval, // nor last for exactly the duration of this interval. optional Interval unavailability = 9; } {code}",3 MESOS-2067,"Add HTTP API to the master for maintenance operations.","Based on MESOS-1474, we'd like to provide an HTTP API on the master for the maintenance primitives in mesos. For the MVP, we'll want something like this for manipulating the schedule: {code} /maintenance/schedule GET - returns the schedule, which will include the various maintenance windows. POST - create or update the schedule with a JSON blob (see below). /maintenance/status GET - returns a list of machines and their maintenance mode. /maintenance/start POST - Transition a set of machines from Draining into Deactivated mode. /maintenance/stop POST - Transition a set of machines from Deactivated into Normal mode. /maintenance/consensus <- (Not sure what the right name is. matrix? acceptance?) GET - Returns the latest info on which frameworks have accepted or declined the maintenance schedule. {code} (Note: The slashes in URLs might not be supported yet.) A schedule might look like: {code} { ""windows"" : [ { ""machines"" : [ { ""ip"" : ""192.168.0.1"" }, { ""hostname"" : ""localhost"" }, ... ], ""unavailability"" : { ""start"" : 12345, // Epoch seconds. ""duration"" : 1000 // Seconds. } }, ... ] } {code} There should be firewall settings such that only those with access to the master can use these endpoints.",8 MESOS-2069,"Basic fetcher cache functionality","Add a flag to CommandInfo URI protobufs that indicates that files downloaded by the fetcher shall be cached in a repository. To be followed by MESOS-2057 for concurrency control. Also see MESOS-336 for the overall goals for the fetcher cache.",8 MESOS-2070,"Implement simple slave recovery behavior for fetcher cache","Clean the fetcher cache completely upon slave restart/recovery. This implements correct, albeit not ideal, behavior. 
More efficient schemes that restore knowledge about cached files or even resume downloads can be added later. ",2 MESOS-2072,"Fetcher cache eviction","Delete files from the fetcher cache so that a given cache size is never exceeded. Succeed in doing so while concurrent downloads are on their way and new requests are pouring in. Idea: measure the size of each download before it begins, make enough room before the download. This means that only download mechanisms that divulge the size before the main download will be supported. AFAWK, those in use so far have this property. The calculation of how much space to free needs to be under concurrency control, accumulating all space needed for competing, incomplete download requests. (The Python script that performs fetcher caching for Aurora does not seem to implement this. See https://gist.github.com/zmanji/f41df77510ef9d00265a, imagine several of these programs running concurrently, each one's _cache_eviction() call succeeding, each perceiving the SAME free space being available.) Ultimately, a conflict resolution strategy is needed if just the downloads underway already exceed the cache capacity. Then, as a fallback, direct download into the work directory will be used for some tasks. TBD how to pick which task gets treated how. At first, only support copying of any downloaded files to the work directory for task execution. This isolates the task life cycle after starting a task from cache eviction considerations. (Later, we can add symbolic links that avoid copying. But then eviction of fetched files used by ongoing tasks must be blocked, which adds complexity. another future extension is MESOS-1667 ""Extract from URI while downloading into work dir""). 
",8 MESOS-2074,"Fetcher cache test fixture","To accelerate providing good test coverage for the fetcher cache (MESOS-336), we can provide a framework that canonicalizes creating and running a number of tasks and allows easy parametrization with combinations of the following: - whether to cache or not - whether make what has been downloaded executable or not - whether to extract from an archive or not - whether to download from a file system, http, or... We can create a simple HHTP server in the test fixture to support the latter. Furthermore, the tests need to be robust wrt. varying numbers of StatusUpdate messages. An accumulating update message sink that reports the final state is needed. All this has already been programmed in this patch, just needs to be rebased: https://reviews.apache.org/r/21316/",5 MESOS-2075,"Add maintenance information to the replicated registry.","To achieve fault-tolerance for the maintenance primitives, we will need to add the maintenance information to the registry. The registry currently stores all of the slave information, which is quite large (~ 17MB for 50,000 slaves from my testing), which results in a protobuf object that is extremely expensive to copy. As far as I can tell, reads / writes to maintenance information is independent of reads / writes to the existing 'registry' information. So there are two approach here: h4. Add maintenance information to 'maintenance' key: # The advantage of this approach is that we don't further grow the large Registry object. # This approach assumes that writes to 'maintenance' are independent of writes to the 'registry'. -If these writes are not independent, this approach requires that we add transactional support to the State abstraction.- # -This approach requires adding compaction to LogStorage.- # This approach likely requires some refactoring to the Registrar. h4. Add maintenance information to 'registry' key: (This is the chosen method.) 
# The advantage of this approach is that it's the easiest to implement. # This will further grow the single 'registry' object, but doesn't preclude it being split apart in the future. # This approach may require using the diff support in LogStorage and/or adding compression support to LogStorage snapshots to deal with the increased size of the registry.",13 MESOS-2076,"Implement maintenance primitives in the Master.","The master will need to do a number of things to implement the maintenance primitives: # For machines that have a maintenance window: #* Disambiguate machines to agents. #* For unused resources, offers must be augmented with an Unavailability. #* For used resources, inverse offers must be sent. # For inverse offers: #* Filter them before sending them again. #* For declined inverse offers, do something with the reason (store or log). # Recover the maintenance information upon failover. Note: Some amount of this logic will need to be placed in the allocator.",13 MESOS-2077,"Ensure that TASK_LOSTs for a hard slave drain (SIGUSR1) include a Reason.","For maintenance, sometimes operators will force the drain of a slave (via SIGUSR1), when deemed safe (e.g. non-critical tasks running) and/or necessary (e.g. bad hardware). To eliminate alerting noise, we'd like to add a 'Reason' that expresses the forced drain of the slave, so that these are not considered to be a generic slave removal TASK_LOST.",3 MESOS-2078,"Scheduler driver may ACK status updates when the scheduler threw an exception","[~vinodkone] discovered that this can happen if the scheduler calls {{SchedulerDriver#stop}} before or while handling {{Scheduler#statusUpdate}}. In src/sched/sched.cpp: The driver invokes {{statusUpdate}} and later checks the {{aborted}} flag to determine whether to send an ACK. {code} void statusUpdate( const UPID& from, const StatusUpdate& update, const UPID& pid) { ... 
scheduler->statusUpdate(driver, status); VLOG(1) << ""Scheduler::statusUpdate took "" << stopwatch.elapsed(); // Note that we need to look at the volatile 'aborted' here to // so that we don't acknowledge the update if the driver was // aborted during the processing of the update. if (aborted) { VLOG(1) << ""Not sending status update acknowledgment message because "" << ""the driver is aborted!""; return; } ... {code} In src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp: The {{statusUpdate}} implementation checks for an exception and invokes {{driver->abort()}}. {code} void JNIScheduler::statusUpdate(SchedulerDriver* driver, const TaskStatus& status) { jvm->AttachCurrentThread(JNIENV_CAST(&env), NULL); jclass clazz = env->GetObjectClass(jdriver); jfieldID scheduler = env->GetFieldID(clazz, ""scheduler"", ""Lorg/apache/mesos/Scheduler;""); jobject jscheduler = env->GetObjectField(jdriver, scheduler); clazz = env->GetObjectClass(jscheduler); // scheduler.statusUpdate(driver, status); jmethodID statusUpdate = env->GetMethodID(clazz, ""statusUpdate"", ""(Lorg/apache/mesos/SchedulerDriver;"" ""Lorg/apache/mesos/Protos$TaskStatus;)V""); jobject jstatus = convert(env, status); env->ExceptionClear(); env->CallVoidMethod(jscheduler, statusUpdate, jdriver, jstatus); if (env->ExceptionCheck()) { env->ExceptionDescribe(); env->ExceptionClear(); jvm->DetachCurrentThread(); driver->abort(); return; } jvm->DetachCurrentThread(); } {code} In src/sched/sched.cpp: The {{abort()}} implementation exits early if {{status != DRIVER_RUNNING}}, and *does not set the aborted flag*. {code} Status MesosSchedulerDriver::abort() { Lock lock(&mutex); if (status != DRIVER_RUNNING) { return status; } CHECK(process != NULL); // We set the volatile aborted to true here to prevent any further // messages from being processed in the SchedulerProcess. However, // if abort() is called from another thread as the SchedulerProcess, // there may be at most one additional message processed. 
// TODO(bmahler): Use an atomic boolean. process->aborted = true; // Dispatching here ensures that we still process the outstanding // requests *from* the scheduler, since those do proceed when // aborted is true. dispatch(process, &SchedulerProcess::abort); return status = DRIVER_ABORTED; } {code} As a result, the code will ACK despite an exception being thrown.",3 MESOS-2080,"Add master metrics for maintenance.","We'll need metrics in order to gain visibility into the maintenance functionality. This will also allow operators to add alerting on these metrics, in particular: # Number of scheduled hosts. # Number of active windows. # Number of expired windows. # Number of successful drains. # Number of failed drains. As an example of an alert guideline, we would want to know the number of expired windows as a gauge to ensure that it is not growing excessively. This allows alerting to catch when operators are not properly unscheduling maintenance once it is complete.",3 MESOS-2081,"Add safety constraints for maintenance primitives.","In order to ensure that the maintenance primitives can be used safely by operators, we want to put a few safety mechanisms in place. Some ideas from the [design doc|https://docs.google.com/a/twitter.com/document/d/16k0lVwpSGVOyxPSyXKmGC-gbNmRlisNEe4p-fAUSojk/]: # Prevent bad schedules from being constructed: schedules with more than x% overlap in slaves are rejected. # Prevent bad maintenance from proceeding unchecked: if x% of the slaves are not being unscheduled, or are not re-registering, cancel the schedule. These will likely be configurable via flags.",8 MESOS-2082,"Update the webui to include maintenance information.","The simplest thing here would probably be to include another tab in the header for maintenance information. We could also consider adding maintenance information inline to the slaves table. 
Depending on how this is done, the maintenance tab could actually be a subset of the slaves table; only those slaves for which there is maintenance information.",5 MESOS-2083,"Add documentation for maintenance primitives.","We should provide some guiding documentation around the upcoming maintenance primitives in Mesos. Specifically, we should ensure that general users, framework developers, and operators understand the notion of maintenance in Mesos. Some guidance and recommendations for the latter two audiences will be necessary.",8 MESOS-2085,"Add support encrypted and non-encrypted communication in parallel for cluster upgrade","During cluster upgrade from non-encrypted to encrypted communication, we need to support an interim where: 1) A master can have connections to both encrypted and non-encrypted slaves 2) A slave that supports encrypted communication connects to a master that has not yet been upgraded. 3) Frameworks are encrypted but the master has not been upgraded yet. 4) Master has been upgraded but frameworks haven't. 5) A slave process has upgraded but running executor processes haven't.",13 MESOS-2097,"Update Resource protobuf with DiskInfo","{noformat} message Resource { required string name = 1; required Value.Type type = 2; optional Value.Scalar scalar = 3; optional Value.Ranges ranges = 4; optional Value.Set set = 5; optional string role = 6 [default = ""*""]; // Used for describing persistent disk resource. message DiskInfo { // A unique identifier for the persistent disk resource. The id // needs to be unique within a role for a slave. required string id = 1; // The volume mapping for the persistent disk resource. 
required Volume volume = 2; } optional DiskInfo disk = 8; } {noformat}",1 MESOS-2098,"Update task validation to be after task authorization.","So that we can simplify the task validation, because we no longer need to check with pendingTasks.",3 MESOS-2099,"Support acquiring/releasing resources with DiskInfo in allocator.","The allocator needs to be changed because the resources are changing while we are acquiring or releasing persistent disk resources (resources with DiskInfo). For example, when we release a persistent disk resource, we are changing the release with DiskInfo to a resource with the DiskInfo.",8 MESOS-2100,"Implement master to slave protocol for persistent disk resources.","We need to do the following: 1) Slave needs to send persisted resources when registering (or re-registering). 2) Master needs to send total persisted resources to slave by either re-using RunTask/UpdateFrameworkInfo or introducing a new type of message (like UpdateResources).",8 MESOS-2101,"Add the persistent resources release primitive to the framework API","We are thinking about introducing a Release protobuf message which specifies persistent disk resources (w/ DiskInfo) to release. The Release message could be piggybacked on the Launch/Decline message. This probably will overlap with the dynamic reservation work (MESOS-2018).",3 MESOS-2103,"Expose number of processes and threads in a container","The CFS cpu statistics (cpus_nr_throttled, cpus_nr_periods, cpus_throttled_time) are difficult to interpret. 1) nr_throttled is the number of intervals where *any* throttling occurred 2) throttled_time is the aggregate time *across all runnable tasks* (tasks in the Linux sense). For example, in a typical 60 second sampling interval: nr_periods = 600, nr_throttled could be 60, i.e., 10% of intervals, but throttled_time could be much higher than (60/600) * 60 = 6 seconds if there is more than one task that is runnable but throttled. *Each* throttled task contributes to the total throttled time. 
Small test to demonstrate throttled_time > nr_periods * quota_interval: 5 x {{'openssl speed'}} running with quota=100ms: {noformat} cat cpu.stat && sleep 1 && cat cpu.stat nr_periods 3228 nr_throttled 1276 throttled_time 528843772540 nr_periods 3238 nr_throttled 1286 throttled_time 531668964667 {noformat} All 10 intervals throttled (100%) for total time of 2.8 seconds in 1 second (""more than 100%"" of the time interval) It would be helpful to expose the number of processes and tasks in the container cgroup. This would be at a very coarse granularity but would give some guidance.",2 MESOS-2104,"Correct naming of cgroup memory statistics","mem_rss_bytes is *not* RSS but is the total memory usage (memory.usage_in_bytes) of the cgroup, including file cache etc. Actual RSS is reported as mem_anon_bytes. These, and others, should be consistently named.",3 MESOS-2108,"Add configure flag or environment variable to enable SSL/libevent Socket",NULL,1 MESOS-2110,"Configurable Ping Timeouts","After a series of ping-failures, the master considers the slave lost and calls shutdownSlave, requiring such a slave that reconnects to kill its tasks and re-register as a new slaveId. On the other side, after a similar timeout, the slave will consider the master lost and try to detect a new master. These timeouts are currently hardcoded constants (5 * 15s), which may not be well-suited for all scenarios. - Some clusters may tolerate a longer slave process restart period, and wouldn't want tasks to be killed upon reconnect. - Some clusters may have higher-latency networks (e.g. cross-datacenter, or for volunteer computing efforts), and would like to tolerate longer periods without communication. We should provide flags/mechanisms on the master to control its tolerance for non-communicative slaves, and (less importantly?) 
on the slave to tolerate missing masters.",8 MESOS-2119,"Add Socket tests","Add more Socket-specific tests to get coverage during the move from libev to libevent (with and without SSL)",5 MESOS-2123,"Document changes in C++ Resources API in CHANGELOG.","With the refactor introduced in MESOS-1974, we need to document those API changes in CHANGELOG. ",2 MESOS-2127,"killTask() should perform reconciliation for unknown tasks.","Currently, {{killTask}} uses its own reconciliation logic, which has diverged from the {{reconcileTasks}} logic. Specifically, when the task is unknown and a non-strict registry is in use, {{killTask}} will not send TASK_LOST whereas {{reconcileTasks}} will. We should make these consistent. ",3 MESOS-2128,"Turning on cgroups_limit_swap effectively disables memory isolation","Our test runs show that enabling cgroups_limit_swap effectively disables memory isolation altogether. Per: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-memory.html ""It is important to set the memory.limit_in_bytes parameter before setting the memory.memsw.limit_in_bytes parameter: attempting to do so in the reverse order results in an error. This is because memory.memsw.limit_in_bytes becomes available only after all memory limitations (previously set in memory.limit_in_bytes) are exhausted."" It looks like the flag sets ""memory.memsw.limit_in_bytes"" if true and ""memory.limit_in_bytes"" if false, but should always set ""memory.limit_in_bytes"" and in addition set ""memory.memsw.limit_in_bytes"" if true. Otherwise the limits won't be set and enforced. 
See: https://github.com/apache/mesos/blob/c8598f7f5a24a01b6a68e0f060b79662ee97af89/src/slave/containerizer/isolators/cgroups/mem.cpp#L365 ",2 MESOS-2135,"Support DiskInfo in C++ Resources","We need to change the following functions: 1) addable 2) subtractable 3) validate We probably shouldn't add two disk resources with the same persistence id because they must come from different ""namespaces"". We can add more checks in the validate functions (for protobufs).",3 MESOS-2136,"Expose per-cgroup memory pressure","The cgroup memory controller can provide information on the memory pressure of a cgroup. This is in the form of an event based notification where events of (low, medium, critical) are generated when the kernel takes specific actions to allocate memory. This signal is probably more informative than comparing memory usage to memory limit. ",5 MESOS-2139,"Enable the master to handle reservation operations","The master's {{_accept}} function currently only handles {{Create}} and {{Destroy}} operations which exist for persistent volumes. We need to handle the {{Reserve}} and {{Unreserve}} operations for dynamic reservations as well. In addition, we need to add {{validate}} functions for the reservation operations.",5 MESOS-2144,"Segmentation Fault in ExamplesTest.LowLevelSchedulerPthread","Occurred on a review bot review of: https://reviews.apache.org/r/28262/#review62333 The review doesn't touch code related to the test (and doesn't break libprocess in general) [ RUN ] ExamplesTest.LowLevelSchedulerPthread ../../src/tests/script.cpp:83: Failure Failed low_level_scheduler_pthread_test.sh terminated with signal Segmentation fault [ FAILED ] ExamplesTest.LowLevelSchedulerPthread (7561 ms) The test ",8 MESOS-2147,"Large number of connections slows statistics.json responses.","We observed this in our production environment with network monitoring turned on. If there are many connections (> 10^4) in a container, getting socket information is expensive. 
It might take 1min to process all the socket information. One of the reasons is that the library we are using (libnl) is not so optimized. Cong Wang has already submitted a patch: http://lists.infradead.org/pipermail/libnl/2014-November/001715.html",2 MESOS-2157,"Add /master/slaves and /master/frameworks/{framework}/tasks/{task} endpoints","master/state.json exports the entire state of the cluster and can, for large clusters, become massive (tens of megabytes of JSON). Often, a client only needs information about subsets of the entire state, for example all connected slaves, or information (registration info, tasks, etc) belonging to a particular framework. We can partition state.json into many smaller endpoints, but for starters, being able to get slave information and task information per framework would be useful.",5 MESOS-2166," PerfEventIsolatorTest.ROOT_CGROUPS_Sample requires 'perf' to be installed","perf::valid() relies on the 'perf' command being installed. This isn't always the case. Configure should probably check that the perf command exists.",1 MESOS-2176,"Hierarchical allocator inconsistently accounts for reserved resources. ","Looking through the allocator code for MESOS-2099, I see an issue with respect to accounting reserved resources in the sorters: Within {{HierarchicalAllocatorProcess::allocate}}, only unreserved resources are accounted for in the sorters, whereas everywhere else (add/remove framework, add/remove slave) we account for both reserved and unreserved. From git blame, it looks like this issue was introduced over a long course of refactoring and fixes to the allocator. My guess is that this was never caught due to the lack of unit-testability of the allocator (unnecessarily requires a master PID to use an allocator). From my understanding, the two levels of the hierarchical sorter should have the following semantics: # Level 1 sorts across roles. 
Only unreserved resources are shared across roles, and therefore the ""role sorter"" for level 1 should only account for the unreserved resource pool. # Level 2 sorts across frameworks, within a role. Both unreserved and reserved resources are shared across frameworks within a role, and therefore the ""framework sorters"" for level 2 should each account for the reserved resource pool for the role, as well as the unreserved resources _allocated_ inside the role.",5 MESOS-2182,"Performance issue in libprocess SocketManager.","Noticed an issue in production under which the master is slow to respond after failover for ~15 minutes. After looking at some perf data, the top offender is: {noformat} 12.02% mesos-master libmesos-0.21.0-rc3.so [.] std::_Rb_tree, std::less, std::allocator >::erase(process::ProcessBase* const&) ... 3.29% mesos-master libmesos-0.21.0-rc3.so [.] process::SocketManager::exited(process::ProcessBase*) {noformat} It appears that in the SocketManager, whenever an internal Process exits, we loop over all the links unnecessarily: {code} void SocketManager::exited(ProcessBase* process) { // An exited event is enough to cause the process to get deleted // (e.g., by the garbage collector), which means we can't // dereference process (or even use the address) after we enqueue at // least one exited event. Thus, we save the process pid. const UPID pid = process->pid; // Likewise, we need to save the current time of the process so we // can update the clocks of linked processes as appropriate. const Time time = Clock::now(process); synchronized (this) { // Iterate through the links, removing any links the process might // have had and creating exited events for any linked processes. 
foreachpair (const UPID& linkee, set& processes, links) { processes.erase(process); if (linkee == pid) { foreach (ProcessBase* linker, processes) { CHECK(linker != process) << ""Process linked with itself""; synchronized (timeouts) { if (Clock::paused()) { Clock::update(linker, time); } } linker->enqueue(new ExitedEvent(linkee)); } } } links.erase(pid); } } {code} On clusters with 10,000s of slaves, this means we hold the socket manager lock for a very expensive loop erasing nothing from a set! This is because, the master contains links from the Master Process to each slave. However, when a random ephemeral Process terminates, we don't need to loop over each slave link. While we hold this lock, the following calls will block: {code} class SocketManager { public: Socket accepted(int s); void link(ProcessBase* process, const UPID& to); PID proxy(const Socket& socket); void send(Encoder* encoder, bool persist); void send(const Response& response, const Request& request, const Socket& socket); void send(Message* message); Encoder* next(int s); void close(int s); void exited(const Node& node); void exited(ProcessBase* process); ... {code} As a result, the slave observers and the master can block calling send()! Short term, we will try to fix this issue by removing the unnecessary looping. Longer term, it would be nice to avoid all this locking when sending on independent sockets.",3 MESOS-2184,"deprecate unused flag 'cgroups_subsystems'","cgroups_subsystems is a slave flag that is no longer used and should be deprecated.",1 MESOS-2191,"Add ContainerId to the TaskStatus message","{{TaskStatus}} provides the frameworks with certain information ({{executorId}}, {{slaveId}}, etc.) which is useful when collecting statistics about cluster performance; however, it is difficult to associate tasks to the container it is executed since this information stays always within mesos itself. 
Therefore it would be good to provide the framework scheduler with this information, adding a new field in the {{TaskStatus}} message. See comments for a use case.",3 MESOS-2199,"Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser","Appears that running the executor as {{nobody}} is not supported. [~nnielsen] can you take a look? Executor log: {noformat} [root@hostname build]# cat /tmp/SlaveTest_ROOT_RunTaskWithCommandInfoWithUser_cxF1dY/slaves/20141219-005206-2081170186-60487-11862-S0/frameworks/20141219-005206-2081170186-60 487-11862-0000/executors/1/runs/latest/std* sh: /home/idownes/workspace/mesos/build/src/mesos-executor: Permission denied {noformat} Test output: {noformat} [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from SlaveTest [ RUN ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser ../../src/tests/slave_tests.cpp:680: Failure Value of: statusRunning.get().state() Actual: TASK_FAILED Expected: TASK_RUNNING ../../src/tests/slave_tests.cpp:682: Failure Failed to wait 10secs for statusFinished ../../src/tests/slave_tests.cpp:673: Failure Actual function call count doesn't match EXPECT_CALL(sched, statusUpdate(&driver, _))... Expected: to be called twice Actual: called once - unsatisfied and active [ FAILED ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser (10641 ms) [----------] 1 test from SlaveTest (10641 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (10658 ms total) {noformat}",2 MESOS-2200,"bogus docker images result in bad error message to scheduler","When a scheduler specifies a bogus image in ContainerInfo mesos doesn't tell the scheduler that the docker pull failed or why. 
This error is logged in the mesos-slave log, but it isn't given to the scheduler (as far as I can tell): {noformat} E1218 23:50:55.406230 8123 slave.cpp:2730] Container '8f70784c-3e40-4072-9ca2-9daed23f15ff' for executor 'thermos-1418946354013-xxx-xxx-curl-0-f500cc41-dd0a-4338-8cbc-d631cb588bb1' of framework '20140522-213145-1749004561-5050-29512-0000' failed to start: Failed to 'docker pull docker-registry.example.com/doesntexist/hello1.1:latest': exit status = exited with status 1 stderr = 2014/12/18 23:50:55 Error: image doesntexist/hello1.1 not found {noformat} If the docker image is not in the registry, the scheduler should give the user an error message. If docker pull failed because of networking issues, it should be retried. Mesos should give the scheduler enough information to be able to make that decision.",2 MESOS-2201,"ReplicaTest.Restore fails with leveldb greater than v1.7.","I wanted to configure Mesos with system provided leveldb libraries when I ran into this issue. Apparently, if one does {{../configure --with-leveldb=/path/to/leveldb}}, compilation succeeds, however the ""ReplicaTest_Restore"" test fails with the following back trace: {code} [ RUN ] ReplicaTest.Restore Using temporary directory '/tmp/ReplicaTest_Restore_IZbbRR' I1222 14:16:49.517500 2927 leveldb.cpp:176] Opened db in 10.758917ms I1222 14:16:49.526495 2927 leveldb.cpp:183] Compacted db in 8.931146ms I1222 14:16:49.526523 2927 leveldb.cpp:198] Created db iterator in 5787ns I1222 14:16:49.526531 2927 leveldb.cpp:204] Seeked to beginning of db in 511ns I1222 14:16:49.526535 2927 leveldb.cpp:273] Iterated through 0 keys in the db in 197ns I1222 14:16:49.526623 2927 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I1222 14:16:49.530972 2945 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 3.084458ms I1222 14:16:49.531008 2945 replica.cpp:320] Persisted replica status to VOTING I1222 14:16:49.541263 2927 leveldb.cpp:176] Opened db 
in 9.980586ms I1222 14:16:49.551636 2927 leveldb.cpp:183] Compacted db in 10.348096ms I1222 14:16:49.551683 2927 leveldb.cpp:198] Created db iterator in 3405ns I1222 14:16:49.551693 2927 leveldb.cpp:204] Seeked to beginning of db in 3559ns I1222 14:16:49.551728 2927 leveldb.cpp:273] Iterated through 1 keys in the db in 29722ns I1222 14:16:49.551751 2927 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I1222 14:16:49.551996 2947 replica.cpp:474] Replica received implicit promise request with proposal 1 I1222 14:16:49.560921 2947 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 8.899591ms I1222 14:16:49.560940 2947 replica.cpp:342] Persisted promised to 1 I1222 14:16:49.561338 2943 replica.cpp:508] Replica received write request for position 1 I1222 14:16:49.568677 2943 leveldb.cpp:343] Persisting action (27 bytes) to leveldb took 7.287155ms I1222 14:16:49.568692 2943 replica.cpp:676] Persisted action at 1 I1222 14:16:49.569042 2942 leveldb.cpp:438] Reading position from leveldb took 26339ns F1222 14:16:49.569411 2927 replica.cpp:721] CHECK_SOME(state): IO error: lock /tmp/ReplicaTest_Restore_IZbbRR/.log/LOCK: already held by process Failed to recover the log *** Check failure stack trace: *** @ 0x7f7f6c53e688 google::LogMessage::Fail() @ 0x7f7f6c53e5e7 google::LogMessage::SendToLog() @ 0x7f7f6c53dff8 google::LogMessage::Flush() @ 0x7f7f6c540d2c google::LogMessageFatal::~LogMessageFatal() @ 0x90a520 _CheckFatal::~_CheckFatal() @ 0x7f7f6c400f4d mesos::internal::log::ReplicaProcess::restore() @ 0x7f7f6c3fd763 mesos::internal::log::ReplicaProcess::ReplicaProcess() @ 0x7f7f6c401271 mesos::internal::log::Replica::Replica() @ 0xcd7ca3 ReplicaTest_Restore_Test::TestBody() @ 0x10934b2 testing::internal::HandleSehExceptionsInMethodIfSupported<>() @ 0x108e584 testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x10768fd testing::Test::Run() @ 0x1077020 testing::TestInfo::Run() @ 0x10775a8 testing::TestCase::Run() 
@ 0x107c324 testing::internal::UnitTestImpl::RunAllTests() @ 0x1094348 testing::internal::HandleSehExceptionsInMethodIfSupported<>() @ 0x108f2b7 testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x107b1d4 testing::UnitTest::Run() @ 0xd344a9 main @ 0x7f7f66fdfb45 __libc_start_main @ 0x8f3549 (unknown) @ (nil) (unknown) [2] 2927 abort (core dumped) GLOG_logtostderr=1 GTEST_v=10 ./bin/mesos-tests.sh --verbose {code} The bundled version of leveldb is v1.4. I tested version 1.5 and that seems to work. However, v1.6 had some build issues and is unusable with Mesos. The next version, v1.7, allows Mesos to compile fine but results in the above error.",3 MESOS-2205,"Add user documentation for reservations","Add a user guide for reservations which describes basic usage of them, how ACLs are used to specify who can unreserve whose resources, and a few advanced use cases.",2 MESOS-2210,"Disallow special characters in role.","As we introduce persistent volumes in MESOS-1524, we will use roles as directory names on the slave (https://reviews.apache.org/r/28562/). As a result, the master should disallow special characters (like space and slash) in role.",2 MESOS-2215,"The Docker containerizer attempts to recover any task when checkpointing is enabled, not just docker tasks.","Once the slave restarts and recovers the task, I see this error in the log every second or so for every recovered task. Note, these were NOT docker tasks: W0113 16:01:00.790323 773142 monitor.cpp:213] Failed to get resource usage for container 7b729b89-dc7e-4d08-af97-8cd1af560a21 for executor thermos-1421085237813-slipstream-prod-agent-3-8f769514-1835-4151-90d0-3f55dcc940dd of framework 20150109-161713-715350282-5050-290797-0000: Failed to 'docker inspect mesos-7b729b89-dc7e-4d08-af97-8cd1af560a21': exit status = exited with status 1 stderr = Error: No such image or container: mesos-7b729b89-dc7e-4d08-af97-8cd1af560a21 However the tasks themselves are still healthy and running.
The slave was launched with --containerizers=mesos,docker ----- More info: it looks like the docker containerizer is a little too ambitious about recovering containers; again, this was not a docker task: I0113 15:59:59.476145 773142 docker.cpp:814] Recovering container '7b729b89-dc7e-4d08-af97-8cd1af560a21' for executor 'thermos-1421085237813-slipstream-prod-agent-3-8f769514-1835-4151-90d0-3f55dcc940dd' of framework 20150109-161713-715350282-5050-290797-0000 Looking into the source, it looks like the problem is that the ComposingContainerizer runs recover in parallel, but neither the docker containerizer nor the mesos containerizer checks whether it should recover the task (because it was the one that launched it). Perhaps this needs to be written into the checkpoint somewhere?",8 MESOS-2222,"Add ACLs for the maintenance HTTP endpoints.","In order to authorize the HTTP endpoints for maintenance (to be added in MESOS-2067), we will need to add an ACL definition for performing maintenance operations.",3 MESOS-2225,"FaultToleranceTest.ReregisterFrameworkExitedExecutor is flaky","Observed this on internal CI.
{code} [ RUN ] FaultToleranceTest.ReregisterFrameworkExitedExecutor Using temporary directory '/tmp/FaultToleranceTest_ReregisterFrameworkExitedExecutor_yNprKi' I0114 18:50:51.461186 4720 leveldb.cpp:176] Opened db in 4.866948ms I0114 18:50:51.462057 4720 leveldb.cpp:183] Compacted db in 472256ns I0114 18:50:51.462514 4720 leveldb.cpp:198] Created db iterator in 42905ns I0114 18:50:51.462784 4720 leveldb.cpp:204] Seeked to beginning of db in 21630ns I0114 18:50:51.463068 4720 leveldb.cpp:273] Iterated through 0 keys in the db in 19967ns I0114 18:50:51.463485 4720 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0114 18:50:51.464555 4737 recover.cpp:449] Starting replica recovery I0114 18:50:51.465188 4737 recover.cpp:475] Replica is in EMPTY status I0114 18:50:51.467324 4741 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request I0114 18:50:51.470118 4736 recover.cpp:195] Received a recover response from a replica in EMPTY status I0114 18:50:51.475424 4739 recover.cpp:566] Updating replica status to STARTING I0114 18:50:51.476553 4739 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 107545ns I0114 18:50:51.476862 4739 replica.cpp:323] Persisted replica status to STARTING I0114 18:50:51.477309 4739 recover.cpp:475] Replica is in STARTING status I0114 18:50:51.479109 4734 replica.cpp:641] Replica in STARTING status received a broadcasted recover request I0114 18:50:51.481274 4738 recover.cpp:195] Received a recover response from a replica in STARTING status I0114 18:50:51.482324 4738 recover.cpp:566] Updating replica status to VOTING I0114 18:50:51.482913 4738 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 66011ns I0114 18:50:51.483186 4738 replica.cpp:323] Persisted replica status to VOTING I0114 18:50:51.483608 4738 recover.cpp:580] Successfully joined the Paxos group I0114 18:50:51.484031 4738 recover.cpp:464] Recover process terminated I0114 18:50:51.554949 4734 
master.cpp:262] Master 20150114-185051-2272962752-57018-4720 (fedora-19) started on 192.168.122.135:57018 I0114 18:50:51.555785 4734 master.cpp:308] Master only allowing authenticated frameworks to register I0114 18:50:51.556046 4734 master.cpp:313] Master only allowing authenticated slaves to register I0114 18:50:51.556426 4734 credentials.hpp:36] Loading credentials for authentication from '/tmp/FaultToleranceTest_ReregisterFrameworkExitedExecutor_yNprKi/credentials' I0114 18:50:51.557003 4734 master.cpp:357] Authorization enabled I0114 18:50:51.558007 4737 hierarchical_allocator_process.hpp:285] Initialized hierarchical allocator process I0114 18:50:51.558521 4741 whitelist_watcher.cpp:65] No whitelist given I0114 18:50:51.562185 4734 master.cpp:1219] The newly elected leader is master@192.168.122.135:57018 with id 20150114-185051-2272962752-57018-4720 I0114 18:50:51.562680 4734 master.cpp:1232] Elected as the leading master! I0114 18:50:51.562950 4734 master.cpp:1050] Recovering from registrar I0114 18:50:51.564506 4736 registrar.cpp:313] Recovering registrar I0114 18:50:51.566162 4737 log.cpp:660] Attempting to start the writer I0114 18:50:51.568691 4741 replica.cpp:477] Replica received implicit promise request with proposal 1 I0114 18:50:51.569154 4741 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 106885ns I0114 18:50:51.569504 4741 replica.cpp:345] Persisted promised to 1 I0114 18:50:51.573277 4740 coordinator.cpp:230] Coordinator attemping to fill missing position I0114 18:50:51.575623 4739 replica.cpp:378] Replica received explicit promise request for position 0 with proposal 2 I0114 18:50:51.576133 4739 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 86360ns I0114 18:50:51.576449 4739 replica.cpp:679] Persisted action at 0 I0114 18:50:51.586966 4736 replica.cpp:511] Replica received write request for position 0 I0114 18:50:51.587666 4736 leveldb.cpp:438] Reading position from leveldb took 60621ns I0114 18:50:51.588043 4736 
leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 81094ns I0114 18:50:51.588374 4736 replica.cpp:679] Persisted action at 0 I0114 18:50:51.589418 4736 replica.cpp:658] Replica received learned notice for position 0 I0114 18:50:51.590428 4736 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 106648ns I0114 18:50:51.590840 4736 replica.cpp:679] Persisted action at 0 I0114 18:50:51.591104 4736 replica.cpp:664] Replica learned NOP action at position 0 I0114 18:50:51.592260 4734 log.cpp:676] Writer started with ending position 0 I0114 18:50:51.594172 4739 leveldb.cpp:438] Reading position from leveldb took 52163ns I0114 18:50:51.600744 4736 registrar.cpp:346] Successfully fetched the registry (0B) in 35968us I0114 18:50:51.601646 4736 registrar.cpp:445] Applied 1 operations in 184502ns; attempting to update the 'registry' I0114 18:50:51.604329 4737 log.cpp:684] Attempting to append 130 bytes to the log I0114 18:50:51.604966 4737 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1 I0114 18:50:51.606449 4737 replica.cpp:511] Replica received write request for position 1 I0114 18:50:51.606937 4737 leveldb.cpp:343] Persisting action (149 bytes) to leveldb took 84877ns I0114 18:50:51.607199 4737 replica.cpp:679] Persisted action at 1 I0114 18:50:51.611934 4741 replica.cpp:658] Replica received learned notice for position 1 I0114 18:50:51.612423 4741 leveldb.cpp:343] Persisting action (151 bytes) to leveldb took 113059ns I0114 18:50:51.612794 4741 replica.cpp:679] Persisted action at 1 I0114 18:50:51.613056 4741 replica.cpp:664] Replica learned APPEND action at position 1 I0114 18:50:51.614598 4741 log.cpp:703] Attempting to truncate the log to 1 I0114 18:50:51.615157 4741 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2 I0114 18:50:51.616458 4737 replica.cpp:511] Replica received write request for position 2 I0114 18:50:51.616902 4737 leveldb.cpp:343] Persisting action (16 bytes) to 
leveldb took 71716ns I0114 18:50:51.617168 4737 replica.cpp:679] Persisted action at 2 I0114 18:50:51.618505 4740 replica.cpp:658] Replica received learned notice for position 2 I0114 18:50:51.619031 4740 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 78481ns I0114 18:50:51.619567 4740 leveldb.cpp:401] Deleting ~1 keys from leveldb took 59638ns I0114 18:50:51.619832 4740 replica.cpp:679] Persisted action at 2 I0114 18:50:51.620101 4740 replica.cpp:664] Replica learned TRUNCATE action at position 2 I0114 18:50:51.621757 4736 registrar.cpp:490] Successfully updated the 'registry' in 19.78496ms I0114 18:50:51.622658 4736 registrar.cpp:376] Successfully recovered registrar I0114 18:50:51.623261 4736 master.cpp:1077] Recovered 0 slaves from the Registry (94B) ; allowing 10mins for slaves to re-register I0114 18:50:51.670349 4739 slave.cpp:173] Slave started on 115)@192.168.122.135:57018 I0114 18:50:51.671133 4739 credentials.hpp:84] Loading credential for authentication from '/tmp/FaultToleranceTest_ReregisterFrameworkExitedExecutor_ONrVug/credential' I0114 18:50:51.671685 4739 slave.cpp:282] Slave using credential for: test-principal I0114 18:50:51.672245 4739 slave.cpp:300] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0114 18:50:51.673360 4739 slave.cpp:329] Slave hostname: fedora-19 I0114 18:50:51.673660 4739 slave.cpp:330] Slave checkpoint: false W0114 18:50:51.674052 4739 slave.cpp:332] Disabling checkpointing is deprecated and the --checkpoint flag will be removed in a future release. 
Please avoid using this flag I0114 18:50:51.677234 4737 state.cpp:33] Recovering state from '/tmp/FaultToleranceTest_ReregisterFrameworkExitedExecutor_ONrVug/meta' I0114 18:50:51.684973 4739 status_update_manager.cpp:197] Recovering status update manager I0114 18:50:51.687644 4739 slave.cpp:3519] Finished recovery I0114 18:50:51.688698 4737 slave.cpp:613] New master detected at master@192.168.122.135:57018 I0114 18:50:51.688902 4734 status_update_manager.cpp:171] Pausing sending status updates I0114 18:50:51.689482 4737 slave.cpp:676] Authenticating with master master@192.168.122.135:57018 I0114 18:50:51.689910 4737 slave.cpp:681] Using default CRAM-MD5 authenticatee I0114 18:50:51.690577 4741 authenticatee.hpp:138] Creating new client SASL connection I0114 18:50:51.691453 4737 slave.cpp:649] Detecting new master I0114 18:50:51.691864 4741 master.cpp:4130] Authenticating slave(115)@192.168.122.135:57018 I0114 18:50:51.692369 4741 master.cpp:4141] Using default CRAM-MD5 authenticator I0114 18:50:51.693208 4741 authenticator.hpp:170] Creating new server SASL connection I0114 18:50:51.694598 4738 authenticatee.hpp:229] Received SASL authentication mechanisms: CRAM-MD5 I0114 18:50:51.694893 4738 authenticatee.hpp:255] Attempting to authenticate with mechanism 'CRAM-MD5' I0114 18:50:51.695329 4741 authenticator.hpp:276] Received SASL authentication start I0114 18:50:51.695641 4741 authenticator.hpp:398] Authentication requires more steps I0114 18:50:51.696028 4736 authenticatee.hpp:275] Received SASL authentication step I0114 18:50:51.696486 4741 authenticator.hpp:304] Received SASL authentication step I0114 18:50:51.696753 4741 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0114 18:50:51.697041 4741 auxprop.cpp:171] Looking up auxiliary property '*userPassword' I0114 18:50:51.697343 4741 auxprop.cpp:171] 
Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0114 18:50:51.697685 4741 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0114 18:50:51.697998 4741 auxprop.cpp:121] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0114 18:50:51.698251 4741 auxprop.cpp:121] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0114 18:50:51.698580 4741 authenticator.hpp:390] Authentication success I0114 18:50:51.698927 4735 authenticatee.hpp:315] Authentication success I0114 18:50:51.705123 4741 master.cpp:4188] Successfully authenticated principal 'test-principal' at slave(115)@192.168.122.135:57018 I0114 18:50:51.705847 4720 sched.cpp:151] Version: 0.22.0 I0114 18:50:51.707159 4736 sched.cpp:248] New master detected at master@192.168.122.135:57018 I0114 18:50:51.707523 4736 sched.cpp:304] Authenticating with master master@192.168.122.135:57018 I0114 18:50:51.707792 4736 sched.cpp:311] Using default CRAM-MD5 authenticatee I0114 18:50:51.708412 4736 authenticatee.hpp:138] Creating new client SASL connection I0114 18:50:51.709316 4735 slave.cpp:747] Successfully authenticated with master master@192.168.122.135:57018 I0114 18:50:51.709723 4737 master.cpp:4130] Authenticating scheduler-092fbbec-0938-4355-8187-fb92e5174c64@192.168.122.135:57018 I0114 18:50:51.710274 4737 master.cpp:4141] Using default CRAM-MD5 authenticator I0114 18:50:51.710739 4735 slave.cpp:1075] Will retry registration in 17.028024ms if necessary I0114 18:50:51.711304 4737 master.cpp:3276] Registering slave at slave(115)@192.168.122.135:57018 (fedora-19) with id 20150114-185051-2272962752-57018-4720-S0 I0114 18:50:51.711459 4738 authenticator.hpp:170] Creating new server SASL connection I0114 18:50:51.713142 4739 registrar.cpp:445] Applied 1 operations in 100530ns; attempting to 
update the 'registry' I0114 18:50:51.713465 4738 authenticatee.hpp:229] Received SASL authentication mechanisms: CRAM-MD5 I0114 18:50:51.715435 4738 authenticatee.hpp:255] Attempting to authenticate with mechanism 'CRAM-MD5' I0114 18:50:51.715963 4740 authenticator.hpp:276] Received SASL authentication start I0114 18:50:51.716258 4740 authenticator.hpp:398] Authentication requires more steps I0114 18:50:51.716524 4740 authenticatee.hpp:275] Received SASL authentication step I0114 18:50:51.716784 4740 authenticator.hpp:304] Received SASL authentication step I0114 18:50:51.716979 4740 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0114 18:50:51.717139 4740 auxprop.cpp:171] Looking up auxiliary property '*userPassword' I0114 18:50:51.717315 4740 auxprop.cpp:171] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0114 18:50:51.717542 4740 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0114 18:50:51.717703 4740 auxprop.cpp:121] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0114 18:50:51.717864 4740 auxprop.cpp:121] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0114 18:50:51.718040 4740 authenticator.hpp:390] Authentication success I0114 18:50:51.718292 4740 authenticatee.hpp:315] Authentication success I0114 18:50:51.718454 4738 master.cpp:4188] Successfully authenticated principal 'test-principal' at scheduler-092fbbec-0938-4355-8187-fb92e5174c64@192.168.122.135:57018 I0114 18:50:51.719012 4740 sched.cpp:392] Successfully authenticated with master master@192.168.122.135:57018 I0114 18:50:51.719364 4740 sched.cpp:515] Sending registration request to 
master@192.168.122.135:57018 I0114 18:50:51.719702 4740 sched.cpp:548] Will retry registration in 746.539282ms if necessary I0114 18:50:51.719902 4735 master.cpp:1417] Received registration request for framework 'default' at scheduler-092fbbec-0938-4355-8187-fb92e5174c64@192.168.122.135:57018 I0114 18:50:51.720232 4735 master.cpp:1298] Authorizing framework principal 'test-principal' to receive offers for role '*' I0114 18:50:51.722206 4735 master.cpp:1481] Registering framework 20150114-185051-2272962752-57018-4720-0000 (default) at scheduler-092fbbec-0938-4355-8187-fb92e5174c64@192.168.122.135:57018 I0114 18:50:51.720927 4737 log.cpp:684] Attempting to append 300 bytes to the log I0114 18:50:51.722924 4737 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 3 I0114 18:50:51.724269 4737 replica.cpp:511] Replica received write request for position 3 I0114 18:50:51.724817 4737 leveldb.cpp:343] Persisting action (319 bytes) to leveldb took 116638ns I0114 18:50:51.728560 4737 replica.cpp:679] Persisted action at 3 I0114 18:50:51.726066 4736 sched.cpp:442] Framework registered with 20150114-185051-2272962752-57018-4720-0000 I0114 18:50:51.728879 4736 sched.cpp:456] Scheduler::registered took 34885ns I0114 18:50:51.725520 4735 hierarchical_allocator_process.hpp:319] Added framework 20150114-185051-2272962752-57018-4720-0000 I0114 18:50:51.731864 4735 hierarchical_allocator_process.hpp:839] No resources available to allocate! 
I0114 18:50:51.732038 4735 hierarchical_allocator_process.hpp:746] Performed allocation for 0 slaves in 214728ns I0114 18:50:51.733106 4738 replica.cpp:658] Replica received learned notice for position 3 I0114 18:50:51.733340 4738 leveldb.cpp:343] Persisting action (321 bytes) to leveldb took 83165ns I0114 18:50:51.733538 4738 replica.cpp:679] Persisted action at 3 I0114 18:50:51.733705 4738 replica.cpp:664] Replica learned APPEND action at position 3 I0114 18:50:51.735610 4738 registrar.cpp:490] Successfully updated the 'registry' in 21.936128ms I0114 18:50:51.735805 4739 log.cpp:703] Attempting to truncate the log to 3 I0114 18:50:51.736445 4739 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 4 I0114 18:50:51.737664 4739 replica.cpp:511] Replica received write request for position 4 I0114 18:50:51.738013 4739 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 72906ns I0114 18:50:51.738255 4739 replica.cpp:679] Persisted action at 4 I0114 18:50:51.743397 4734 replica.cpp:658] Replica received learned notice for position 4 I0114 18:50:51.743628 4734 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 78832ns I0114 18:50:51.743837 4734 leveldb.cpp:401] Deleting ~2 keys from leveldb took 63991ns I0114 18:50:51.744004 4734 replica.cpp:679] Persisted action at 4 I0114 18:50:51.744168 4734 replica.cpp:664] Replica learned TRUNCATE action at position 4 I0114 18:50:51.745537 4738 master.cpp:3330] Registered slave 20150114-185051-2272962752-57018-4720-S0 at slave(115)@192.168.122.135:57018 (fedora-19) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0114 18:50:51.745968 4734 hierarchical_allocator_process.hpp:453] Added slave 20150114-185051-2272962752-57018-4720-S0 (fedora-19) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] available) I0114 18:50:51.746070 4735 slave.cpp:781] Registered with master 
master@192.168.122.135:57018; given slave ID 20150114-185051-2272962752-57018-4720-S0 I0114 18:50:51.751437 4741 status_update_manager.cpp:178] Resuming sending status updates I0114 18:50:51.752428 4740 master.cpp:4072] Sending 1 offers to framework 20150114-185051-2272962752-57018-4720-0000 (default) at scheduler-092fbbec-0938-4355-8187-fb92e5174c64@192.168.122.135:57018 I0114 18:50:51.753764 4740 sched.cpp:605] Scheduler::resourceOffers took 751714ns I0114 18:50:51.754812 4740 master.cpp:2541] Processing reply for offers: [ 20150114-185051-2272962752-57018-4720-O0 ] on slave 20150114-185051-2272962752-57018-4720-S0 at slave(115)@192.168.122.135:57018 (fedora-19) for framework 20150114-185051-2272962752-57018-4720-0000 (default) at scheduler-092fbbec-0938-4355-8187-fb92e5174c64@192.168.122.135:57018 I0114 18:50:51.755040 4740 master.cpp:2647] Authorizing framework principal 'test-principal' to launch task 0 as user 'jenkins' W0114 18:50:51.756431 4741 master.cpp:2124] Executor default for task 0 uses less CPUs (None) than the minimum required (0.01). Please update your executor, as this will be mandatory in future releases. W0114 18:50:51.756652 4741 master.cpp:2136] Executor default for task 0 uses less memory (None) than the minimum required (32MB). Please update your executor, as this will be mandatory in future releases. 
I0114 18:50:51.757284 4741 master.hpp:766] Adding task 0 with resources cpus(*):1; mem(*):16 on slave 20150114-185051-2272962752-57018-4720-S0 (fedora-19) I0114 18:50:51.757733 4734 hierarchical_allocator_process.hpp:764] Performed allocation for slave 20150114-185051-2272962752-57018-4720-S0 in 9.535066ms I0114 18:50:51.758117 4735 slave.cpp:2588] Received ping from slave-observer(95)@192.168.122.135:57018 I0114 18:50:51.758630 4741 master.cpp:2897] Launching task 0 of framework 20150114-185051-2272962752-57018-4720-0000 (default) at scheduler-092fbbec-0938-4355-8187-fb92e5174c64@192.168.122.135:57018 with resources cpus(*):1; mem(*):16 on slave 20150114-185051-2272962752-57018-4720-S0 at slave(115)@192.168.122.135:57018 (fedora-19) I0114 18:50:51.759526 4741 hierarchical_allocator_process.hpp:610] Updated allocation of framework 20150114-185051-2272962752-57018-4720-0000 on slave 20150114-185051-2272962752-57018-4720-S0 from cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] to cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0114 18:50:51.759796 4737 slave.cpp:1130] Got assigned task 0 for framework 20150114-185051-2272962752-57018-4720-0000 I0114 18:50:51.761184 4737 slave.cpp:1245] Launching task 0 for framework 20150...",2 MESOS-2226,"HookTest.VerifySlaveLaunchExecutorHook is flaky","Observed this on internal CI {code} [ RUN ] HookTest.VerifySlaveLaunchExecutorHook Using temporary directory '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME' I0114 18:51:34.659353 4720 leveldb.cpp:176] Opened db in 1.255951ms I0114 18:51:34.662112 4720 leveldb.cpp:183] Compacted db in 596090ns I0114 18:51:34.662364 4720 leveldb.cpp:198] Created db iterator in 177877ns I0114 18:51:34.662719 4720 leveldb.cpp:204] Seeked to beginning of db in 19709ns I0114 18:51:34.663010 4720 leveldb.cpp:273] Iterated through 0 keys in the db in 18208ns I0114 18:51:34.663312 4720 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned 
I0114 18:51:34.664266 4735 recover.cpp:449] Starting replica recovery I0114 18:51:34.664908 4735 recover.cpp:475] Replica is in EMPTY status I0114 18:51:34.667842 4734 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request I0114 18:51:34.669117 4735 recover.cpp:195] Received a recover response from a replica in EMPTY status I0114 18:51:34.677913 4735 recover.cpp:566] Updating replica status to STARTING I0114 18:51:34.683157 4735 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 137939ns I0114 18:51:34.683507 4735 replica.cpp:323] Persisted replica status to STARTING I0114 18:51:34.684013 4735 recover.cpp:475] Replica is in STARTING status I0114 18:51:34.685554 4738 replica.cpp:641] Replica in STARTING status received a broadcasted recover request I0114 18:51:34.696512 4736 recover.cpp:195] Received a recover response from a replica in STARTING status I0114 18:51:34.700552 4735 recover.cpp:566] Updating replica status to VOTING I0114 18:51:34.701128 4735 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 115624ns I0114 18:51:34.701478 4735 replica.cpp:323] Persisted replica status to VOTING I0114 18:51:34.701817 4735 recover.cpp:580] Successfully joined the Paxos group I0114 18:51:34.702569 4735 recover.cpp:464] Recover process terminated I0114 18:51:34.716439 4736 master.cpp:262] Master 20150114-185134-2272962752-57018-4720 (fedora-19) started on 192.168.122.135:57018 I0114 18:51:34.716913 4736 master.cpp:308] Master only allowing authenticated frameworks to register I0114 18:51:34.717136 4736 master.cpp:313] Master only allowing authenticated slaves to register I0114 18:51:34.717488 4736 credentials.hpp:36] Loading credentials for authentication from '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME/credentials' I0114 18:51:34.718077 4736 master.cpp:357] Authorization enabled I0114 18:51:34.719238 4738 whitelist_watcher.cpp:65] No whitelist given I0114 18:51:34.719755 4737 hierarchical_allocator_process.hpp:285] 
Initialized hierarchical allocator process I0114 18:51:34.722584 4736 master.cpp:1219] The newly elected leader is master@192.168.122.135:57018 with id 20150114-185134-2272962752-57018-4720 I0114 18:51:34.722865 4736 master.cpp:1232] Elected as the leading master! I0114 18:51:34.723310 4736 master.cpp:1050] Recovering from registrar I0114 18:51:34.723760 4734 registrar.cpp:313] Recovering registrar I0114 18:51:34.725229 4740 log.cpp:660] Attempting to start the writer I0114 18:51:34.727893 4739 replica.cpp:477] Replica received implicit promise request with proposal 1 I0114 18:51:34.728425 4739 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 114781ns I0114 18:51:34.728662 4739 replica.cpp:345] Persisted promised to 1 I0114 18:51:34.731271 4741 coordinator.cpp:230] Coordinator attemping to fill missing position I0114 18:51:34.733223 4734 replica.cpp:378] Replica received explicit promise request for position 0 with proposal 2 I0114 18:51:34.734076 4734 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 87441ns I0114 18:51:34.734441 4734 replica.cpp:679] Persisted action at 0 I0114 18:51:34.740272 4739 replica.cpp:511] Replica received write request for position 0 I0114 18:51:34.740910 4739 leveldb.cpp:438] Reading position from leveldb took 59846ns I0114 18:51:34.741672 4739 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 189259ns I0114 18:51:34.741919 4739 replica.cpp:679] Persisted action at 0 I0114 18:51:34.743000 4739 replica.cpp:658] Replica received learned notice for position 0 I0114 18:51:34.746844 4739 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 328487ns I0114 18:51:34.747118 4739 replica.cpp:679] Persisted action at 0 I0114 18:51:34.747553 4739 replica.cpp:664] Replica learned NOP action at position 0 I0114 18:51:34.751344 4737 log.cpp:676] Writer started with ending position 0 I0114 18:51:34.753504 4734 leveldb.cpp:438] Reading position from leveldb took 61183ns I0114 18:51:34.762962 4737 
registrar.cpp:346] Successfully fetched the registry (0B) in 38.907904ms I0114 18:51:34.763610 4737 registrar.cpp:445] Applied 1 operations in 67206ns; attempting to update the 'registry' I0114 18:51:34.766079 4736 log.cpp:684] Attempting to append 130 bytes to the log I0114 18:51:34.766769 4736 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1 I0114 18:51:34.768215 4741 replica.cpp:511] Replica received write request for position 1 I0114 18:51:34.768759 4741 leveldb.cpp:343] Persisting action (149 bytes) to leveldb took 87970ns I0114 18:51:34.768995 4741 replica.cpp:679] Persisted action at 1 I0114 18:51:34.770691 4736 replica.cpp:658] Replica received learned notice for position 1 I0114 18:51:34.771273 4736 leveldb.cpp:343] Persisting action (151 bytes) to leveldb took 83590ns I0114 18:51:34.771579 4736 replica.cpp:679] Persisted action at 1 I0114 18:51:34.771917 4736 replica.cpp:664] Replica learned APPEND action at position 1 I0114 18:51:34.773252 4738 log.cpp:703] Attempting to truncate the log to 1 I0114 18:51:34.773756 4735 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2 I0114 18:51:34.775552 4736 replica.cpp:511] Replica received write request for position 2 I0114 18:51:34.775846 4736 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 71503ns I0114 18:51:34.776695 4736 replica.cpp:679] Persisted action at 2 I0114 18:51:34.785259 4739 replica.cpp:658] Replica received learned notice for position 2 I0114 18:51:34.786252 4737 registrar.cpp:490] Successfully updated the 'registry' in 22.340864ms I0114 18:51:34.787094 4737 registrar.cpp:376] Successfully recovered registrar I0114 18:51:34.787749 4737 master.cpp:1077] Recovered 0 slaves from the Registry (94B) ; allowing 10mins for slaves to re-register I0114 18:51:34.787282 4739 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 707150ns I0114 18:51:34.788692 4739 leveldb.cpp:401] Deleting ~1 keys from leveldb took 60262ns 
I0114 18:51:34.789048 4739 replica.cpp:679] Persisted action at 2 I0114 18:51:34.789329 4739 replica.cpp:664] Replica learned TRUNCATE action at position 2 I0114 18:51:34.819548 4738 slave.cpp:173] Slave started on 171)@192.168.122.135:57018 I0114 18:51:34.820530 4738 credentials.hpp:84] Loading credential for authentication from '/tmp/HookTest_VerifySlaveLaunchExecutorHook_AYxNqe/credential' I0114 18:51:34.820952 4738 slave.cpp:282] Slave using credential for: test-principal I0114 18:51:34.821516 4738 slave.cpp:300] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0114 18:51:34.822217 4738 slave.cpp:329] Slave hostname: fedora-19 I0114 18:51:34.822502 4738 slave.cpp:330] Slave checkpoint: false W0114 18:51:34.822857 4738 slave.cpp:332] Disabling checkpointing is deprecated and the --checkpoint flag will be removed in a future release. Please avoid using this flag I0114 18:51:34.824998 4737 state.cpp:33] Recovering state from '/tmp/HookTest_VerifySlaveLaunchExecutorHook_AYxNqe/meta' I0114 18:51:34.834015 4738 status_update_manager.cpp:197] Recovering status update manager I0114 18:51:34.834810 4738 slave.cpp:3519] Finished recovery I0114 18:51:34.835906 4734 status_update_manager.cpp:171] Pausing sending status updates I0114 18:51:34.836423 4738 slave.cpp:613] New master detected at master@192.168.122.135:57018 I0114 18:51:34.836908 4738 slave.cpp:676] Authenticating with master master@192.168.122.135:57018 I0114 18:51:34.837190 4738 slave.cpp:681] Using default CRAM-MD5 authenticatee I0114 18:51:34.837820 4737 authenticatee.hpp:138] Creating new client SASL connection I0114 18:51:34.838784 4738 slave.cpp:649] Detecting new master I0114 18:51:34.839306 4740 master.cpp:4130] Authenticating slave(171)@192.168.122.135:57018 I0114 18:51:34.839957 4740 master.cpp:4141] Using default CRAM-MD5 authenticator I0114 18:51:34.841236 4740 authenticator.hpp:170] Creating new server SASL connection I0114 18:51:34.842681 4741 authenticatee.hpp:229] 
Received SASL authentication mechanisms: CRAM-MD5 I0114 18:51:34.843118 4741 authenticatee.hpp:255] Attempting to authenticate with mechanism 'CRAM-MD5' I0114 18:51:34.843581 4740 authenticator.hpp:276] Received SASL authentication start I0114 18:51:34.843962 4740 authenticator.hpp:398] Authentication requires more steps I0114 18:51:34.844357 4740 authenticatee.hpp:275] Received SASL authentication step I0114 18:51:34.844780 4740 authenticator.hpp:304] Received SASL authentication step I0114 18:51:34.845113 4740 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0114 18:51:34.845507 4740 auxprop.cpp:171] Looking up auxiliary property '*userPassword' I0114 18:51:34.845835 4740 auxprop.cpp:171] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0114 18:51:34.846238 4740 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0114 18:51:34.846542 4740 auxprop.cpp:121] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0114 18:51:34.846806 4740 auxprop.cpp:121] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0114 18:51:34.847110 4740 authenticator.hpp:390] Authentication success I0114 18:51:34.847808 4734 authenticatee.hpp:315] Authentication success I0114 18:51:34.851029 4734 slave.cpp:747] Successfully authenticated with master master@192.168.122.135:57018 I0114 18:51:34.851608 4737 master.cpp:4188] Successfully authenticated principal 'test-principal' at slave(171)@192.168.122.135:57018 I0114 18:51:34.854962 4720 sched.cpp:151] Version: 0.22.0 I0114 18:51:34.856674 4734 slave.cpp:1075] Will retry registration in 3.085482ms if necessary I0114 18:51:34.857434 4739 sched.cpp:248] 
New master detected at master@192.168.122.135:57018 I0114 18:51:34.861433 4739 sched.cpp:304] Authenticating with master master@192.168.122.135:57018 I0114 18:51:34.861693 4739 sched.cpp:311] Using default CRAM-MD5 authenticatee I0114 18:51:34.857795 4737 master.cpp:3276] Registering slave at slave(171)@192.168.122.135:57018 (fedora-19) with id 20150114-185134-2272962752-57018-4720-S0 I0114 18:51:34.862951 4737 authenticatee.hpp:138] Creating new client SASL connection I0114 18:51:34.863919 4735 registrar.cpp:445] Applied 1 operations in 120272ns; attempting to update the 'registry' I0114 18:51:34.864645 4738 master.cpp:4130] Authenticating scheduler-c45273e4-6eb5-44ee-bf45-71b353db648f@192.168.122.135:57018 I0114 18:51:34.865033 4738 master.cpp:4141] Using default CRAM-MD5 authenticator I0114 18:51:34.866904 4738 authenticator.hpp:170] Creating new server SASL connection I0114 18:51:34.868840 4737 authenticatee.hpp:229] Received SASL authentication mechanisms: CRAM-MD5 I0114 18:51:34.869125 4737 authenticatee.hpp:255] Attempting to authenticate with mechanism 'CRAM-MD5' I0114 18:51:34.869523 4737 authenticator.hpp:276] Received SASL authentication start I0114 18:51:34.869835 4737 authenticator.hpp:398] Authentication requires more steps I0114 18:51:34.870213 4737 authenticatee.hpp:275] Received SASL authentication step I0114 18:51:34.870622 4737 authenticator.hpp:304] Received SASL authentication step I0114 18:51:34.870946 4737 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0114 18:51:34.871219 4737 auxprop.cpp:171] Looking up auxiliary property '*userPassword' I0114 18:51:34.871554 4737 auxprop.cpp:171] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0114 18:51:34.871968 4737 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' 
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0114 18:51:34.872297 4737 auxprop.cpp:121] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0114 18:51:34.872655 4737 auxprop.cpp:121] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0114 18:51:34.873024 4737 authenticator.hpp:390] Authentication success I0114 18:51:34.873428 4737 authenticatee.hpp:315] Authentication success I0114 18:51:34.873632 4739 master.cpp:4188] Successfully authenticated principal 'test-principal' at scheduler-c45273e4-6eb5-44ee-bf45-71b353db648f@192.168.122.135:57018 I0114 18:51:34.875006 4740 sched.cpp:392] Successfully authenticated with master master@192.168.122.135:57018 I0114 18:51:34.875319 4740 sched.cpp:515] Sending registration request to master@192.168.122.135:57018 I0114 18:51:34.876200 4740 sched.cpp:548] Will retry registration in 1.952991346secs if necessary I0114 18:51:34.876729 4738 master.cpp:1417] Received registration request for framework 'default' at scheduler-c45273e4-6eb5-44ee-bf45-71b353db648f@192.168.122.135:57018 I0114 18:51:34.877040 4738 master.cpp:1298] Authorizing framework principal 'test-principal' to receive offers for role '*' I0114 18:51:34.878059 4738 master.cpp:1481] Registering framework 20150114-185134-2272962752-57018-4720-0000 (default) at scheduler-c45273e4-6eb5-44ee-bf45-71b353db648f@192.168.122.135:57018 I0114 18:51:34.878473 4739 log.cpp:684] Attempting to append 300 bytes to the log I0114 18:51:34.879464 4737 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 3 I0114 18:51:34.880116 4734 hierarchical_allocator_process.hpp:319] Added framework 20150114-185134-2272962752-57018-4720-0000 I0114 18:51:34.880470 4734 hierarchical_allocator_process.hpp:839] No resources available to allocate! 
I0114 18:51:34.882331 4734 hierarchical_allocator_process.hpp:746] Performed allocation for 0 slaves in 1.901284ms I0114 18:51:34.884024 4741 sched.cpp:442] Framework registered with 20150114-185134-2272962752-57018-4720-0000 I0114 18:51:34.884454 4741 sched.cpp:456] Scheduler::registered took 44320ns I0114 18:51:34.881965 4737 replica.cpp:511] Replica received write request for position 3 I0114 18:51:34.885218 4737 leveldb.cpp:343] Persisting action (319 bytes) to leveldb took 134480ns I0114 18:51:34.885716 4737 replica.cpp:679] Persisted action at 3 I0114 18:51:34.886034 4739 slave.cpp:1075] Will retry registration in 22.947772ms if necessary I0114 18:51:34.886291 4740 master.cpp:3264] Ignoring register slave message from slave(171)@192.168.122.135:57018 (fedora-19) as admission is already in progress I0114 18:51:34.894690 4736 replica.cpp:658] Replica received learned notice for position 3 I0114 18:51:34.898638 4736 leveldb.cpp:343] Persisting action (321 bytes) to leveldb took 215501ns I0114 18:51:34.899055 4736 replica.cpp:679] Persisted action at 3 I0114 18:51:34.899416 4736 replica.cpp:664] Replica learned APPEND action at position 3 I0114 18:51:34.911782 4736 registrar.cpp:490] Successfully updated the 'registry' in 46.176768ms I0114 18:51:34.912286 4740 log.cpp:703] Attempting to truncate the log to 3 I0114 18:51:34.913108 4740 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 4 I0114 18:51:34.915027 4736 master.cpp:3330] Registered slave 20150114-185134-2272962752-57018-4720-S0 at slave(171)@192.168.122.135:57018 (fedora-19) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0114 18:51:34.915642 4735 hierarchical_allocator_process.hpp:453] Added slave 20150114-185134-2272962752-57018-4720-S0 (fedora-19) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] available) I0114 18:51:34.917809 4735 hierarchical_allocator_process.hpp:764] 
Performed allocation for slave 20150114-185134-2272962752-57018-4720-S0 in 514027ns I0114 18:51:34.916689 4738 replica.cpp:511] Replica received write request for position 4 I0114 18:51:34.915784 4741 slave.cpp:781] Registered with master master@192.168.122.135:57018; given slave ID 20150114-185134-2272962752-57018-4720-S0 I0114 18:51:34.919293 4741 slave.cpp:2588] Received ping from slave-observer(156)@192.168.122.135:57018 I0114 18:51:34.919775 4740 status_update_manager.cpp:178] Resuming sending status updates I0114 18:51:34.920374 4736 master.cpp:4072] Sending 1 offers to framework 20150114-185134-2272962752-57018-4720-0000 (default) at scheduler-c45273e4-6eb5-44ee-bf45-71b353db648f@192.168.122.135:57018 I0114 18:51:34.920569 4738 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 1.540136ms I0114 18:51:34.921092 4738 replica.cpp:679] Persisted action at 4 I0114 18:51:34.927111 4735 replica.cpp:658] Replica received learned notice for position 4 I0114 18:51:34.927299 4734 sched.cpp:605] Scheduler::resourceOffers took 1.335524ms I0114 18:51:34.930418 4735 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 1.596377ms I0114 18:51:34.930882 4735 leveldb.cpp:401] Deleting ~2 keys from leveldb took 67578ns I0114 18:51:34.931115 4735 replica.cpp:679] Persisted action at 4 I0114 18:51:34.931529 4735 replica.cpp:664] Replica learned TRUNCATE action at position 4 I0114 18:51:34.930356 4734 master.cpp:2541] Processing reply for offers: [ 20150114-185134-2272962752-57018-4720-O0 ] on slave 20150114-185134-2272962752-57018-4720-S0 at slave(171)@192.168.122.135:57018 (fedora-19) for framework 20150114-185134-2272962752-57018-4720-0000 (default) at scheduler-c45273e4-6eb5-44ee-bf45-71b353db648f@192.168.122.135:57018 I0114 18:51:34.932834 4734 master.cpp:2647] Authorizing framework principal 'test-principal' to launch task 1 as user 'jenkins' W0114 18:51:34.934442 4736 master.cpp:2124] Executor default for task 1 uses less CPUs (None) than the minimum 
required (0.01). Please update your executor, as this will be mandatory in future releases. W0114 18:51:34.934960 4736 master.cpp:2136] Executor default for task 1 uses less memory (None) than the minimum required (32MB). Please update your executor, as this will be mandatory in future releases. I0114 18:51:34.935878 4736 master.hpp:766] Adding task 1 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20150114-185134-2272962752-57018-4720-S0 (fedora-19) I0114 18:51:34.939453 4738 hierarchical_allocator_process.hpp:610] Updated allocation of framework 20150114-185134-2272962752-57018-4720-0000 on slave 20150114-185134-2272962752-57018-4720-S0 from cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] to cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0114 18:51:34.939950 4736 master.cpp:2897] Launching task 1 of framework 20150114-185134-2272962752-57018-4720-0000 (default) at scheduler-c45273e4-6eb5-44ee-bf45-71b353db648f@192.168.122.135:57018 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 201...",3 MESOS-2228,"SlaveTest.MesosExecutorGracefulShutdown is flaky","Observed this on internal CI {noformat} [ RUN ] SlaveTest.MesosExecutorGracefulShutdown Using temporary directory '/tmp/SlaveTest_MesosExecutorGracefulShutdown_AWdtVJ' I0124 08:14:04.399211 7926 leveldb.cpp:176] Opened db in 27.364056ms I0124 08:14:04.402632 7926 leveldb.cpp:183] Compacted db in 3.357646ms I0124 08:14:04.402691 7926 leveldb.cpp:198] Created db iterator in 23822ns I0124 08:14:04.402708 7926 leveldb.cpp:204] Seeked to beginning of db in 1913ns I0124 08:14:04.402716 7926 leveldb.cpp:273] Iterated through 0 keys in the db in 458ns I0124 08:14:04.402767 7926 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0124 08:14:04.403728 7951 recover.cpp:449] Starting replica recovery I0124 08:14:04.404011 7951 recover.cpp:475] Replica is in EMPTY status I0124 
08:14:04.407765 7950 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request I0124 08:14:04.408710 7951 recover.cpp:195] Received a recover response from a replica in EMPTY status I0124 08:14:04.419666 7951 recover.cpp:566] Updating replica status to STARTING I0124 08:14:04.429719 7953 master.cpp:262] Master 20150124-081404-16842879-47787-7926 (utopic) started on 127.0.1.1:47787 I0124 08:14:04.429790 7953 master.cpp:308] Master only allowing authenticated frameworks to register I0124 08:14:04.429802 7953 master.cpp:313] Master only allowing authenticated slaves to register I0124 08:14:04.429826 7953 credentials.hpp:36] Loading credentials for authentication from '/tmp/SlaveTest_MesosExecutorGracefulShutdown_AWdtVJ/credentials' I0124 08:14:04.430277 7953 master.cpp:357] Authorization enabled I0124 08:14:04.432682 7953 master.cpp:1219] The newly elected leader is master@127.0.1.1:47787 with id 20150124-081404-16842879-47787-7926 I0124 08:14:04.432816 7953 master.cpp:1232] Elected as the leading master! 
I0124 08:14:04.432894 7953 master.cpp:1050] Recovering from registrar I0124 08:14:04.433212 7950 registrar.cpp:313] Recovering registrar I0124 08:14:04.434226 7951 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 14.323302ms I0124 08:14:04.434270 7951 replica.cpp:323] Persisted replica status to STARTING I0124 08:14:04.434489 7951 recover.cpp:475] Replica is in STARTING status I0124 08:14:04.436164 7951 replica.cpp:641] Replica in STARTING status received a broadcasted recover request I0124 08:14:04.439368 7947 recover.cpp:195] Received a recover response from a replica in STARTING status I0124 08:14:04.440626 7947 recover.cpp:566] Updating replica status to VOTING I0124 08:14:04.443667 7947 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 2.698664ms I0124 08:14:04.443759 7947 replica.cpp:323] Persisted replica status to VOTING I0124 08:14:04.443925 7947 recover.cpp:580] Successfully joined the Paxos group I0124 08:14:04.444160 7947 recover.cpp:464] Recover process terminated I0124 08:14:04.444543 7949 log.cpp:660] Attempting to start the writer I0124 08:14:04.446331 7949 replica.cpp:477] Replica received implicit promise request with proposal 1 I0124 08:14:04.449329 7949 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 2.690453ms I0124 08:14:04.449388 7949 replica.cpp:345] Persisted promised to 1 I0124 08:14:04.450637 7947 coordinator.cpp:230] Coordinator attemping to fill missing position I0124 08:14:04.452271 7949 replica.cpp:378] Replica received explicit promise request for position 0 with proposal 2 I0124 08:14:04.455124 7949 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 2.593522ms I0124 08:14:04.455157 7949 replica.cpp:679] Persisted action at 0 I0124 08:14:04.456594 7951 replica.cpp:511] Replica received write request for position 0 I0124 08:14:04.456657 7951 leveldb.cpp:438] Reading position from leveldb took 30358ns I0124 08:14:04.464860 7951 leveldb.cpp:343] Persisting action (14 bytes) to leveldb 
took 8.164646ms I0124 08:14:04.464903 7951 replica.cpp:679] Persisted action at 0 I0124 08:14:04.465947 7949 replica.cpp:658] Replica received learned notice for position 0 I0124 08:14:04.471567 7949 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 5.587838ms I0124 08:14:04.471601 7949 replica.cpp:679] Persisted action at 0 I0124 08:14:04.471622 7949 replica.cpp:664] Replica learned NOP action at position 0 I0124 08:14:04.472682 7951 log.cpp:676] Writer started with ending position 0 I0124 08:14:04.473919 7951 leveldb.cpp:438] Reading position from leveldb took 28676ns I0124 08:14:04.491591 7951 registrar.cpp:346] Successfully fetched the registry (0B) in 58.337024ms I0124 08:14:04.491704 7951 registrar.cpp:445] Applied 1 operations in 28163ns; attempting to update the 'registry' I0124 08:14:04.493938 7953 log.cpp:684] Attempting to append 118 bytes to the log I0124 08:14:04.494122 7953 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1 I0124 08:14:04.495069 7953 replica.cpp:511] Replica received write request for position 1 I0124 08:14:04.500089 7953 leveldb.cpp:343] Persisting action (135 bytes) to leveldb took 4.989356ms I0124 08:14:04.500123 7953 replica.cpp:679] Persisted action at 1 I0124 08:14:04.501271 7950 replica.cpp:658] Replica received learned notice for position 1 I0124 08:14:04.505698 7950 leveldb.cpp:343] Persisting action (137 bytes) to leveldb took 4.396221ms I0124 08:14:04.505734 7950 replica.cpp:679] Persisted action at 1 I0124 08:14:04.505755 7950 replica.cpp:664] Replica learned APPEND action at position 1 I0124 08:14:04.507313 7950 registrar.cpp:490] Successfully updated the 'registry' in 15.52896ms I0124 08:14:04.507478 7953 log.cpp:703] Attempting to truncate the log to 1 I0124 08:14:04.507848 7953 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2 I0124 08:14:04.508743 7953 replica.cpp:511] Replica received write request for position 2 I0124 08:14:04.509214 7950 
registrar.cpp:376] Successfully recovered registrar I0124 08:14:04.509682 7946 master.cpp:1077] Recovered 0 slaves from the Registry (82B) ; allowing 10mins for slaves to re-register I0124 08:14:04.514654 7953 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 5.880031ms I0124 08:14:04.514689 7953 replica.cpp:679] Persisted action at 2 I0124 08:14:04.515736 7953 replica.cpp:658] Replica received learned notice for position 2 I0124 08:14:04.522014 7953 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 6.245138ms I0124 08:14:04.522086 7953 leveldb.cpp:401] Deleting ~1 keys from leveldb took 37803ns I0124 08:14:04.522107 7953 replica.cpp:679] Persisted action at 2 I0124 08:14:04.522128 7953 replica.cpp:664] Replica learned TRUNCATE action at position 2 I0124 08:14:04.531460 7926 containerizer.cpp:103] Using isolation: posix/cpu,posix/mem I0124 08:14:04.547194 7951 slave.cpp:173] Slave started on 208)@127.0.1.1:47787 I0124 08:14:04.555682 7951 credentials.hpp:84] Loading credential for authentication from '/tmp/SlaveTest_MesosExecutorGracefulShutdown_kB74xo/credential' I0124 08:14:04.556622 7951 slave.cpp:282] Slave using credential for: test-principal I0124 08:14:04.557052 7951 slave.cpp:300] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0124 08:14:04.557842 7951 slave.cpp:329] Slave hostname: utopic I0124 08:14:04.558091 7951 slave.cpp:330] Slave checkpoint: false W0124 08:14:04.558352 7951 slave.cpp:332] Disabling checkpointing is deprecated and the --checkpoint flag will be removed in a future release. 
Please avoid using this flag I0124 08:14:04.566864 7948 state.cpp:33] Recovering state from '/tmp/SlaveTest_MesosExecutorGracefulShutdown_kB74xo/meta' I0124 08:14:04.575711 7951 status_update_manager.cpp:197] Recovering status update manager I0124 08:14:04.575904 7951 containerizer.cpp:300] Recovering containerizer I0124 08:14:04.577112 7951 slave.cpp:3519] Finished recovery I0124 08:14:04.577374 7926 sched.cpp:151] Version: 0.22.0 I0124 08:14:04.578663 7950 sched.cpp:248] New master detected at master@127.0.1.1:47787 I0124 08:14:04.578759 7950 sched.cpp:304] Authenticating with master master@127.0.1.1:47787 I0124 08:14:04.578781 7950 sched.cpp:311] Using default CRAM-MD5 authenticatee I0124 08:14:04.579071 7950 authenticatee.hpp:138] Creating new client SASL connection I0124 08:14:04.579550 7947 master.cpp:4129] Authenticating scheduler-4a6c5cde-c54a-455a-aaad-6fc4e8ee99ef@127.0.1.1:47787 I0124 08:14:04.579582 7947 master.cpp:4140] Using default CRAM-MD5 authenticator I0124 08:14:04.580031 7947 authenticator.hpp:170] Creating new server SASL connection I0124 08:14:04.580402 7947 authenticatee.hpp:229] Received SASL authentication mechanisms: CRAM-MD5 I0124 08:14:04.580430 7947 authenticatee.hpp:255] Attempting to authenticate with mechanism 'CRAM-MD5' I0124 08:14:04.580538 7947 authenticator.hpp:276] Received SASL authentication start I0124 08:14:04.580581 7947 authenticator.hpp:398] Authentication requires more steps I0124 08:14:04.580651 7947 authenticatee.hpp:275] Received SASL authentication step I0124 08:14:04.580746 7947 authenticator.hpp:304] Received SASL authentication step I0124 08:14:04.580837 7947 authenticator.hpp:390] Authentication success I0124 08:14:04.580940 7947 authenticatee.hpp:315] Authentication success I0124 08:14:04.581009 7947 master.cpp:4187] Successfully authenticated principal 'test-principal' at scheduler-4a6c5cde-c54a-455a-aaad-6fc4e8ee99ef@127.0.1.1:47787 I0124 08:14:04.581328 7947 sched.cpp:392] Successfully authenticated with 
master master@127.0.1.1:47787 I0124 08:14:04.581509 7947 master.cpp:1420] Received registration request for framework 'default' at scheduler-4a6c5cde-c54a-455a-aaad-6fc4e8ee99ef@127.0.1.1:47787 I0124 08:14:04.581585 7947 master.cpp:1298] Authorizing framework principal 'test-principal' to receive offers for role '*' I0124 08:14:04.582033 7947 master.cpp:1484] Registering framework 20150124-081404-16842879-47787-7926-0000 (default) at scheduler-4a6c5cde-c54a-455a-aaad-6fc4e8ee99ef@127.0.1.1:47787 I0124 08:14:04.582595 7947 hierarchical_allocator_process.hpp:319] Added framework 20150124-081404-16842879-47787-7926-0000 I0124 08:14:04.583051 7947 sched.cpp:442] Framework registered with 20150124-081404-16842879-47787-7926-0000 I0124 08:14:04.584087 7951 slave.cpp:613] New master detected at master@127.0.1.1:47787 I0124 08:14:04.584388 7951 slave.cpp:676] Authenticating with master master@127.0.1.1:47787 I0124 08:14:04.584564 7951 slave.cpp:681] Using default CRAM-MD5 authenticatee I0124 08:14:04.584951 7951 slave.cpp:649] Detecting new master I0124 08:14:04.585219 7951 status_update_manager.cpp:171] Pausing sending status updates I0124 08:14:04.585604 7951 authenticatee.hpp:138] Creating new client SASL connection I0124 08:14:04.587666 7953 master.cpp:4129] Authenticating slave(208)@127.0.1.1:47787 I0124 08:14:04.587702 7953 master.cpp:4140] Using default CRAM-MD5 authenticator I0124 08:14:04.588434 7953 authenticator.hpp:170] Creating new server SASL connection I0124 08:14:04.588764 7953 authenticatee.hpp:229] Received SASL authentication mechanisms: CRAM-MD5 I0124 08:14:04.588790 7953 authenticatee.hpp:255] Attempting to authenticate with mechanism 'CRAM-MD5' I0124 08:14:04.588896 7953 authenticator.hpp:276] Received SASL authentication start I0124 08:14:04.588935 7953 authenticator.hpp:398] Authentication requires more steps I0124 08:14:04.589005 7953 authenticatee.hpp:275] Received SASL authentication step I0124 08:14:04.589082 7953 authenticator.hpp:304] Received 
SASL authentication step I0124 08:14:04.589140 7953 authenticator.hpp:390] Authentication success I0124 08:14:04.589232 7953 authenticatee.hpp:315] Authentication success I0124 08:14:04.589300 7953 master.cpp:4187] Successfully authenticated principal 'test-principal' at slave(208)@127.0.1.1:47787 I0124 08:14:04.589587 7953 slave.cpp:747] Successfully authenticated with master master@127.0.1.1:47787 I0124 08:14:04.589913 7953 master.cpp:3275] Registering slave at slave(208)@127.0.1.1:47787 (utopic) with id 20150124-081404-16842879-47787-7926-S0 I0124 08:14:04.590322 7953 registrar.cpp:445] Applied 1 operations in 60404ns; attempting to update the 'registry' I0124 08:14:04.595336 7948 log.cpp:684] Attempting to append 283 bytes to the log I0124 08:14:04.595552 7948 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 3 I0124 08:14:04.596535 7948 replica.cpp:511] Replica received write request for position 3 I0124 08:14:04.597846 7951 master.cpp:3263] Ignoring register slave message from slave(208)@127.0.1.1:47787 (utopic) as admission is already in progress I0124 08:14:04.602326 7948 leveldb.cpp:343] Persisting action (302 bytes) to leveldb took 5.758211ms I0124 08:14:04.602363 7948 replica.cpp:679] Persisted action at 3 I0124 08:14:04.603492 7951 replica.cpp:658] Replica received learned notice for position 3 I0124 08:14:04.608952 7951 leveldb.cpp:343] Persisting action (304 bytes) to leveldb took 5.427195ms I0124 08:14:04.608985 7951 replica.cpp:679] Persisted action at 3 I0124 08:14:04.609007 7951 replica.cpp:664] Replica learned APPEND action at position 3 I0124 08:14:04.610643 7951 registrar.cpp:490] Successfully updated the 'registry' in 20.258048ms I0124 08:14:04.610800 7948 log.cpp:703] Attempting to truncate the log to 3 I0124 08:14:04.611184 7948 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 4 I0124 08:14:04.612076 7948 replica.cpp:511] Replica received write request for position 4 I0124 
08:14:04.613061 7946 master.cpp:3329] Registered slave 20150124-081404-16842879-47787-7926-S0 at slave(208)@127.0.1.1:47787 (utopic) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0124 08:14:04.613299 7946 hierarchical_allocator_process.hpp:453] Added slave 20150124-081404-16842879-47787-7926-S0 (utopic) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] available) I0124 08:14:04.613688 7946 slave.cpp:781] Registered with master master@127.0.1.1:47787; given slave ID 20150124-081404-16842879-47787-7926-S0 I0124 08:14:04.614112 7946 master.cpp:4071] Sending 1 offers to framework 20150124-081404-16842879-47787-7926-0000 (default) at scheduler-4a6c5cde-c54a-455a-aaad-6fc4e8ee99ef@127.0.1.1:47787 I0124 08:14:04.614228 7946 status_update_manager.cpp:178] Resuming sending status updates I0124 08:14:04.617481 7947 master.cpp:2677] Processing ACCEPT call for offers: [ 20150124-081404-16842879-47787-7926-O0 ] on slave 20150124-081404-16842879-47787-7926-S0 at slave(208)@127.0.1.1:47787 (utopic) for framework 20150124-081404-16842879-47787-7926-0000 (default) at scheduler-4a6c5cde-c54a-455a-aaad-6fc4e8ee99ef@127.0.1.1:47787 I0124 08:14:04.617535 7947 master.cpp:2513] Authorizing framework principal 'test-principal' to launch task 7c16772d-4aed-4719-81c4-658a2cc22543 as user 'jenkins' I0124 08:14:04.618736 7947 master.hpp:782] Adding task 7c16772d-4aed-4719-81c4-658a2cc22543 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20150124-081404-16842879-47787-7926-S0 (utopic) I0124 08:14:04.618854 7947 master.cpp:2885] Launching task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000 (default) at scheduler-4a6c5cde-c54a-455a-aaad-6fc4e8ee99ef@127.0.1.1:47787 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20150124-081404-16842879-47787-7926-S0 at 
slave(208)@127.0.1.1:47787 (utopic) I0124 08:14:04.619209 7947 slave.cpp:1130] Got assigned task 7c16772d-4aed-4719-81c4-658a2cc22543 for framework 20150124-081404-16842879-47787-7926-0000 I0124 08:14:04.619472 7948 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 7.364828ms I0124 08:14:04.619941 7948 replica.cpp:679] Persisted action at 4 I0124 08:14:04.624851 7953 replica.cpp:658] Replica received learned notice for position 4 I0124 08:14:04.625757 7947 slave.cpp:1245] Launching task 7c16772d-4aed-4719-81c4-658a2cc22543 for framework 20150124-081404-16842879-47787-7926-0000 I0124 08:14:04.630590 7953 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 5.705336ms I0124 08:14:04.630805 7953 leveldb.cpp:401] Deleting ~2 keys from leveldb took 51263ns I0124 08:14:04.630828 7953 replica.cpp:679] Persisted action at 4 I0124 08:14:04.630851 7953 replica.cpp:664] Replica learned TRUNCATE action at position 4 I0124 08:14:04.633968 7947 slave.cpp:3921] Launching executor 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000 in work directory '/tmp/SlaveTest_MesosExecutorGracefulShutdown_kB74xo/slaves/20150124-081404-16842879-47787-7926-S0/frameworks/20150124-081404-16842879-47787-7926-0000/executors/7c16772d-4aed-4719-81c4-658a2cc22543/runs/53887a08-f11d-4a2f-a659-a715d9fcf3d2' I0124 08:14:04.634963 7951 containerizer.cpp:445] Starting container '53887a08-f11d-4a2f-a659-a715d9fcf3d2' for executor '7c16772d-4aed-4719-81c4-658a2cc22543' of framework '20150124-081404-16842879-47787-7926-0000' W0124 08:14:04.636931 7951 containerizer.cpp:296] CommandInfo.grace_period flag is not set, using default value: 3secs I0124 08:14:04.655591 7947 slave.cpp:1368] Queuing task '7c16772d-4aed-4719-81c4-658a2cc22543' for executor 7c16772d-4aed-4719-81c4-658a2cc22543 of framework '20150124-081404-16842879-47787-7926-0000 I0124 08:14:04.656992 7951 launcher.cpp:137] Forked child with pid '11030' for container 
'53887a08-f11d-4a2f-a659-a715d9fcf3d2' I0124 08:14:04.673646 7951 slave.cpp:2890] Monitoring executor '7c16772d-4aed-4719-81c4-658a2cc22543' of framework '20150124-081404-16842879-47787-7926-0000' in container '53887a08-f11d-4a2f-a659-a715d9fcf3d2' I0124 08:14:04.964946 11044 exec.cpp:147] Version: 0.22.0 I0124 08:14:05.113059 7948 slave.cpp:1912] Got registration for executor '7c16772d-4aed-4719-81c4-658a2cc22543' of framework 20150124-081404-16842879-47787-7926-0000 from executor(1)@127.0.1.1:49174 I0124 08:14:05.121086 7948 slave.cpp:2031] Flushing queued task 7c16772d-4aed-4719-81c4-658a2cc22543 for executor '7c16772d-4aed-4719-81c4-658a2cc22543' of framework 20150124-081404-16842879-47787-7926-0000 I0124 08:14:05.266849 11062 exec.cpp:221] Executor registered on slave 20150124-081404-16842879-47787-7926-S0 Shutdown timeout is set to 3secsRegistered executor on utopic Starting task 7c16772d-4aed-4719-81c4-658a2cc22543 Forked command at 11067 sh -c 'sleep 1000' I0124 08:14:05.492084 7953 slave.cpp:2265] Handling status update TASK_RUNNING (UUID: 54742a87-ef02-4e72-a19b-83b0eeb62568) for task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000 from executor(1)@127.0.1.1:49174 I0124 08:14:05.492805 7953 status_update_manager.cpp:317] Received status update TASK_RUNNING (UUID: 54742a87-ef02-4e72-a19b-83b0eeb62568) for task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000 I0124 08:14:05.493762 7953 slave.cpp:2508] Forwarding the update TASK_RUNNING (UUID: 54742a87-ef02-4e72-a19b-83b0eeb62568) for task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000 to master@127.0.1.1:47787 I0124 08:14:05.493948 7953 slave.cpp:2441] Sending acknowledgement for status update TASK_RUNNING (UUID: 54742a87-ef02-4e72-a19b-83b0eeb62568) for task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000 to executor(1)@127.0.1.1:49174 
I0124 08:14:05.495378 7949 master.cpp:3652] Forwarding status update TASK_RUNNING (UUID: 54742a87-ef02-4e72-a19b-83b0eeb62568) for task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000 I0124 08:14:05.495584 7949 master.cpp:3624]...",3 MESOS-2230,"Update RateLimiter to allow the acquired future to be discarded","Currently there is no way for the future returned by RateLimiter's acquire() to be discarded by the user of the limiter. This is useful in cases where the user is no longer interested in the permit. See MESOS-1148 for an example use case.",3 MESOS-2232,"Suppress MockAllocator::transformAllocation() warnings.","After transforming allocated resources feature was added to allocator, a number of warnings are popping out for allocator tests. Commits leading to this behaviour: {{dacc88292cc13d4b08fe8cda4df71110a96cb12a}} {{5a02d5bdc75d3b1149dcda519016374be06ec6bd}} corresponding reviews: https://reviews.apache.org/r/29083 https://reviews.apache.org/r/29084 Here is an example: {code} [ RUN ] MasterAllocatorTest/0.FrameworkReregistersFirst GMOCK WARNING: Uninteresting mock function call - taking default action specified at: ../../../src/tests/mesos.hpp:719: Function call: transformAllocation(@0x7fd3bb5274d8 20150115-185632-1677764800-59671-44186-0000, @0x7fd3bb5274f8 20150115-185632-1677764800-59671-44186-S0, @0x1119140e0 16-byte object ) Stack trace: [ OK ] MasterAllocatorTest/0.FrameworkReregistersFirst (204 ms) {code}",3 MESOS-2233,"Run ASF CI mesos builds inside docker","There are several limitations to mesos projects current state of CI, which is run on builds.a.o --> Only runs on Ubuntu --> Doesn't run any tests that deal with cgroups --> Doesn't run any tests that need root permissions Now that ASF CI supports docker (https://issues.apache.org/jira/browse/BUILDS-25), it would be great for the Mesos project to use it.",5 MESOS-2241,"DiskUsageCollectorTest.SymbolicLink test is flaky","Observed this on a local machine 
running linux w/ sudo. {code} [ RUN ] DiskUsageCollectorTest.SymbolicLink ../../src/tests/disk_quota_tests.cpp:138: Failure Expected: (usage1.get()) < (Kilobytes(16)), actual: 24KB vs 8-byte object <00-40 00-00 00-00 00-00> [ FAILED ] DiskUsageCollectorTest.SymbolicLink (201 ms) {code}",1 MESOS-2257,"Version the Operator/Admin API","As a consumer of the Mesos HTTP API, it is necessary for us to determine the current version of Mesos so that we can parse the JSON documents returned correctly (since they change from version to version). Currently we're doing this by fetching state.json, parsing it and pulling out the version field. A more idiomatic way to do this would be to filter on the content-type in the header itself. To give a more concrete example, currently the JSON documents returned by the HTTP API return the following headers: {code} HTTP/1.1 200 OK Date: Fri, 23 Jan 2015 21:31:37 GMT Content-Length: 9352 Content-Type: application/json {code} Something like the following (e.g. for master/state.json) would be easy to switch upon: {code} HTTP/1.1 200 OK Date: Fri, 23 Jan 2015 21:31:37 GMT Content-Length: 9352 Content-Type: application/vnd.mesos.master.state.v0.20.1+json; charset=utf-8 {code} The vnd prefix is typically used for vendor specific file types (see: http://en.wikipedia.org/wiki/Internet_media_type#Prefix_vnd). Charset=utf-8 is required for JSON documents and is currently being omitted. 
This content-type would change for each document type, for example: {code} application/vnd.mesos.master.state.v0.20.1+json; charset=utf-8 application/vnd.mesos.master.stats.v0.20.1+json; charset=utf-8 application/vnd.mesos.slave.state.v0.20.1+json; charset=utf-8 application/vnd.mesos.slave.stats.v0.20.1+json; charset=utf-8 {code} Alternatively, the version could be appended as an extra field: {code} application/vnd.mesos.master.state+json; charset=utf-8; version=v0.20.1 application/vnd.mesos.master.stats+json; charset=utf-8; version=v0.20.1 application/vnd.mesos.slave.state+json; charset=utf-8; version=v0.20.1 application/vnd.mesos.slave.stats+json; charset=utf-8; version=v0.20.1 {code} Thanks!",13 MESOS-2273,"Add ""tests"" target to Makefile for building-but-not-running tests.","'make check' allows one to build and run the test suite. However, often we just want to build the tests. Currently, this is done by setting GTEST_FILTER to an empty string. It would be nice to have a dedicated target such as 'make tests' that allows one to build the test suite without running it.",1 MESOS-2275,"Document header include rules in style guide","We have several ways of sorting, grouping and ordering header includes in Mesos. We should agree on a rule set and do a style scan.",3 MESOS-2279,"Future callbacks should be cleared once the future has transitioned.","For example, when a future has transitioned into the READY state, all onDiscard callbacks should be cleared to avoid a potential cyclic dependency and memory leak. 
For instance: {noformat} Promise promise; Future f = promise.future(); f.onDiscard(lambda::bind(&SomeFunc, f)); promise.set(Nothing()); {noformat} The above code has a cyclic dependency because f.data has a reference to the future inside an std::function which has a reference to f.data.",2 MESOS-2281,"Deprecate plain text Credential format.","Currently two formats of credentials are supported: JSON {code} ""credentials"": [ { ""principal"": ""sherman"", ""secret"": ""kitesurf"" } {code} And a new line file: {code} principal1 secret1 pricipal2 secret2 {code} We should deprecate the new line format and remove support for the old format.",3 MESOS-2283,"SlaveRecoveryTest.ReconcileKillTask is flaky.","Saw this on an internal CI: {noformat} [ RUN ] SlaveRecoveryTest/0.ReconcileKillTask Using temporary directory '/tmp/SlaveRecoveryTest_0_ReconcileKillTask_D5wSwg' I0126 19:10:52.005317 13291 leveldb.cpp:176] Opened db in 978670ns I0126 19:10:52.006155 13291 leveldb.cpp:183] Compacted db in 541346ns I0126 19:10:52.006494 13291 leveldb.cpp:198] Created db iterator in 24562ns I0126 19:10:52.006798 13291 leveldb.cpp:204] Seeked to beginning of db in 3254ns I0126 19:10:52.007036 13291 leveldb.cpp:273] Iterated through 0 keys in the db in 949ns I0126 19:10:52.007369 13291 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0126 19:10:52.008362 13308 recover.cpp:449] Starting replica recovery I0126 19:10:52.009141 13308 recover.cpp:475] Replica is in EMPTY status I0126 19:10:52.016494 13308 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request I0126 19:10:52.017333 13309 recover.cpp:195] Received a recover response from a replica in EMPTY status I0126 19:10:52.018244 13309 recover.cpp:566] Updating replica status to STARTING I0126 19:10:52.019064 13305 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 113577ns I0126 19:10:52.019487 13305 replica.cpp:323] Persisted replica status to STARTING I0126 
19:10:52.019937 13309 recover.cpp:475] Replica is in STARTING status I0126 19:10:52.021492 13307 replica.cpp:641] Replica in STARTING status received a broadcasted recover request I0126 19:10:52.022665 13309 recover.cpp:195] Received a recover response from a replica in STARTING status I0126 19:10:52.027971 13312 recover.cpp:566] Updating replica status to VOTING I0126 19:10:52.028590 13312 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 78452ns I0126 19:10:52.028869 13312 replica.cpp:323] Persisted replica status to VOTING I0126 19:10:52.029252 13312 recover.cpp:580] Successfully joined the Paxos group I0126 19:10:52.030828 13307 recover.cpp:464] Recover process terminated I0126 19:10:52.049947 13306 master.cpp:262] Master 20150126-191052-2272962752-35545-13291 (fedora-19) started on 192.168.122.135:35545 I0126 19:10:52.050499 13306 master.cpp:308] Master only allowing authenticated frameworks to register I0126 19:10:52.050765 13306 master.cpp:313] Master only allowing authenticated slaves to register I0126 19:10:52.051048 13306 credentials.hpp:36] Loading credentials for authentication from '/tmp/SlaveRecoveryTest_0_ReconcileKillTask_D5wSwg/credentials' I0126 19:10:52.051589 13306 master.cpp:357] Authorization enabled I0126 19:10:52.052531 13305 hierarchical_allocator_process.hpp:285] Initialized hierarchical allocator process I0126 19:10:52.052881 13311 whitelist_watcher.cpp:65] No whitelist given I0126 19:10:52.055524 13306 master.cpp:1219] The newly elected leader is master@192.168.122.135:35545 with id 20150126-191052-2272962752-35545-13291 I0126 19:10:52.056226 13306 master.cpp:1232] Elected as the leading master! 
I0126 19:10:52.056639 13306 master.cpp:1050] Recovering from registrar I0126 19:10:52.057045 13307 registrar.cpp:313] Recovering registrar I0126 19:10:52.058554 13312 log.cpp:660] Attempting to start the writer I0126 19:10:52.060868 13309 replica.cpp:477] Replica received implicit promise request with proposal 1 I0126 19:10:52.061691 13309 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 91680ns I0126 19:10:52.062261 13309 replica.cpp:345] Persisted promised to 1 I0126 19:10:52.064559 13310 coordinator.cpp:230] Coordinator attemping to fill missing position I0126 19:10:52.069105 13311 replica.cpp:378] Replica received explicit promise request for position 0 with proposal 2 I0126 19:10:52.069860 13311 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 94858ns I0126 19:10:52.070350 13311 replica.cpp:679] Persisted action at 0 I0126 19:10:52.080348 13305 replica.cpp:511] Replica received write request for position 0 I0126 19:10:52.081153 13305 leveldb.cpp:438] Reading position from leveldb took 62247ns I0126 19:10:52.081676 13305 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 81487ns I0126 19:10:52.082053 13305 replica.cpp:679] Persisted action at 0 I0126 19:10:52.083566 13309 replica.cpp:658] Replica received learned notice for position 0 I0126 19:10:52.085734 13309 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 283144ns I0126 19:10:52.086067 13309 replica.cpp:679] Persisted action at 0 I0126 19:10:52.086448 13309 replica.cpp:664] Replica learned NOP action at position 0 I0126 19:10:52.089784 13306 log.cpp:676] Writer started with ending position 0 I0126 19:10:52.093415 13309 leveldb.cpp:438] Reading position from leveldb took 66744ns I0126 19:10:52.104814 13306 registrar.cpp:346] Successfully fetched the registry (0B) in 47.451136ms I0126 19:10:52.105731 13306 registrar.cpp:445] Applied 1 operations in 42124ns; attempting to update the 'registry' I0126 19:10:52.111935 13305 log.cpp:684] Attempting to append 131 
bytes to the log I0126 19:10:52.112754 13305 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1 I0126 19:10:52.114297 13308 replica.cpp:511] Replica received write request for position 1 I0126 19:10:52.114908 13308 leveldb.cpp:343] Persisting action (150 bytes) to leveldb took 98332ns I0126 19:10:52.115387 13308 replica.cpp:679] Persisted action at 1 I0126 19:10:52.117277 13305 replica.cpp:658] Replica received learned notice for position 1 I0126 19:10:52.118142 13305 leveldb.cpp:343] Persisting action (152 bytes) to leveldb took 227799ns I0126 19:10:52.118621 13305 replica.cpp:679] Persisted action at 1 I0126 19:10:52.118979 13305 replica.cpp:664] Replica learned APPEND action at position 1 I0126 19:10:52.121311 13305 registrar.cpp:490] Successfully updated the 'registry' in 15.161088ms I0126 19:10:52.121548 13311 log.cpp:703] Attempting to truncate the log to 1 I0126 19:10:52.122697 13311 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2 I0126 19:10:52.124316 13307 replica.cpp:511] Replica received write request for position 2 I0126 19:10:52.124913 13307 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 87281ns I0126 19:10:52.125334 13307 replica.cpp:679] Persisted action at 2 I0126 19:10:52.127018 13311 replica.cpp:658] Replica received learned notice for position 2 I0126 19:10:52.127835 13311 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 201050ns I0126 19:10:52.128232 13311 leveldb.cpp:401] Deleting ~1 keys from leveldb took 78012ns I0126 19:10:52.128835 13305 registrar.cpp:376] Successfully recovered registrar I0126 19:10:52.128551 13311 replica.cpp:679] Persisted action at 2 I0126 19:10:52.130105 13311 replica.cpp:664] Replica learned TRUNCATE action at position 2 I0126 19:10:52.131479 13312 master.cpp:1077] Recovered 0 slaves from the Registry (95B) ; allowing 10mins for slaves to re-register I0126 19:10:52.143465 13291 containerizer.cpp:103] Using isolation: 
posix/cpu,posix/mem I0126 19:10:52.170471 13309 slave.cpp:173] Slave started on 101)@192.168.122.135:35545 I0126 19:10:52.171723 13309 credentials.hpp:84] Loading credential for authentication from '/tmp/SlaveRecoveryTest_0_ReconcileKillTask_qbguuM/credential' I0126 19:10:52.172286 13309 slave.cpp:282] Slave using credential for: test-principal I0126 19:10:52.172821 13309 slave.cpp:300] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0126 19:10:52.173982 13309 slave.cpp:329] Slave hostname: fedora-19 I0126 19:10:52.174505 13309 slave.cpp:330] Slave checkpoint: true I0126 19:10:52.179308 13309 state.cpp:33] Recovering state from '/tmp/SlaveRecoveryTest_0_ReconcileKillTask_qbguuM/meta' I0126 19:10:52.180075 13308 status_update_manager.cpp:197] Recovering status update manager I0126 19:10:52.180611 13308 containerizer.cpp:300] Recovering containerizer I0126 19:10:52.182473 13309 slave.cpp:3519] Finished recovery I0126 19:10:52.184403 13312 slave.cpp:613] New master detected at master@192.168.122.135:35545 I0126 19:10:52.184916 13312 slave.cpp:676] Authenticating with master master@192.168.122.135:35545 I0126 19:10:52.185230 13312 slave.cpp:681] Using default CRAM-MD5 authenticatee I0126 19:10:52.185715 13312 slave.cpp:649] Detecting new master I0126 19:10:52.186420 13312 authenticatee.hpp:138] Creating new client SASL connection I0126 19:10:52.186002 13311 status_update_manager.cpp:171] Pausing sending status updates I0126 19:10:52.188293 13312 master.cpp:4129] Authenticating slave(101)@192.168.122.135:35545 I0126 19:10:52.188748 13312 master.cpp:4140] Using default CRAM-MD5 authenticator I0126 19:10:52.189525 13312 authenticator.hpp:170] Creating new server SASL connection I0126 19:10:52.191082 13305 authenticatee.hpp:229] Received SASL authentication mechanisms: CRAM-MD5 I0126 19:10:52.191550 13305 authenticatee.hpp:255] Attempting to authenticate with mechanism 'CRAM-MD5' I0126 19:10:52.191990 13312 authenticator.hpp:276] Received 
SASL authentication start I0126 19:10:52.192365 13312 authenticator.hpp:398] Authentication requires more steps I0126 19:10:52.192800 13311 authenticatee.hpp:275] Received SASL authentication step I0126 19:10:52.193244 13312 authenticator.hpp:304] Received SASL authentication step I0126 19:10:52.193565 13312 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0126 19:10:52.193902 13312 auxprop.cpp:171] Looking up auxiliary property '*userPassword' I0126 19:10:52.194301 13312 auxprop.cpp:171] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0126 19:10:52.195669 13312 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0126 19:10:52.196048 13312 auxprop.cpp:121] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0126 19:10:52.196395 13312 auxprop.cpp:121] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0126 19:10:52.196723 13312 authenticator.hpp:390] Authentication success I0126 19:10:52.197206 13305 authenticatee.hpp:315] Authentication success I0126 19:10:52.204121 13305 slave.cpp:747] Successfully authenticated with master master@192.168.122.135:35545 I0126 19:10:52.204676 13310 master.cpp:4187] Successfully authenticated principal 'test-principal' at slave(101)@192.168.122.135:35545 I0126 19:10:52.205729 13305 slave.cpp:1075] Will retry registration in 5.608661ms if necessary I0126 19:10:52.206451 13310 master.cpp:3275] Registering slave at slave(101)@192.168.122.135:35545 (fedora-19) with id 20150126-191052-2272962752-35545-13291-S0 I0126 19:10:52.210019 13310 registrar.cpp:445] Applied 1 operations in 235087ns; attempting to update the 'registry' I0126 19:10:52.220736 
13308 slave.cpp:1075] Will retry registration in 9.28397ms if necessary I0126 19:10:52.221309 13311 master.cpp:3263] Ignoring register slave message from slave(101)@192.168.122.135:35545 (fedora-19) as admission is already in progress I0126 19:10:52.224818 13307 log.cpp:684] Attempting to append 302 bytes to the log I0126 19:10:52.225554 13307 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 3 I0126 19:10:52.227422 13305 replica.cpp:511] Replica received write request for position 3 I0126 19:10:52.227969 13305 leveldb.cpp:343] Persisting action (321 bytes) to leveldb took 100350ns I0126 19:10:52.228276 13305 replica.cpp:679] Persisted action at 3 I0126 19:10:52.232475 13312 replica.cpp:658] Replica received learned notice for position 3 I0126 19:10:52.233280 13312 leveldb.cpp:343] Persisting action (323 bytes) to leveldb took 546567ns I0126 19:10:52.233726 13312 replica.cpp:679] Persisted action at 3 I0126 19:10:52.234035 13312 replica.cpp:664] Replica learned APPEND action at position 3 I0126 19:10:52.236556 13310 registrar.cpp:490] Successfully updated the 'registry' in 26.040064ms I0126 19:10:52.237330 13305 log.cpp:703] Attempting to truncate the log to 3 I0126 19:10:52.238056 13311 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 4 I0126 19:10:52.239594 13311 replica.cpp:511] Replica received write request for position 4 I0126 19:10:52.240129 13311 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 92868ns I0126 19:10:52.240458 13311 replica.cpp:679] Persisted action at 4 I0126 19:10:52.241976 13308 replica.cpp:658] Replica received learned notice for position 4 I0126 19:10:52.242645 13308 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 95635ns I0126 19:10:52.242990 13308 leveldb.cpp:401] Deleting ~2 keys from leveldb took 58066ns I0126 19:10:52.243337 13308 replica.cpp:679] Persisted action at 4 I0126 19:10:52.243695 13308 replica.cpp:664] Replica learned TRUNCATE action at 
position 4 I0126 19:10:52.245657 13291 sched.cpp:151] Version: 0.22.0 I0126 19:10:52.247625 13305 master.cpp:3329] Registered slave 20150126-191052-2272962752-35545-13291-S0 at slave(101)@192.168.122.135:35545 (fedora-19) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0126 19:10:52.248942 13307 slave.cpp:781] Registered with master master@192.168.122.135:35545; given slave ID 20150126-191052-2272962752-35545-13291-S0 I0126 19:10:52.250396 13307 slave.cpp:797] Checkpointing SlaveInfo to '/tmp/SlaveRecoveryTest_0_ReconcileKillTask_qbguuM/meta/slaves/20150126-191052-2272962752-35545-13291-S0/slave.info' I0126 19:10:52.250731 13309 status_update_manager.cpp:178] Resuming sending status updates I0126 19:10:52.251765 13307 slave.cpp:2588] Received ping from slave-observer(99)@192.168.122.135:35545 I0126 19:10:52.247951 13310 hierarchical_allocator_process.hpp:453] Added slave 20150126-191052-2272962752-35545-13291-S0 (fedora-19) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] available) I0126 19:10:52.252810 13310 hierarchical_allocator_process.hpp:831] No resources available to allocate! 
I0126 19:10:52.254365 13310 hierarchical_allocator_process.hpp:756] Performed allocation for slave 20150126-191052-2272962752-35545-13291-S0 in 1.732701ms I0126 19:10:52.254137 13307 sched.cpp:248] New master detected at master@192.168.122.135:35545 I0126 19:10:52.257863 13307 sched.cpp:304] Authenticating with master master@192.168.122.135:35545 I0126 19:10:52.258249 13307 sched.cpp:311] Using default CRAM-MD5 authenticatee I0126 19:10:52.258908 13306 authenticatee.hpp:138] Creating new client SASL connection I0126 19:10:52.261397 13309 master.cpp:4129] Authenticating scheduler-6da85b48-b57f-4202-b630-c45f8f652321@192.168.122.135:35545 I0126 19:10:52.261776 13309 master.cpp:4140] Using default CRAM-MD5 authenticator I0126 19:10:52.264528 13309 authenticator.hpp:170] Creating new server SASL connection I0126 19:10:52.266248 13312 authenticatee.hpp:229] Received SASL authentication mechanisms: CRAM-MD5 I0126 19:10:52.266749 13312 authenticatee.hpp:255] Attempting to authenticate with mechanism 'CRAM-MD5' I0126 19:10:52.267143 13312 authenticator.hpp:276] Received SASL authentication start I0126 19:10:52.267525 13312 authenticator.hpp:398] Authentication requires more steps I0126 19:10:52.267917 13312 authenticatee.hpp:275] Received SASL authentication step I0126 19:10:52.268404 13312 authenticator.hpp:304] Received SASL authentication step I0126 19:10:52.268725 13312 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0126 19:10:52.269078 13312 auxprop.cpp:171] Looking up auxiliary property '*userPassword' I0126 19:10:52.269498 13312 auxprop.cpp:171] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0126 19:10:52.269881 13312 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: 
false SASL_AUXPROP_AUTHZID: true I0126 19:10:52.270385 13312 auxprop.cpp:121] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0126 19:10:52.271015 13312 auxprop.cpp:121] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0126 19:10:52.271599 13312 authenticator.hpp:390] Authentication success I0126 19:10:52.272126 13312 authenticatee.hpp:315] Authentication success I0126 19:10:52.272415 13305 master.cpp:4187] Successfully authenticated principal 'test-principal' at scheduler-6da85b48-b57f-4202-b630-c45f8f652321@192.168.122.135:35545 I0126 19:10:52.273998 13307 sched.cpp:392] Successfully authenticated with master master@192.168.122.135:35545 I0126 19:10:52.274415 13307 sched.cpp:515] Sending registration request to master@192.168.122.135:35545 I0126 19:10:52.274842 13307 sched.cpp:548] Will retry registration in 674.656506ms if necessary I0126 19:10:52.275235 13305 master.cpp:1420] Received registration request for framework 'default' at scheduler-6da85b48-b57f-4202-b630-c45f8f652321@192.168.122.135:35545 I0126 19:10:52.276017 13305 master.cpp:1298] Authorizing framework principal 'test-principal' to receive offers for role '*' I0126 19:10:52.277027 13305 master.cpp:1484] Registering framework 20150126-191052-2272962752-35545-13291-0000 (default) at scheduler-6da85b48-b57f-4202-b630-c45f8f652321@192.168.122.135:35545 I0126 19:10:52.278285 13308 hierarchical_allocator_process.hpp:319] Added framework 20150126-191052-2272962752-35545-13291-0000 I0126 19:10:52.279575 13308 hierarchical_allocator_process.hpp:738] Performed allocation for 1 slaves in 697902ns I0126 19:10:52.287966 13305 master.cpp:4071] Sending 1 offers to framework 20150126-191052-2272962752-35545-13291-0000 (default) at scheduler-6da85b48-b57f-4202-b630-c45f8f652321@192.168.122.135:35545 I0126 19:10:52.288776 13307 sched.cpp:442] Framework registered with 20150126-191052-2272962752-35545-13291-0000 I0126 19:10:52.289373 13307 
sched.cpp:456] Scheduler::registered took 21674ns I0126 19:10:52.289932 13307 sched.cpp:605] Scheduler::resourceOffers took 76147ns I0126 19:10:52.293220 13311 master.cpp:2677] Processing ACCEPT call for offers: [ 20150126-191052-2272962752-35545-13291-O0 ] on slave 20150126-191052-2272962752-35545-13291-S0 at slave(101)@192.168.122.135:35545 (fedora-19) for framework 20150126-191052-2272962752-35545-13291-0000 (default) at scheduler-6da85b48-b57f-4202-b630-c45f8f652321@192.168.122.135:35545 I0126 19:10:52.293586 13311 master.cpp:2513] Authorizing framework principal 'test-principal' to launch task 61eaeec3-e8ca-4e15-82d6-284c05c3bb6e as user 'jenkins' I0126 19:10:52.295825 13311 master.hpp:782] Adding task 61eaeec3-e8ca-4e15-82d6-284c05c3bb6e with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20150126-191052-2272962752-35545-13291-S0 (fedora-19) I0126 19:10:52.296272 13311 master.cpp:2885] Launching task 61eaeec3-e8ca-4e15-82d6-284c05c3bb6e of framework 20150126-191052-2272962752-35545-13291-0000 (default) at scheduler-6da85b48-b57f-4202-b630-c45f8f652321@192.168.122.135:35545 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20150126-191052-2272962752-35545-13291-S0 at slave(101)@192.168.122.135:35545 (fedora-19) I0126 19:10:52.296886 13309 slave.cpp:1130] Got assigned task 61eaeec3-e8ca-4e15-82d6-284c05c3bb6e for framework 20150126-191052-2272962752-35545-13291-0000 I0126 19:10:52.297324 13309 slave.cpp:3846] Checkpointing FrameworkInfo to '/tmp/SlaveRecoveryTest_0_ReconcileKillTask_qbguuM/meta/slaves/20150126-191052-2272962752-35545-13291-S0/frameworks/20150126-191052-22...",1 MESOS-2289,"Design doc for the HTTP API","This tracks the design of the HTTP API.",13 MESOS-2290,"Move all scheduler driver validations to master","With HTTP API, the scheduler driver will no longer exist and hence all the validations should move to the master.",3 MESOS-2293,"Implement the scheduler endpoint on 
master",NULL,8 MESOS-2294,"Implement the Events stream on master for Call endpoint",NULL,8 MESOS-2295,"Implement the Call endpoint on Slave",NULL,8 MESOS-2296,"Implement the Events stream on slave for Call endpoint",NULL,8 MESOS-2297,"Add authentication support for HTTP API","Since most of the communication between mesos components will happen through HTTP with the arrival of the [HTTP API|https://issues.apache.org/jira/browse/MESOS-2288], it makes sense to use HTTP standard mechanisms to authenticate this communication.",1 MESOS-2298,"Provide a Java library for master detection","When schedulers start interacting with Mesos master via HTTP endpoints, they need a way to detect masters. Mesos should provide a master detection Java library to make this easy for frameworks.",5 MESOS-2302,"FaultToleranceTest.SchedulerFailoverFrameworkMessage is flaky.","Bad Run: {noformat} [ RUN ] FaultToleranceTest.SchedulerFailoverFrameworkMessage Using temporary directory '/tmp/FaultToleranceTest_SchedulerFailoverFrameworkMessage_f3jYkr' I0123 18:50:11.669674 15688 leveldb.cpp:176] Opened db in 31.920683ms I0123 18:50:11.678328 15688 leveldb.cpp:183] Compacted db in 8.580569ms I0123 18:50:11.678455 15688 leveldb.cpp:198] Created db iterator in 38478ns I0123 18:50:11.678478 15688 leveldb.cpp:204] Seeked to beginning of db in 3057ns I0123 18:50:11.678489 15688 leveldb.cpp:273] Iterated through 0 keys in the db in 427ns I0123 18:50:11.678539 15688 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0123 18:50:11.682271 15705 recover.cpp:449] Starting replica recovery I0123 18:50:11.682634 15705 recover.cpp:475] Replica is in EMPTY status I0123 18:50:11.684389 15708 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request I0123 18:50:11.685132 15708 recover.cpp:195] Received a recover response from a replica in EMPTY status I0123 18:50:11.689842 15708 recover.cpp:566] Updating replica status to STARTING I0123 18:50:11.702548 
15708 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 12.484558ms I0123 18:50:11.702615 15708 replica.cpp:323] Persisted replica status to STARTING I0123 18:50:11.703531 15708 recover.cpp:475] Replica is in STARTING status I0123 18:50:11.705080 15704 replica.cpp:641] Replica in STARTING status received a broadcasted recover request I0123 18:50:11.712587 15708 recover.cpp:195] Received a recover response from a replica in STARTING status I0123 18:50:11.722898 15708 recover.cpp:566] Updating replica status to VOTING I0123 18:50:11.725427 15703 master.cpp:262] Master 20150123-185011-16777343-37526-15688 (localhost.localdomain) started on 127.0.0.1:37526 W0123 18:50:11.725464 15703 master.cpp:266] ************************************************** Master bound to loopback interface! Cannot communicate with remote schedulers or slaves. You might want to set '--ip' flag to a routable IP address. ************************************************** I0123 18:50:11.725502 15703 master.cpp:308] Master only allowing authenticated frameworks to register I0123 18:50:11.725513 15703 master.cpp:313] Master only allowing authenticated slaves to register I0123 18:50:11.725543 15703 credentials.hpp:36] Loading credentials for authentication from '/tmp/FaultToleranceTest_SchedulerFailoverFrameworkMessage_f3jYkr/credentials' I0123 18:50:11.725774 15703 master.cpp:357] Authorization enabled I0123 18:50:11.728428 15707 whitelist_watcher.cpp:65] No whitelist given I0123 18:50:11.729169 15707 master.cpp:1219] The newly elected leader is master@127.0.0.1:37526 with id 20150123-185011-16777343-37526-15688 I0123 18:50:11.729200 15707 master.cpp:1232] Elected as the leading master! 
I0123 18:50:11.729223 15707 master.cpp:1050] Recovering from registrar I0123 18:50:11.729595 15706 registrar.cpp:313] Recovering registrar I0123 18:50:11.730715 15703 hierarchical_allocator_process.hpp:285] Initialized hierarchical allocator process I0123 18:50:11.737431 15708 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 14.259597ms I0123 18:50:11.737511 15708 replica.cpp:323] Persisted replica status to VOTING I0123 18:50:11.737768 15708 recover.cpp:580] Successfully joined the Paxos group I0123 18:50:11.737977 15708 recover.cpp:464] Recover process terminated I0123 18:50:11.739083 15706 log.cpp:660] Attempting to start the writer I0123 18:50:11.741236 15706 replica.cpp:477] Replica received implicit promise request with proposal 1 I0123 18:50:11.750435 15706 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 8.813783ms I0123 18:50:11.750514 15706 replica.cpp:345] Persisted promised to 1 I0123 18:50:11.752239 15708 coordinator.cpp:230] Coordinator attemping to fill missing position I0123 18:50:11.754176 15706 replica.cpp:378] Replica received explicit promise request for position 0 with proposal 2 I0123 18:50:11.763464 15706 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 8.799822ms I0123 18:50:11.763535 15706 replica.cpp:679] Persisted action at 0 I0123 18:50:11.765697 15709 replica.cpp:511] Replica received write request for position 0 I0123 18:50:11.766293 15709 leveldb.cpp:438] Reading position from leveldb took 54028ns I0123 18:50:11.776468 15709 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 9.789169ms I0123 18:50:11.776561 15709 replica.cpp:679] Persisted action at 0 I0123 18:50:11.777515 15709 replica.cpp:658] Replica received learned notice for position 0 I0123 18:50:11.785459 15709 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 7.897242ms I0123 18:50:11.785531 15709 replica.cpp:679] Persisted action at 0 I0123 18:50:11.785565 15709 replica.cpp:664] Replica learned NOP action at 
position 0 I0123 18:50:11.786633 15709 log.cpp:676] Writer started with ending position 0 I0123 18:50:11.788460 15709 leveldb.cpp:438] Reading position from leveldb took 266087ns I0123 18:50:11.801141 15709 registrar.cpp:346] Successfully fetched the registry (0B) in 71.491072ms I0123 18:50:11.801300 15709 registrar.cpp:445] Applied 1 operations in 41795ns; attempting to update the 'registry' I0123 18:50:11.805186 15707 log.cpp:684] Attempting to append 136 bytes to the log I0123 18:50:11.805454 15707 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1 I0123 18:50:11.806677 15703 replica.cpp:511] Replica received write request for position 1 I0123 18:50:11.815621 15703 leveldb.cpp:343] Persisting action (155 bytes) to leveldb took 8.89177ms I0123 18:50:11.815692 15703 replica.cpp:679] Persisted action at 1 I0123 18:50:11.817358 15704 replica.cpp:658] Replica received learned notice for position 1 I0123 18:50:11.825014 15704 leveldb.cpp:343] Persisting action (157 bytes) to leveldb took 7.578558ms I0123 18:50:11.825088 15704 replica.cpp:679] Persisted action at 1 I0123 18:50:11.825124 15704 replica.cpp:664] Replica learned APPEND action at position 1 I0123 18:50:11.827008 15705 registrar.cpp:490] Successfully updated the 'registry' in 25.629952ms I0123 18:50:11.827143 15705 registrar.cpp:376] Successfully recovered registrar I0123 18:50:11.827517 15705 master.cpp:1077] Recovered 0 slaves from the Registry (98B) ; allowing 10mins for slaves to re-register I0123 18:50:11.828515 15704 log.cpp:703] Attempting to truncate the log to 1 I0123 18:50:11.829074 15704 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2 I0123 18:50:11.830546 15709 replica.cpp:511] Replica received write request for position 2 I0123 18:50:11.837752 15709 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 7.142431ms I0123 18:50:11.837826 15709 replica.cpp:679] Persisted action at 2 I0123 18:50:11.839334 15709 replica.cpp:658] 
Replica received learned notice for position 2 I0123 18:50:11.847069 15709 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 7.116607ms I0123 18:50:11.847214 15709 leveldb.cpp:401] Deleting ~1 keys from leveldb took 74008ns I0123 18:50:11.847241 15709 replica.cpp:679] Persisted action at 2 I0123 18:50:11.847295 15709 replica.cpp:664] Replica learned TRUNCATE action at position 2 I0123 18:50:11.870337 15710 slave.cpp:173] Slave started on 94)@127.0.0.1:37526 W0123 18:50:11.870980 15710 slave.cpp:176] ************************************************** Slave bound to loopback interface! Cannot communicate with remote master(s). You might want to set '--ip' flag to a routable IP address. ************************************************** I0123 18:50:11.871412 15710 credentials.hpp:84] Loading credential for authentication from '/tmp/FaultToleranceTest_SchedulerFailoverFrameworkMessage_TB8Rh3/credential' I0123 18:50:11.871819 15710 slave.cpp:282] Slave using credential for: test-principal I0123 18:50:11.873178 15710 slave.cpp:300] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0123 18:50:11.873620 15710 slave.cpp:329] Slave hostname: localhost.localdomain I0123 18:50:11.873837 15710 slave.cpp:330] Slave checkpoint: false W0123 18:50:11.874068 15710 slave.cpp:332] Disabling checkpointing is deprecated and the --checkpoint flag will be removed in a future release. Please avoid using this flag I0123 18:50:11.879103 15705 state.cpp:33] Recovering state from '/tmp/FaultToleranceTest_SchedulerFailoverFrameworkMessage_TB8Rh3/meta' W0123 18:50:11.882972 15688 sched.cpp:1246] ************************************************** Scheduler driver bound to loopback interface! Cannot communicate with remote master(s). You might want to set 'LIBPROCESS_IP' environment variable to use a routable IP address. 
************************************************** I0123 18:50:11.884106 15709 status_update_manager.cpp:197] Recovering status update manager I0123 18:50:11.884703 15710 slave.cpp:3519] Finished recovery I0123 18:50:11.892076 15704 status_update_manager.cpp:171] Pausing sending status updates I0123 18:50:11.892590 15710 slave.cpp:613] New master detected at master@127.0.0.1:37526 I0123 18:50:11.892937 15710 slave.cpp:676] Authenticating with master master@127.0.0.1:37526 I0123 18:50:11.893165 15710 slave.cpp:681] Using default CRAM-MD5 authenticatee I0123 18:50:11.893754 15708 authenticatee.hpp:138] Creating new client SASL connection I0123 18:50:11.894120 15708 master.cpp:4129] Authenticating slave(94)@127.0.0.1:37526 I0123 18:50:11.894153 15708 master.cpp:4140] Using default CRAM-MD5 authenticator I0123 18:50:11.894628 15708 authenticator.hpp:170] Creating new server SASL connection I0123 18:50:11.894913 15708 authenticatee.hpp:229] Received SASL authentication mechanisms: CRAM-MD5 I0123 18:50:11.894942 15708 authenticatee.hpp:255] Attempting to authenticate with mechanism 'CRAM-MD5' I0123 18:50:11.895043 15708 authenticator.hpp:276] Received SASL authentication start I0123 18:50:11.895095 15708 authenticator.hpp:398] Authentication requires more steps I0123 18:50:11.895165 15708 authenticatee.hpp:275] Received SASL authentication step I0123 18:50:11.895261 15708 authenticator.hpp:304] Received SASL authentication step I0123 18:50:11.895292 15708 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'localhost.localdomain' server FQDN: 'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0123 18:50:11.895305 15708 auxprop.cpp:171] Looking up auxiliary property '*userPassword' I0123 18:50:11.895354 15708 auxprop.cpp:171] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0123 18:50:11.895881 15710 slave.cpp:649] Detecting new master I0123 18:50:11.898449 15708 auxprop.cpp:99] Request to lookup properties 
for user: 'test-principal' realm: 'localhost.localdomain' server FQDN: 'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0123 18:50:11.899024 15708 auxprop.cpp:121] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0123 18:50:11.899106 15708 auxprop.cpp:121] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0123 18:50:11.899190 15708 authenticator.hpp:390] Authentication success I0123 18:50:11.899569 15706 authenticatee.hpp:315] Authentication success I0123 18:50:11.902299 15706 slave.cpp:747] Successfully authenticated with master master@127.0.0.1:37526 I0123 18:50:11.902847 15706 slave.cpp:1075] Will retry registration in 19.809649ms if necessary I0123 18:50:11.903264 15705 master.cpp:3214] Queuing up registration request from slave(94)@127.0.0.1:37526 because authentication is still in progress I0123 18:50:11.903497 15705 master.cpp:4187] Successfully authenticated principal 'test-principal' at slave(94)@127.0.0.1:37526 I0123 18:50:11.903940 15705 master.cpp:3275] Registering slave at slave(94)@127.0.0.1:37526 (localhost.localdomain) with id 20150123-185011-16777343-37526-15688-S0 I0123 18:50:11.904398 15705 registrar.cpp:445] Applied 1 operations in 63679ns; attempting to update the 'registry' I0123 18:50:11.917883 15688 sched.cpp:151] Version: 0.22.0 I0123 18:50:11.919347 15703 log.cpp:684] Attempting to append 315 bytes to the log I0123 18:50:11.921039 15703 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 3 I0123 18:50:11.919992 15706 sched.cpp:248] New master detected at master@127.0.0.1:37526 I0123 18:50:11.921352 15706 sched.cpp:304] Authenticating with master master@127.0.0.1:37526 I0123 18:50:11.921408 15706 sched.cpp:311] Using default CRAM-MD5 authenticatee I0123 18:50:11.921773 15706 authenticatee.hpp:138] Creating new client SASL connection I0123 18:50:11.922266 15706 master.cpp:4129] Authenticating 
scheduler-2cecb105-ca23-4048-9707-12b1e4422e11@127.0.0.1:37526 I0123 18:50:11.922301 15706 master.cpp:4140] Using default CRAM-MD5 authenticator I0123 18:50:11.923928 15703 replica.cpp:511] Replica received write request for position 3 I0123 18:50:11.924285 15707 authenticator.hpp:170] Creating new server SASL connection I0123 18:50:11.925091 15707 authenticatee.hpp:229] Received SASL authentication mechanisms: CRAM-MD5 I0123 18:50:11.925122 15707 authenticatee.hpp:255] Attempting to authenticate with mechanism 'CRAM-MD5' I0123 18:50:11.925194 15707 authenticator.hpp:276] Received SASL authentication start I0123 18:50:11.925257 15707 authenticator.hpp:398] Authentication requires more steps I0123 18:50:11.925325 15707 authenticatee.hpp:275] Received SASL authentication step I0123 18:50:11.925442 15707 authenticator.hpp:304] Received SASL authentication step I0123 18:50:11.925473 15707 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'localhost.localdomain' server FQDN: 'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0123 18:50:11.925487 15707 auxprop.cpp:171] Looking up auxiliary property '*userPassword' I0123 18:50:11.925532 15707 auxprop.cpp:171] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0123 18:50:11.925559 15707 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'localhost.localdomain' server FQDN: 'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0123 18:50:11.925571 15707 auxprop.cpp:121] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0123 18:50:11.925580 15707 auxprop.cpp:121] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0123 18:50:11.925595 15707 authenticator.hpp:390] Authentication success I0123 18:50:11.925695 15707 authenticatee.hpp:315] Authentication success I0123 18:50:11.925792 15707 master.cpp:4187] Successfully authenticated principal 
'test-principal' at scheduler-2cecb105-ca23-4048-9707-12b1e4422e11@127.0.0.1:37526 I0123 18:50:11.926127 15707 sched.cpp:392] Successfully authenticated with master master@127.0.0.1:37526 I0123 18:50:11.926154 15707 sched.cpp:515] Sending registration request to master@127.0.0.1:37526 I0123 18:50:11.926215 15707 sched.cpp:548] Will retry registration in 866.81063ms if necessary I0123 18:50:11.926640 15707 master.cpp:1420] Received registration request for framework 'default' at scheduler-2cecb105-ca23-4048-9707-12b1e4422e11@127.0.0.1:37526 I0123 18:50:11.926960 15707 master.cpp:1298] Authorizing framework principal 'test-principal' to receive offers for role '*' I0123 18:50:11.927691 15707 master.cpp:1484] Registering framework 20150123-185011-16777343-37526-15688-0000 (default) at scheduler-2cecb105-ca23-4048-9707-12b1e4422e11@127.0.0.1:37526 I0123 18:50:11.928292 15708 hierarchical_allocator_process.hpp:319] Added framework 20150123-185011-16777343-37526-15688-0000 I0123 18:50:11.928326 15708 hierarchical_allocator_process.hpp:839] No resources available to allocate! 
I0123 18:50:11.928340 15708 hierarchical_allocator_process.hpp:746] Performed allocation for 0 slaves in 21080ns I0123 18:50:11.934458 15707 sched.cpp:442] Framework registered with 20150123-185011-16777343-37526-15688-0000 I0123 18:50:11.934927 15707 sched.cpp:456] Scheduler::registered took 112885ns I0123 18:50:11.935747 15709 slave.cpp:1075] Will retry registration in 19.609252ms if necessary I0123 18:50:11.935981 15709 master.cpp:3263] Ignoring register slave message from slave(94)@127.0.0.1:37526 (localhost.localdomain) as admission is already in progress I0123 18:50:11.938997 15703 leveldb.cpp:343] Persisting action (334 bytes) to leveldb took 10.171709ms I0123 18:50:11.939049 15703 replica.cpp:679] Persisted action at 3 I0123 18:50:11.940630 15709 replica.cpp:658] Replica received learned notice for position 3 I0123 18:50:11.945473 15709 leveldb.cpp:343] Persisting action (336 bytes) to leveldb took 4.804742ms I0123 18:50:11.945521 15709 replica.cpp:679] Persisted action at 3 I0123 18:50:11.945550 15709 replica.cpp:664] Replica learned APPEND action at position 3 I0123 18:50:11.947105 15709 registrar.cpp:490] Successfully updated the 'registry' in 42.637056ms I0123 18:50:11.948020 15703 master.cpp:3329] Registered slave 20150123-185011-16777343-37526-15688-S0 at slave(94)@127.0.0.1:37526 (localhost.localdomain) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0123 18:50:11.948318 15703 hierarchical_allocator_process.hpp:453] Added slave 20150123-185011-16777343-37526-15688-S0 (localhost.localdomain) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] available) I0123 18:50:11.948719 15703 hierarchical_allocator_process.hpp:764] Performed allocation for slave 20150123-185011-16777343-37526-15688-S0 in 355831ns I0123 18:50:11.948813 15703 slave.cpp:781] Registered with master master@127.0.0.1:37526; given slave ID 20150123-185011-16777343-37526-15688-S0 I0123 
18:50:11.948969 15703 slave.cpp:2588] Received ping from slave-observer(92)@127.0.0.1:37526 I0123 18:50:11.949324 15703 master.cpp:4071] Sending 1 offers to framework 20150123-185011-16777343-37526-15688-0000 (default) at scheduler-2cecb105-ca23-4048-9707-12b1e4422e11@127.0.0.1:37526 I0123 18:50:11.949571 15706 status_update_manager.cpp:178] Resuming sending status updates I0123 18:50:11.950023 15709 log.cpp:703] Attempting to truncate the log to 3 I0123 18:50:11.950810 15705 sched.cpp:605] Scheduler::resourceOffers took 135580ns I0123 18:50:11.952793 15708 master.cpp:2677] Processing ACCEPT call for offers: [ 20150123-185011-16777343-37526-15688-O0 ] on slave 20150123-185011-16777343-37526-15688-S0 at slave(94)@127.0.0.1:37526 (localhost.localdomain) for framework 20150123-185011-16777343-37526-15688-0000 (default) at scheduler-2cecb105-ca23-4048-9707-12b1e4422e11@127.0.0.1:37526 I0123 18:50:11.952852 15708 master.cpp:2513] Authorizing framework principal 'test-principal' to launch task 1 as user 'jenkins' W0123 18:50:11.954649 15708 master.cpp:2130] Executor default for task 1 uses less CPUs (None) than the minimum required (0.01). Please update your executor, as this will be mandatory in future releases. W0123 18:50:11.954988 15708 master.cpp:2142] Executor default for task 1 uses less memory (None) than the minimum required (32MB). Please update your executor, as this will be mandatory in future releases. 
I0123 18:50:11.955579 15708 master.hpp:782] Adding task 1 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20150123-185011-16777343-37526-15688-S0 (localhost.localdomain) I0123 18:50:11.956035 15703 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 4 I0123 18:50:11.957592 15704 replica.cpp:511] Replica received write request for position 4 I0123 18:50:11.958485 15708 master.cpp:2885] Launching task 1 of framework 20150123-185011-16777343-37526-15688-0000 (default) at scheduler-2cecb105-ca23-4048-9707-12b1e4422e11@127.0.0.1:37526 with resources cpus(...",1 MESOS-2305,"Refactor validators in Master.","There are several motivations for this. We are in the process of adding dynamic reservation and persistent volume support to the master. To do that, the master needs to validate relevant operations from the framework (see Offer::Operation in mesos.proto). The existing validator style in the master is hard to extend, compose, and reuse. Another motivation is unit testing (MESOS-1064). 
Right now, we write integration tests for those validators which is unfortunate.",3 MESOS-2306,"MasterAuthorizationTest.FrameworkRemovedBeforeReregistration is flaky.","Good run: {noformat} [ RUN ] MasterAuthorizationTest.FrameworkRemovedBeforeReregistration Using temporary directory '/tmp/MasterAuthorizationTest_FrameworkRemovedBeforeReregistration_ZU7oaD' I0122 19:23:06.481690 17483 leveldb.cpp:176] Opened db in 21.058723ms I0122 19:23:06.488590 17483 leveldb.cpp:183] Compacted db in 6.6715ms I0122 19:23:06.488816 17483 leveldb.cpp:198] Created db iterator in 30034ns I0122 19:23:06.489053 17483 leveldb.cpp:204] Seeked to beginning of db in 2908ns I0122 19:23:06.489073 17483 leveldb.cpp:273] Iterated through 0 keys in the db in 492ns I0122 19:23:06.489148 17483 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0122 19:23:06.490272 17504 recover.cpp:449] Starting replica recovery I0122 19:23:06.490900 17504 recover.cpp:475] Replica is in EMPTY status I0122 19:23:06.492422 17504 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request I0122 19:23:06.492694 17504 recover.cpp:195] Received a recover response from a replica in EMPTY status I0122 19:23:06.493185 17504 recover.cpp:566] Updating replica status to STARTING I0122 19:23:06.514881 17504 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 21.459963ms I0122 19:23:06.514920 17504 replica.cpp:323] Persisted replica status to STARTING I0122 19:23:06.515861 17501 master.cpp:262] Master 20150122-192306-16842879-46283-17483 (lucid) started on 127.0.1.1:46283 I0122 19:23:06.515910 17501 master.cpp:308] Master only allowing authenticated frameworks to register I0122 19:23:06.515923 17501 master.cpp:313] Master only allowing authenticated slaves to register I0122 19:23:06.515946 17501 credentials.hpp:36] Loading credentials for authentication from '/tmp/MasterAuthorizationTest_FrameworkRemovedBeforeReregistration_ZU7oaD/credentials' I0122 
19:23:06.516150 17501 master.cpp:357] Authorization enabled I0122 19:23:06.517511 17501 hierarchical_allocator_process.hpp:285] Initialized hierarchical allocator process I0122 19:23:06.517607 17501 whitelist_watcher.cpp:65] No whitelist given I0122 19:23:06.518066 17498 master.cpp:1219] The newly elected leader is master@127.0.1.1:46283 with id 20150122-192306-16842879-46283-17483 I0122 19:23:06.518095 17498 master.cpp:1232] Elected as the leading master! I0122 19:23:06.518121 17498 master.cpp:1050] Recovering from registrar I0122 19:23:06.518333 17498 registrar.cpp:313] Recovering registrar I0122 19:23:06.523987 17504 recover.cpp:475] Replica is in STARTING status I0122 19:23:06.525090 17504 replica.cpp:641] Replica in STARTING status received a broadcasted recover request I0122 19:23:06.525337 17504 recover.cpp:195] Received a recover response from a replica in STARTING status I0122 19:23:06.525693 17504 recover.cpp:566] Updating replica status to VOTING I0122 19:23:06.532680 17504 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 6.810884ms I0122 19:23:06.532714 17504 replica.cpp:323] Persisted replica status to VOTING I0122 19:23:06.532835 17504 recover.cpp:580] Successfully joined the Paxos group I0122 19:23:06.533004 17504 recover.cpp:464] Recover process terminated I0122 19:23:06.533833 17500 log.cpp:660] Attempting to start the writer I0122 19:23:06.535225 17500 replica.cpp:477] Replica received implicit promise request with proposal 1 I0122 19:23:06.540340 17500 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 5.086139ms I0122 19:23:06.540371 17500 replica.cpp:345] Persisted promised to 1 I0122 19:23:06.541502 17504 coordinator.cpp:230] Coordinator attemping to fill missing position I0122 19:23:06.543021 17504 replica.cpp:378] Replica received explicit promise request for position 0 with proposal 2 I0122 19:23:06.548140 17504 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 5.083443ms I0122 19:23:06.548171 17504 
replica.cpp:679] Persisted action at 0 I0122 19:23:06.549746 17500 replica.cpp:511] Replica received write request for position 0 I0122 19:23:06.549926 17500 leveldb.cpp:438] Reading position from leveldb took 31962ns I0122 19:23:06.555033 17500 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 5.065823ms I0122 19:23:06.555064 17500 replica.cpp:679] Persisted action at 0 I0122 19:23:06.556094 17504 replica.cpp:658] Replica received learned notice for position 0 I0122 19:23:06.558815 17504 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 2.688382ms I0122 19:23:06.558847 17504 replica.cpp:679] Persisted action at 0 I0122 19:23:06.558868 17504 replica.cpp:664] Replica learned NOP action at position 0 I0122 19:23:06.559917 17500 log.cpp:676] Writer started with ending position 0 I0122 19:23:06.560995 17500 leveldb.cpp:438] Reading position from leveldb took 27742ns I0122 19:23:06.563467 17500 registrar.cpp:346] Successfully fetched the registry (0B) in 45.095936ms I0122 19:23:06.563551 17500 registrar.cpp:445] Applied 1 operations in 19686ns; attempting to update the 'registry' I0122 19:23:06.566107 17500 log.cpp:684] Attempting to append 118 bytes to the log I0122 19:23:06.566267 17500 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1 I0122 19:23:06.567126 17500 replica.cpp:511] Replica received write request for position 1 I0122 19:23:06.582588 17500 leveldb.cpp:343] Persisting action (135 bytes) to leveldb took 15.425511ms I0122 19:23:06.582631 17500 replica.cpp:679] Persisted action at 1 I0122 19:23:06.583425 17500 replica.cpp:658] Replica received learned notice for position 1 I0122 19:23:06.589001 17500 leveldb.cpp:343] Persisting action (137 bytes) to leveldb took 5.549486ms I0122 19:23:06.589200 17500 replica.cpp:679] Persisted action at 1 I0122 19:23:06.589416 17500 replica.cpp:664] Replica learned APPEND action at position 1 I0122 19:23:06.596420 17500 registrar.cpp:490] Successfully updated the 
'registry' in 32.815104ms I0122 19:23:06.596551 17500 registrar.cpp:376] Successfully recovered registrar I0122 19:23:06.596923 17500 master.cpp:1077] Recovered 0 slaves from the Registry (82B) ; allowing 10mins for slaves to re-register I0122 19:23:06.597007 17500 log.cpp:703] Attempting to truncate the log to 1 I0122 19:23:06.597239 17500 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2 I0122 19:23:06.598464 17501 replica.cpp:511] Replica received write request for position 2 I0122 19:23:06.604038 17501 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 5.536264ms I0122 19:23:06.604084 17501 replica.cpp:679] Persisted action at 2 I0122 19:23:06.608747 17503 replica.cpp:658] Replica received learned notice for position 2 I0122 19:23:06.614094 17503 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 5.315347ms I0122 19:23:06.614171 17503 leveldb.cpp:401] Deleting ~1 keys from leveldb took 33021ns I0122 19:23:06.614188 17503 replica.cpp:679] Persisted action at 2 I0122 19:23:06.614208 17503 replica.cpp:664] Replica learned TRUNCATE action at position 2 I0122 19:23:06.628820 17483 sched.cpp:151] Version: 0.22.0 I0122 19:23:06.629879 17505 sched.cpp:248] New master detected at master@127.0.1.1:46283 I0122 19:23:06.629973 17505 sched.cpp:304] Authenticating with master master@127.0.1.1:46283 I0122 19:23:06.629995 17505 sched.cpp:311] Using default CRAM-MD5 authenticatee I0122 19:23:06.630314 17505 authenticatee.hpp:138] Creating new client SASL connection I0122 19:23:06.630722 17505 master.cpp:4129] Authenticating scheduler-4156eae6-8d7f-423a-920a-02b11b7bd1ba@127.0.1.1:46283 I0122 19:23:06.630750 17505 master.cpp:4140] Using default CRAM-MD5 authenticator I0122 19:23:06.631115 17505 authenticator.hpp:170] Creating new server SASL connection I0122 19:23:06.631423 17505 authenticatee.hpp:229] Received SASL authentication mechanisms: CRAM-MD5 I0122 19:23:06.631459 17505 authenticatee.hpp:255] Attempting to 
authenticate with mechanism 'CRAM-MD5' I0122 19:23:06.631563 17505 authenticator.hpp:276] Received SASL authentication start I0122 19:23:06.631605 17505 authenticator.hpp:398] Authentication requires more steps I0122 19:23:06.631671 17505 authenticatee.hpp:275] Received SASL authentication step I0122 19:23:06.631748 17505 authenticator.hpp:304] Received SASL authentication step I0122 19:23:06.631774 17505 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'lucid' server FQDN: 'lucid' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0122 19:23:06.631784 17505 auxprop.cpp:171] Looking up auxiliary property '*userPassword' I0122 19:23:06.631822 17505 auxprop.cpp:171] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0122 19:23:06.631856 17505 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'lucid' server FQDN: 'lucid' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0122 19:23:06.631870 17505 auxprop.cpp:121] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0122 19:23:06.631877 17505 auxprop.cpp:121] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0122 19:23:06.631892 17505 authenticator.hpp:390] Authentication success I0122 19:23:06.631988 17505 authenticatee.hpp:315] Authentication success I0122 19:23:06.632066 17505 master.cpp:4187] Successfully authenticated principal 'test-principal' at scheduler-4156eae6-8d7f-423a-920a-02b11b7bd1ba@127.0.1.1:46283 I0122 19:23:06.632359 17505 sched.cpp:392] Successfully authenticated with master master@127.0.1.1:46283 I0122 19:23:06.632382 17505 sched.cpp:515] Sending registration request to master@127.0.1.1:46283 I0122 19:23:06.632432 17505 sched.cpp:548] Will retry registration in 598.155756ms if necessary I0122 19:23:06.632575 17505 master.cpp:1420] Received registration request for framework 'default' at scheduler-4156eae6-8d7f-423a-920a-02b11b7bd1ba@127.0.1.1:46283 I0122 
19:23:06.632639 17505 master.cpp:1298] Authorizing framework principal 'test-principal' to receive offers for role '*' I0122 19:23:06.632912 17505 master.cpp:1484] Registering framework 20150122-192306-16842879-46283-17483-0000 (default) at scheduler-4156eae6-8d7f-423a-920a-02b11b7bd1ba@127.0.1.1:46283 I0122 19:23:06.633421 17505 hierarchical_allocator_process.hpp:319] Added framework 20150122-192306-16842879-46283-17483-0000 I0122 19:23:06.633448 17505 hierarchical_allocator_process.hpp:839] No resources available to allocate! I0122 19:23:06.633458 17505 hierarchical_allocator_process.hpp:746] Performed allocation for 0 slaves in 17704ns I0122 19:23:06.633919 17505 sched.cpp:442] Framework registered with 20150122-192306-16842879-46283-17483-0000 I0122 19:23:06.633980 17505 sched.cpp:456] Scheduler::registered took 37063ns I0122 19:23:06.636554 17500 sched.cpp:242] Scheduler::disconnected took 14843ns I0122 19:23:06.636579 17500 sched.cpp:248] New master detected at master@127.0.1.1:46283 I0122 19:23:06.636625 17500 sched.cpp:304] Authenticating with master master@127.0.1.1:46283 I0122 19:23:06.636641 17500 sched.cpp:311] Using default CRAM-MD5 authenticatee I0122 19:23:06.636914 17500 authenticatee.hpp:138] Creating new client SASL connection I0122 19:23:06.637313 17500 master.cpp:4129] Authenticating scheduler-4156eae6-8d7f-423a-920a-02b11b7bd1ba@127.0.1.1:46283 I0122 19:23:06.637341 17500 master.cpp:4140] Using default CRAM-MD5 authenticator I0122 19:23:06.637675 17500 authenticator.hpp:170] Creating new server SASL connection I0122 19:23:06.638056 17501 authenticatee.hpp:229] Received SASL authentication mechanisms: CRAM-MD5 I0122 19:23:06.638083 17501 authenticatee.hpp:255] Attempting to authenticate with mechanism 'CRAM-MD5' I0122 19:23:06.638182 17501 authenticator.hpp:276] Received SASL authentication start I0122 19:23:06.638221 17501 authenticator.hpp:398] Authentication requires more steps I0122 19:23:06.638286 17501 authenticatee.hpp:275] Received SASL 
authentication step I0122 19:23:06.638360 17501 authenticator.hpp:304] Received SASL authentication step I0122 19:23:06.638383 17501 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'lucid' server FQDN: 'lucid' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0122 19:23:06.638393 17501 auxprop.cpp:171] Looking up auxiliary property '*userPassword' I0122 19:23:06.638422 17501 auxprop.cpp:171] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0122 19:23:06.638447 17501 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'lucid' server FQDN: 'lucid' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0122 19:23:06.638458 17501 auxprop.cpp:121] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0122 19:23:06.638464 17501 auxprop.cpp:121] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0122 19:23:06.638478 17501 authenticator.hpp:390] Authentication success I0122 19:23:06.638566 17501 authenticatee.hpp:315] Authentication success I0122 19:23:06.638643 17501 master.cpp:4187] Successfully authenticated principal 'test-principal' at scheduler-4156eae6-8d7f-423a-920a-02b11b7bd1ba@127.0.1.1:46283 I0122 19:23:06.638919 17501 sched.cpp:392] Successfully authenticated with master master@127.0.1.1:46283 I0122 19:23:06.638942 17501 sched.cpp:515] Sending registration request to master@127.0.1.1:46283 I0122 19:23:06.638994 17501 sched.cpp:548] Will retry registration in 489.304713ms if necessary I0122 19:23:06.639169 17501 master.cpp:1557] Received re-registration request from framework 20150122-192306-16842879-46283-17483-0000 (default) at scheduler-4156eae6-8d7f-423a-920a-02b11b7bd1ba@127.0.1.1:46283 I0122 19:23:06.639242 17501 master.cpp:1298] Authorizing framework principal 'test-principal' to receive offers for role '*' I0122 19:23:06.639839 17483 sched.cpp:1471] Asked to stop the driver I0122 19:23:06.640379 17499 sched.cpp:808] 
Stopping framework '20150122-192306-16842879-46283-17483-0000' I0122 19:23:06.640697 17499 master.cpp:745] Framework 20150122-192306-16842879-46283-17483-0000 (default) at scheduler-4156eae6-8d7f-423a-920a-02b11b7bd1ba@127.0.1.1:46283 disconnected I0122 19:23:06.640723 17499 master.cpp:1789] Disconnecting framework 20150122-192306-16842879-46283-17483-0000 (default) at scheduler-4156eae6-8d7f-423a-920a-02b11b7bd1ba@127.0.1.1:46283 I0122 19:23:06.640744 17499 master.cpp:1805] Deactivating framework 20150122-192306-16842879-46283-17483-0000 (default) at scheduler-4156eae6-8d7f-423a-920a-02b11b7bd1ba@127.0.1.1:46283 I0122 19:23:06.640806 17499 master.cpp:767] Giving framework 20150122-192306-16842879-46283-17483-0000 (default) at scheduler-4156eae6-8d7f-423a-920a-02b11b7bd1ba@127.0.1.1:46283 0ns to failover I0122 19:23:06.640951 17499 hierarchical_allocator_process.hpp:398] Deactivated framework 20150122-192306-16842879-46283-17483-0000 I0122 19:23:06.646342 17498 master.cpp:1604] Dropping re-registration request of framework 20150122-192306-16842879-46283-17483-0000 (default) at scheduler-4156eae6-8d7f-423a-920a-02b11b7bd1ba@127.0.1.1:46283 because it is not authenticated I0122 19:23:06.648844 17498 master.cpp:3941] Framework failover timeout, removing framework 20150122-192306-16842879-46283-17483-0000 (default) at scheduler-4156eae6-8d7f-423a-920a-02b11b7bd1ba@127.0.1.1:46283 I0122 19:23:06.648871 17498 master.cpp:4499] Removing framework 20150122-192306-16842879-46283-17483-0000 (default) at scheduler-4156eae6-8d7f-423a-920a-02b11b7bd1ba@127.0.1.1:46283 I0122 19:23:06.649624 17498 hierarchical_allocator_process.hpp:352] Removed framework 20150122-192306-16842879-46283-17483-0000 I0122 19:23:06.656532 17483 master.cpp:654] Master terminating [ OK ] MasterAuthorizationTest.FrameworkRemovedBeforeReregistration (216 ms) {noformat} Bad run: {noformat} [ RUN ] MasterAuthorizationTest.FrameworkRemovedBeforeReregistration Using temporary directory 
'/tmp/MasterAuthorizationTest_FrameworkRemovedBeforeReregistration_JDM2sm' I0126 19:19:55.517570 2381 leveldb.cpp:176] Opened db in 34.341401ms I0126 19:19:55.529630 2381 leveldb.cpp:183] Compacted db in 11.824435ms I0126 19:19:55.529878 2381 leveldb.cpp:198] Created db iterator in 26176ns I0126 19:19:55.530200 2381 leveldb.cpp:204] Seeked to beginning of db in 3457ns I0126 19:19:55.530455 2381 leveldb.cpp:273] Iterated through 0 keys in the db in 902ns I0126 19:19:55.530658 2381 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0126 19:19:55.531492 2397 recover.cpp:449] Starting replica recovery I0126 19:19:55.531793 2397 recover.cpp:475] Replica is in EMPTY status I0126 19:19:55.533327 2397 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request I0126 19:19:55.533608 2397 recover.cpp:195] Received a recover response from a replica in EMPTY status I0126 19:19:55.534101 2397 recover.cpp:566] Updating replica status to STARTING I0126 19:19:55.550417 2397 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 16.106821ms I0126 19:19:55.550472 2397 replica.cpp:323] Persisted replica status to STARTING I0126 19:19:55.551434 2397 recover.cpp:475] Replica is in STARTING status I0126 19:19:55.552846 2397 replica.cpp:641] Replica in STARTING status received a broadcasted recover request I0126 19:19:55.553099 2397 recover.cpp:195] Received a recover response from a replica in STARTING status I0126 19:19:55.553565 2397 recover.cpp:566] Updating replica status to VOTING I0126 19:19:55.564590 2397 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 10.719218ms I0126 19:19:55.564919 2397 replica.cpp:323] Persisted replica status to VOTING I0126 19:19:55.565982 2397 recover.cpp:580] Successfully joined the Paxos group I0126 19:19:55.566231 2397 recover.cpp:464] Recover process terminated I0126 19:19:55.567878 2401 master.cpp:262] Master 20150126-191955-16842879-51862-2381 (lucid) started on 
127.0.1.1:51862 I0126 19:19:55.567927 2401 master.cpp:308] Master only allowing authenticated frameworks to register I0126 19:19:55.567950 2401 master.cpp:313] Master only allowing authenticated slaves to register I0126 19:19:55.567978 2401 credentials.hpp:36] Loading credentials for authentication from '/tmp/MasterAuthorizationTest_FrameworkRemovedBeforeReregistration_JDM2sm/credentials' I0126 19:19:55.568220 2401 master.cpp:357] Authorization enabled I0126 19:19:55.569890 2401 hierarchical_allocator_process.hpp:285] Initialized hierarchical allocator process I0126 19:19:55.569999 2401 whitelist_watcher.cpp:65] No whitelist given I0126 19:19:55.570694 2401 master.cpp:1219] The newly elected leader is master@127.0.1.1:51862 with id 20150126-191955-16842879-51862-2381 I0126 19:19:55.570721 2401 master.cpp:1232] Elected as the leading master! I0126 19:19:55.570742 2401 master.cpp:1050] Recovering from registrar I0126 19:19:55.570977 2401 registrar.cpp:313] Recovering registrar I0126 19:19:55.571959 2401 log.cpp:660] Attempting to start the writer I0126 19:19:55.573441 2401 replica.cpp:477] Replica received implicit promise request with proposal 1 I0126 19:19:55.590724 2401 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 17.243964ms I0126 19:19:55.590785 2401 replica.cpp:345] Persisted promised to 1 I0126 19:19:55.592140 2396 coordinator.cpp:230] Coordinator attemping to fill missing position I0126 19:19:55.593834 2396 replica.cpp:378] Replica received explicit promise request for position 0 with proposal 2 I0126 19:19:55.603837 2396 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 9.955824ms I0126 19:19:55.603902 2396 replica.cpp:679] Persisted action at 0 I0126 19:19:55.606082 2401 replica.cpp:511] Replica received write request for position 0 I0126 19:19:55.606331 2401 leveldb.cpp:438] Reading position from leveldb took 44524ns I0126 19:19:55.612546 2401 leveldb.cpp:343] Persisting action (14 bytes) to level...",1 MESOS-2309,"Mesos 
rejects ExecutorInfo as incompatible when there is no functional difference","In AURORA-1076 it was discovered that if an ExecutorInfo is changed so that a previously unset optional field with a default value is explicitly set to that default value, the new ExecutorInfo is rejected as incompatible. For example, if we have an ExecutorInfo whose CommandInfo leaves the {{shell}} attribute unset, and we then change the CommandInfo to set {{shell}} to true, Mesos will reject the task with: {noformat} I0130 21:50:05.373389 50869 master.cpp:3441] Sending status update TASK_LOST (UUID: 82ef615c-0d59-4427-95d5-80cf0e52b3fc) for task system-gc-c89c0c05-200c-462e-958a-ecd7b9a76831 of framework 201103282247-0000000019-0000 'Task has invalid ExecutorInfo (existing ExecutorInfo with same ExecutorID is not compatible). {noformat} This is not intuitive because the default value of the {{shell}} attribute is true. There should be no difference between leaving an optional field with a default value unset and setting that field to its default value.",3 MESOS-2314,"remove unnecessary constants","In {{src/slave/paths.cpp}} a number of string constants are defined to describe the formats of various paths. However, given that there is a 1:1 mapping between the string constants and the functions that build the paths, the code would be more readable if the format strings were inlined in the functions. In the cases where one constant depends on another (see the {{EXECUTOR_INFO_PATH, EXECUTOR_PATH, FRAMEWORK_PATH, SLAVE_PATH, ROOT_PATH}} chain, for example), the function calls can just be chained together. 
This will have the added benefit of removing some statically constructed string constants, which are dangerous.",2 MESOS-2315,"Deprecate / Remove CommandInfo::ContainerInfo","If I understand correctly, this has been deprecated and all current code (except examples/docker_no_executor_framework.cpp) uses the top-level ContainerInfo?",2 MESOS-2317,"Remove deprecated checkpoint=false code","Cody's plan from MESOS-444 was: 1) -Make it so the flag can't be changed at the command line- 2) -Remove the checkpoint variable entirely from slave/flags.hpp. This is a fairly involved change since a number of unit tests depend on manually setting the flag, as well as the default being non-checkpointing.- 3) -Remove logic around checkpointing in the slave, remove logic inside the master.- 4) Drop the flag from the SlaveInfo struct (will require a deprecation cycle). ",3 MESOS-2319,"Unable to set --work_dir to a non /tmp device","Starting mesos-slave with --work_dir set to a directory that is not on the same device as /tmp causes mesos-slave to abort with a core dump: {code} mesos # GLOG_v=1 sbin/mesos-slave --master=zk://10.171.59.83:2181/mesos --work_dir=/var/lib/mesos/ WARNING: Logging before InitGoogleLogging() is written to STDERR I0204 18:24:49.274619 22922 process.cpp:958] libprocess is initialized on 10.169.146.67:5051 for 8 cpus I0204 18:24:49.274978 22922 logging.cpp:177] Logging to STDERR I0204 18:24:49.275111 22922 main.cpp:152] Build: 2015-02-03 22:59:30 by I0204 18:24:49.275233 22922 main.cpp:154] Version: 0.22.0 I0204 18:24:49.275485 22922 containerizer.cpp:103] Using isolation: posix/cpu,posix/mem 2015-02-04 18:24:49,275:22922(0x7ffdd4d5c700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 2015-02-04 18:24:49,275:22922(0x7ffdd4d5c700):ZOO_INFO@log_env@716: Client environment:host.name=ip-10-169-146-67.ec2.internal 2015-02-04 18:24:49,276:22922(0x7ffdd4d5c700):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2015-02-04 
18:24:49,276:22922(0x7ffdd4d5c700):ZOO_INFO@log_env@724: Client environment:os.arch=3.18.2 2015-02-04 18:24:49,276:22922(0x7ffdd4d5c700):ZOO_INFO@log_env@725: Client environment:os.version=#2 SMP Tue Jan 27 23:34:36 UTC 2015 2015-02-04 18:24:49,276:22922(0x7ffdd4d5c700):ZOO_INFO@log_env@733: Client environment:user.name=core 2015-02-04 18:24:49,276:22922(0x7ffdd4d5c700):ZOO_INFO@log_env@741: Client environment:user.home=/root 2015-02-04 18:24:49,276:22922(0x7ffdd4d5c700):ZOO_INFO@log_env@753: Client environment:user.dir=/opt/mesosphere/dcos/0.0.1-0.1.20150203225612/mesos 2015-02-04 18:24:49,276:22922(0x7ffdd4d5c700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=10.171.59.83:2181 sessionTimeout=10000 watcher=0x7ffdd97bccf0 sessionId=0 sessionPasswd= context=0x7ffdc8000ba0 flags=0 I0204 18:24:49.276793 22922 main.cpp:180] Starting Mesos slave 2015-02-04 18:24:49,307:22922(0x7ffdd151f700):ZOO_INFO@check_events@1703: initiated connection to server [10.171.59.83:2181] I0204 18:24:49.307548 22922 slave.cpp:173] Slave started on 1)@10.169.146.67:5051 I0204 18:24:49.307955 22922 slave.cpp:300] Slave resources: cpus(*):1; mem(*):2728; disk(*):24736; ports(*):[31000-32000] I0204 18:24:49.308404 22922 slave.cpp:329] Slave hostname: ip-10-169-146-67.ec2.internal I0204 18:24:49.308459 22922 slave.cpp:330] Slave checkpoint: true I0204 18:24:49.310431 22924 state.cpp:33] Recovering state from '/var/lib/mesos/meta' I0204 18:24:49.310583 22924 state.cpp:668] Failed to find resources file '/var/lib/mesos/meta/resources/resources.info' I0204 18:24:49.310670 22924 state.cpp:74] Failed to find the latest slave from '/var/lib/mesos/meta' I0204 18:24:49.310803 22924 status_update_manager.cpp:197] Recovering status update manager I0204 18:24:49.310916 22924 containerizer.cpp:300] Recovering containerizer I0204 18:24:49.311110 22924 slave.cpp:3527] Finished recovery F0204 18:24:49.311312 22924 slave.cpp:3537] CHECK_SOME(state::checkpoint(path, bootId.get())): Failed to 
rename '/tmp/PSHLqV' to '/var/lib/mesos/meta/boot_id': Invalid cross-device link 2015-02-04 18:24:49,310:22922(0x7ffdd151f700):ZOO_INFO@check_events@1750: session establishment complete on server [10.171.59.83:2181], sessionId=0x14b51bc8506039a, negotiated timeout=10000 *** Check failure stack trace: *** @ 0x7ffdd9a6596d google::LogMessage::Fail() I0204 18:24:49.313356 22930 group.cpp:313] Group process (group(1)@10.169.146.67:5051) connected to ZooKeeper @ 0x7ffdd9a677ad google::LogMessage::SendToLog() I0204 18:24:49.313786 22930 group.cpp:790] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0) I0204 18:24:49.314487 22930 group.cpp:385] Trying to create path '/mesos' in ZooKeeper I0204 18:24:49.323668 22930 group.cpp:717] Found non-sequence node 'log_replicas' at '/mesos' in ZooKeeper I0204 18:24:49.323806 22930 detector.cpp:138] Detected a new leader: (id='1') I0204 18:24:49.323958 22930 group.cpp:659] Trying to get '/mesos/info_0000000001' in ZooKeeper I0204 18:24:49.324595 22930 detector.cpp:433] A new leading master (UPID=master@10.171.59.83:5050) is detected @ 0x7ffdd9a6555c google::LogMessage::Flush() @ 0x7ffdd9a680a9 google::LogMessageFatal::~LogMessageFatal() @ 0x7ffdd94b7179 _CheckFatal::~_CheckFatal() @ 0x7ffdd96718e2 mesos::internal::slave::Slave::__recover() @ 0x7ffdd9a1524a process::ProcessManager::resume() @ 0x7ffdd9a1550c process::schedule() @ 0x7ffdd83832ad (unknown) @ 0x7ffdd80b834d (unknown) Aborted (core dumped) {code} Removing the --work_dir option results in the slave starting successfully.",2 MESOS-2324,"MasterAllocatorTest/0.OutOfOrderDispatch is flaky"," {noformat:title=} [ RUN ] MasterAllocatorTest/0.OutOfOrderDispatch Using temporary directory '/tmp/MasterAllocatorTest_0_OutOfOrderDispatch_kjLb9b' I0206 07:55:44.084333 15065 leveldb.cpp:175] Opened db in 25.006293ms I0206 07:55:44.089635 15065 leveldb.cpp:182] Compacted db in 5.256332ms I0206 07:55:44.089695 15065 leveldb.cpp:197] Created db iterator in 23534ns I0206 
07:55:44.089710 15065 leveldb.cpp:203] Seeked to beginning of db in 2175ns I0206 07:55:44.089720 15065 leveldb.cpp:272] Iterated through 0 keys in the db in 417ns I0206 07:55:44.089781 15065 replica.cpp:743] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0206 07:55:44.093750 15086 recover.cpp:448] Starting replica recovery I0206 07:55:44.094044 15086 recover.cpp:474] Replica is in EMPTY status I0206 07:55:44.095473 15086 replica.cpp:640] Replica in EMPTY status received a broadcasted recover request I0206 07:55:44.095724 15086 recover.cpp:194] Received a recover response from a replica in EMPTY status I0206 07:55:44.096097 15086 recover.cpp:565] Updating replica status to STARTING I0206 07:55:44.106575 15086 leveldb.cpp:305] Persisting metadata (8 bytes) to leveldb took 10.289939ms I0206 07:55:44.106613 15086 replica.cpp:322] Persisted replica status to STARTING I0206 07:55:44.108144 15086 recover.cpp:474] Replica is in STARTING status I0206 07:55:44.109122 15086 replica.cpp:640] Replica in STARTING status received a broadcasted recover request I0206 07:55:44.110879 15091 recover.cpp:194] Received a recover response from a replica in STARTING status I0206 07:55:44.117267 15087 recover.cpp:565] Updating replica status to VOTING I0206 07:55:44.124771 15087 leveldb.cpp:305] Persisting metadata (8 bytes) to leveldb took 2.66794ms I0206 07:55:44.124814 15087 replica.cpp:322] Persisted replica status to VOTING I0206 07:55:44.124948 15087 recover.cpp:579] Successfully joined the Paxos group I0206 07:55:44.125095 15087 recover.cpp:463] Recover process terminated I0206 07:55:44.126204 15087 master.cpp:344] Master 20150206-075544-16842879-38895-15065 (utopic) started on 127.0.1.1:38895 I0206 07:55:44.126268 15087 master.cpp:390] Master only allowing authenticated frameworks to register I0206 07:55:44.126281 15087 master.cpp:395] Master only allowing authenticated slaves to register I0206 07:55:44.126307 15087 credentials.hpp:35] Loading credentials 
for authentication from '/tmp/MasterAllocatorTest_0_OutOfOrderDispatch_kjLb9b/credentials' I0206 07:55:44.126683 15087 master.cpp:439] Authorization enabled I0206 07:55:44.129329 15086 master.cpp:1350] The newly elected leader is master@127.0.1.1:38895 with id 20150206-075544-16842879-38895-15065 I0206 07:55:44.129361 15086 master.cpp:1363] Elected as the leading master! I0206 07:55:44.129389 15086 master.cpp:1181] Recovering from registrar I0206 07:55:44.129653 15088 registrar.cpp:312] Recovering registrar I0206 07:55:44.130859 15088 log.cpp:659] Attempting to start the writer I0206 07:55:44.132334 15088 replica.cpp:476] Replica received implicit promise request with proposal 1 I0206 07:55:44.135187 15088 leveldb.cpp:305] Persisting metadata (8 bytes) to leveldb took 2.825465ms I0206 07:55:44.135390 15088 replica.cpp:344] Persisted promised to 1 I0206 07:55:44.138062 15091 coordinator.cpp:229] Coordinator attemping to fill missing position I0206 07:55:44.139576 15091 replica.cpp:377] Replica received explicit promise request for position 0 with proposal 2 I0206 07:55:44.142156 15091 leveldb.cpp:342] Persisting action (8 bytes) to leveldb took 2.545543ms I0206 07:55:44.142189 15091 replica.cpp:678] Persisted action at 0 I0206 07:55:44.143414 15091 replica.cpp:510] Replica received write request for position 0 I0206 07:55:44.143468 15091 leveldb.cpp:437] Reading position from leveldb took 28872ns I0206 07:55:44.145982 15091 leveldb.cpp:342] Persisting action (14 bytes) to leveldb took 2.480277ms I0206 07:55:44.146015 15091 replica.cpp:678] Persisted action at 0 I0206 07:55:44.147050 15089 replica.cpp:657] Replica received learned notice for position 0 I0206 07:55:44.154364 15089 leveldb.cpp:342] Persisting action (16 bytes) to leveldb took 7.281644ms I0206 07:55:44.154400 15089 replica.cpp:678] Persisted action at 0 I0206 07:55:44.154422 15089 replica.cpp:663] Replica learned NOP action at position 0 I0206 07:55:44.155506 15091 log.cpp:675] Writer started with 
ending position 0 I0206 07:55:44.156746 15091 leveldb.cpp:437] Reading position from leveldb took 30248ns I0206 07:55:44.173681 15091 registrar.cpp:345] Successfully fetched the registry (0B) in 43.977984ms I0206 07:55:44.173821 15091 registrar.cpp:444] Applied 1 operations in 30768ns; attempting to update the 'registry' I0206 07:55:44.176213 15086 log.cpp:683] Attempting to append 119 bytes to the log I0206 07:55:44.176426 15086 coordinator.cpp:339] Coordinator attempting to write APPEND action at position 1 I0206 07:55:44.177608 15088 replica.cpp:510] Replica received write request for position 1 I0206 07:55:44.180059 15088 leveldb.cpp:342] Persisting action (136 bytes) to leveldb took 2.415145ms I0206 07:55:44.180094 15088 replica.cpp:678] Persisted action at 1 I0206 07:55:44.181324 15084 replica.cpp:657] Replica received learned notice for position 1 I0206 07:55:44.183831 15084 leveldb.cpp:342] Persisting action (138 bytes) to leveldb took 2.473724ms I0206 07:55:44.183866 15084 replica.cpp:678] Persisted action at 1 I0206 07:55:44.183887 15084 replica.cpp:663] Replica learned APPEND action at position 1 I0206 07:55:44.185510 15084 registrar.cpp:489] Successfully updated the 'registry' in 11.619072ms I0206 07:55:44.185678 15086 log.cpp:702] Attempting to truncate the log to 1 I0206 07:55:44.186111 15086 coordinator.cpp:339] Coordinator attempting to write TRUNCATE action at position 2 I0206 07:55:44.186944 15086 replica.cpp:510] Replica received write request for position 2 I0206 07:55:44.187492 15084 registrar.cpp:375] Successfully recovered registrar I0206 07:55:44.188016 15087 master.cpp:1208] Recovered 0 slaves from the Registry (83B) ; allowing 10mins for slaves to re-register I0206 07:55:44.189678 15086 leveldb.cpp:342] Persisting action (16 bytes) to leveldb took 2.702559ms I0206 07:55:44.189713 15086 replica.cpp:678] Persisted action at 2 I0206 07:55:44.190620 15086 replica.cpp:657] Replica received learned notice for position 2 I0206 07:55:44.193383 
15086 leveldb.cpp:342] Persisting action (18 bytes) to leveldb took 2.737088ms I0206 07:55:44.193455 15086 leveldb.cpp:400] Deleting ~1 keys from leveldb took 37762ns I0206 07:55:44.193475 15086 replica.cpp:678] Persisted action at 2 I0206 07:55:44.193496 15086 replica.cpp:663] Replica learned TRUNCATE action at position 2 I0206 07:55:44.200028 15065 containerizer.cpp:102] Using isolation: posix/cpu,posix/mem I0206 07:55:44.212924 15088 slave.cpp:172] Slave started on 46)@127.0.1.1:38895 I0206 07:55:44.213762 15088 credentials.hpp:83] Loading credential for authentication from '/tmp/MasterAllocatorTest_0_OutOfOrderDispatch_RuNyVQ/credential' I0206 07:55:44.214251 15088 slave.cpp:281] Slave using credential for: test-principal I0206 07:55:44.214653 15088 slave.cpp:299] Slave resources: cpus(*):2; mem(*):1024; disk(*):24988; ports(*):[31000-32000] I0206 07:55:44.214918 15088 slave.cpp:328] Slave hostname: utopic I0206 07:55:44.215116 15088 slave.cpp:329] Slave checkpoint: false W0206 07:55:44.215332 15088 slave.cpp:331] Disabling checkpointing is deprecated and the --checkpoint flag will be removed in a future release. 
Please avoid using this flag I0206 07:55:44.217061 15090 state.cpp:32] Recovering state from '/tmp/MasterAllocatorTest_0_OutOfOrderDispatch_RuNyVQ/meta' I0206 07:55:44.235409 15088 status_update_manager.cpp:196] Recovering status update manager I0206 07:55:44.235601 15088 containerizer.cpp:299] Recovering containerizer I0206 07:55:44.236486 15088 slave.cpp:3526] Finished recovery I0206 07:55:44.237709 15087 status_update_manager.cpp:170] Pausing sending status updates I0206 07:55:44.237890 15088 slave.cpp:620] New master detected at master@127.0.1.1:38895 I0206 07:55:44.241575 15088 slave.cpp:683] Authenticating with master master@127.0.1.1:38895 I0206 07:55:44.247459 15088 slave.cpp:688] Using default CRAM-MD5 authenticatee I0206 07:55:44.248617 15089 authenticatee.hpp:137] Creating new client SASL connection I0206 07:55:44.249099 15089 master.cpp:3788] Authenticating slave(46)@127.0.1.1:38895 I0206 07:55:44.249137 15089 master.cpp:3799] Using default CRAM-MD5 authenticator I0206 07:55:44.249728 15089 authenticator.hpp:169] Creating new server SASL connection I0206 07:55:44.250285 15089 authenticatee.hpp:228] Received SASL authentication mechanisms: CRAM-MD5 I0206 07:55:44.250496 15089 authenticatee.hpp:254] Attempting to authenticate with mechanism 'CRAM-MD5' I0206 07:55:44.250452 15088 slave.cpp:656] Detecting new master I0206 07:55:44.251063 15091 authenticator.hpp:275] Received SASL authentication start I0206 07:55:44.251124 15091 authenticator.hpp:397] Authentication requires more steps I0206 07:55:44.251256 15089 authenticatee.hpp:274] Received SASL authentication step I0206 07:55:44.251451 15090 authenticator.hpp:303] Received SASL authentication step I0206 07:55:44.251575 15090 authenticator.hpp:389] Authentication success I0206 07:55:44.251687 15090 master.cpp:3846] Successfully authenticated principal 'test-principal' at slave(46)@127.0.1.1:38895 I0206 07:55:44.253306 15089 authenticatee.hpp:314] Authentication success I0206 07:55:44.258015 15089 
slave.cpp:754] Successfully authenticated with master master@127.0.1.1:38895 I0206 07:55:44.258468 15089 master.cpp:2913] Registering slave at slave(46)@127.0.1.1:38895 (utopic) with id 20150206-075544-16842879-38895-15065-S0 I0206 07:55:44.259028 15089 registrar.cpp:444] Applied 1 operations in 88902ns; attempting to update the 'registry' I0206 07:55:44.269492 15065 sched.cpp:149] Version: 0.22.0 I0206 07:55:44.270539 15090 sched.cpp:246] New master detected at master@127.0.1.1:38895 I0206 07:55:44.270614 15090 sched.cpp:302] Authenticating with master master@127.0.1.1:38895 I0206 07:55:44.270634 15090 sched.cpp:309] Using default CRAM-MD5 authenticatee I0206 07:55:44.270900 15090 authenticatee.hpp:137] Creating new client SASL connection I0206 07:55:44.272300 15089 log.cpp:683] Attempting to append 285 bytes to the log I0206 07:55:44.272552 15089 coordinator.cpp:339] Coordinator attempting to write APPEND action at position 3 I0206 07:55:44.273609 15086 master.cpp:3788] Authenticating scheduler-d6cac0a1-d461-4a05-b19d-5cbdae239eb0@127.0.1.1:38895 I0206 07:55:44.273643 15086 master.cpp:3799] Using default CRAM-MD5 authenticator I0206 07:55:44.273955 15086 authenticator.hpp:169] Creating new server SASL connection I0206 07:55:44.274617 15090 authenticatee.hpp:228] Received SASL authentication mechanisms: CRAM-MD5 I0206 07:55:44.274813 15090 authenticatee.hpp:254] Attempting to authenticate with mechanism 'CRAM-MD5' I0206 07:55:44.275171 15088 authenticator.hpp:275] Received SASL authentication start I0206 07:55:44.275215 15088 authenticator.hpp:397] Authentication requires more steps I0206 07:55:44.275408 15090 authenticatee.hpp:274] Received SASL authentication step I0206 07:55:44.275696 15084 authenticator.hpp:303] Received SASL authentication step I0206 07:55:44.275774 15084 authenticator.hpp:389] Authentication success I0206 07:55:44.275876 15084 master.cpp:3846] Successfully authenticated principal 'test-principal' at 
scheduler-d6cac0a1-d461-4a05-b19d-5cbdae239eb0@127.0.1.1:38895 I0206 07:55:44.277593 15090 authenticatee.hpp:314] Authentication success I0206 07:55:44.278201 15086 sched.cpp:390] Successfully authenticated with master master@127.0.1.1:38895 I0206 07:55:44.278548 15086 master.cpp:1568] Received registration request for framework 'framework1' at scheduler-d6cac0a1-d461-4a05-b19d-5cbdae239eb0@127.0.1.1:38895 I0206 07:55:44.278642 15086 master.cpp:1429] Authorizing framework principal 'test-principal' to receive offers for role '*' I0206 07:55:44.279157 15086 master.cpp:1632] Registering framework 20150206-075544-16842879-38895-15065-0000 (framework1) at scheduler-d6cac0a1-d461-4a05-b19d-5cbdae239eb0@127.0.1.1:38895 I0206 07:55:44.280081 15086 sched.cpp:440] Framework registered with 20150206-075544-16842879-38895-15065-0000 I0206 07:55:44.280320 15086 hierarchical_allocator_process.hpp:318] Added framework 20150206-075544-16842879-38895-15065-0000 I0206 07:55:44.281411 15089 replica.cpp:510] Replica received write request for position 3 I0206 07:55:44.282289 15085 master.cpp:2901] Ignoring register slave message from slave(46)@127.0.1.1:38895 (utopic) as admission is already in progress I0206 07:55:44.284984 15089 leveldb.cpp:342] Persisting action (304 bytes) to leveldb took 3.368213ms I0206 07:55:44.285020 15089 replica.cpp:678] Persisted action at 3 I0206 07:55:44.285893 15089 replica.cpp:657] Replica received learned notice for position 3 I0206 07:55:44.288350 15089 leveldb.cpp:342] Persisting action (306 bytes) to leveldb took 2.430449ms I0206 07:55:44.288384 15089 replica.cpp:678] Persisted action at 3 I0206 07:55:44.288405 15089 replica.cpp:663] Replica learned APPEND action at position 3 I0206 07:55:44.290154 15089 registrar.cpp:489] Successfully updated the 'registry' in 31.046912ms I0206 07:55:44.290307 15085 log.cpp:702] Attempting to truncate the log to 3 I0206 07:55:44.290671 15085 coordinator.cpp:339] Coordinator attempting to write TRUNCATE action at 
position 4 I0206 07:55:44.291482 15085 replica.cpp:510] Replica received write request for position 4 I0206 07:55:44.292559 15087 master.cpp:2970] Registered slave 20150206-075544-16842879-38895-15065-S0 at slave(46)@127.0.1.1:38895 (utopic) with cpus(*):2; mem(*):1024; disk(*):24988; ports(*):[31000-32000] I0206 07:55:44.292940 15087 slave.cpp:788] Registered with master master@127.0.1.1:38895; given slave ID 20150206-075544-16842879-38895-15065-S0 I0206 07:55:44.293298 15087 hierarchical_allocator_process.hpp:450] Added slave 20150206-075544-16842879-38895-15065-S0 (utopic) with cpus(*):2; mem(*):1024; disk(*):24988; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):24988; ports(*):[31000-32000] available) I0206 07:55:44.293684 15087 status_update_manager.cpp:177] Resuming sending status updates I0206 07:55:44.294085 15087 master.cpp:3730] Sending 1 offers to framework 20150206-075544-16842879-38895-15065-0000 (framework1) at scheduler-d6cac0a1-d461-4a05-b19d-5cbdae239eb0@127.0.1.1:38895 I0206 07:55:44.299957 15085 leveldb.cpp:342] Persisting action (16 bytes) to leveldb took 8.442691ms I0206 07:55:44.300165 15085 replica.cpp:678] Persisted action at 4 I0206 07:55:44.300698 15065 sched.cpp:1468] Asked to stop the driver I0206 07:55:44.301127 15090 sched.cpp:806] Stopping framework '20150206-075544-16842879-38895-15065-0000' I0206 07:55:44.301503 15090 master.cpp:1892] Asked to unregister framework 20150206-075544-16842879-38895-15065-0000 I0206 07:55:44.301535 15090 master.cpp:4158] Removing framework 20150206-075544-16842879-38895-15065-0000 (framework1) at scheduler-d6cac0a1-d461-4a05-b19d-5cbdae239eb0@127.0.1.1:38895 I0206 07:55:44.302376 15090 slave.cpp:1592] Asked to shut down framework 20150206-075544-16842879-38895-15065-0000 by master@127.0.1.1:38895 W0206 07:55:44.302407 15090 slave.cpp:1607] Cannot shut down unknown framework 20150206-075544-16842879-38895-15065-0000 I0206 07:55:44.302814 15090 hierarchical_allocator_process.hpp:397] 
Deactivated framework 20150206-075544-16842879-38895-15065-0000 I0206 07:55:44.302947 15090 hierarchical_allocator_process.hpp:351] Removed framework 20150206-075544-16842879-38895-15065-0000 I0206 07:55:44.309281 15086 hierarchical_allocator_process.hpp:642] Recovered cpus(*):2; mem(*):1024; disk(*):24988; ports(*):[31000-32000] (total allocatable: cpus(*):2; mem(*):1024; disk(*):24988; ports(*):[31000-32000]) on slave 20150206-075544-16842879-38895-15065-S0 from framework 20150206-075544-16842879-38895-15065-0000 I0206 07:55:44.310158 15084 replica.cpp:657] Replica received learned notice for position 4 I0206 07:55:44.313246 15084 leveldb.cpp:342] Persisting action (18 bytes) to leveldb took 3.055049ms I0206 07:55:44.313328 15084 leveldb.cpp:400] Deleting ~2 keys from leveldb took 45270ns I0206 07:55:44.313349 15084 replica.cpp:678] Persisted action at 4 I0206 07:55:44.313374 15084 replica.cpp:663] Replica learned TRUNCATE action at position 4 I0206 07:55:44.329591 15065 sched.cpp:149] Version: 0.22.0 I0206 07:55:44.330258 15088 sched.cpp:246] New master detected at master@127.0.1.1:38895 I0206 07:55:44.330346 15088 sched.cpp:302] Authenticating with master master@127.0.1.1:38895 I0206 07:55:44.330368 15088 sched.cpp:309] Using default CRAM-MD5 authenticatee I0206 07:55:44.330652 15088 authenticatee.hpp:137] Creating new client SASL connection I0206 07:55:44.331403 15088 master.cpp:3788] Authenticating scheduler-7bdaa90b-eb9f-4009-bd5a-d07fd3f24cec@127.0.1.1:38895 I0206 07:55:44.331717 15088 master.cpp:3799] Using default CRAM-MD5 authenticator I0206 07:55:44.332293 15088 authenticator.hpp:169] Creating new server SASL connection I0206 07:55:44.332655 15088 authenticatee.hpp:228] Received SASL authentication mechanisms: CRAM-MD5 I0206 07:55:44.332684 15088 authenticatee.hpp:254] Attempting to authenticate with mechanism 'CRAM-MD5' I0206 07:55:44.332792 15088 authenticator.hpp:275] Received SASL authentication start I0206 07:55:44.332835 15088 
authenticator.hpp:397] Authentication requires more steps I0206 07:55:44.332903 15088 authenticatee.hpp:274] Received SASL authentication step I0206 07:55:44.332983 15088 authenticator.hpp:303] Received SASL authentication step I0206 07:55:44.333056 15088 authenticator.hpp:389] Authentication success I0206 07:55:44.333153 15088 authenticatee.hpp:314] Authentication success I0206 07:55:44.333297 15091 master.cpp:3846] Successfully authenticated principal 'test-principal' at scheduler-7bdaa90b-eb9f-4009-bd5a-d07fd3f24cec@127.0.1.1:38895 I0206 07:55:44.334326 15087 sched.cpp:390] Successfully authenticated with master master@127.0.1.1:38895 I0206 07:55:44.334645 15087 master.cpp:1568] Received registration request for framework 'framework2' at scheduler-7bdaa90b-eb9f-4009-bd5a-d07fd3f24cec@127.0.1.1:38895 I0206 07:55:44.334722 15087 master.cpp:1429] Authorizing framework principal 'test-principal' to receive offers for role '*' I0206 07:55:44.335153 15087 master.cpp:1632] Registering framework 20150206-075544-16842879-38895-15065-0001 (framework2) at scheduler-7bdaa90b-eb9f-4009-bd5a-d07fd3f24cec@127.0.1.1:38895 I0206 07:55:44.336019 15087 sched.cpp:440] Framework registered with 20150206-075544-16842879-38895-15065-0001 I0206 07:55:44.336156 15087 hierarchical_allocator_process.hpp:318] Added framework 20150206-075544-16842879-38895-15065-0001 I0206 07:55:44.336796 15087 master.cpp:3730] Sending 1 offers to framework 20150206-075544-16842879-38895-15065-0001 (framework2) at scheduler-7bdaa90b-eb9f-4009-bd5a-d07fd3f24cec@127.0.1.1:38895 I0206 07:55:44.337725 15065 sched.cpp:1468] Asked to stop the driver I0206 07:55:44.338002 15086 sched.cpp:806] Stopping framework '20150206-075544-16842879-38895-15065-0001' I0206 07:55:44.338297 15090 master.cpp:1892] Asked to unregister framework 20150206-075544-16842879-38895-15065-0001 I0206 07:55:44.338353 15090 master.cpp:4158] Removing framework 20150206-075544-16842879-38895-15065-0001 (framework2) at 
scheduler-7bdaa90b-eb9f-4009-bd5a-d07fd3f24cec@127.0.1.1:38895 ../../src/tests/master_allocator_tests.cpp:300: Failure Mock function called more times than expected - taking default action specified at: ../../src/tests/mesos.hpp:713: Function call: deactivateFramework(@0x7fdb74008d70 20150206-075544-16842879-38895-15065-0001) Expected: to be called once Actual: called twice - over...",1 MESOS-2332,"Report per-container metrics for network bandwidth throttling","Export metrics from the network isolation to identify scope and duration of container throttling. Packet loss can be identified from the overlimits and requeues fields of the htb qdisc report for the virtual interface, e.g. {noformat} $ tc -s -d qdisc show dev mesos19223 qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 Sent 158213287452 bytes 1030876393 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 qdisc ingress ffff: parent ffff:fff1 ---------------- Sent 119381747824 bytes 1144549901 pkt (dropped 2044879, overlimits 0 requeues 0) backlog 0b 0p requeues 0 {noformat} Note that since a packet can be examined multiple times before transmission, overlimits can exceed total packets sent. Add to the port_mapping isolator usage() and the container statistics protobuf. Carefully consider the naming (esp tx/rx) + commenting of the protobuf fields so it's clear what these represent and how they are different to the existing dropped packet counts from the network stack.",5 MESOS-2335,"Mesos Lifecycle Modules","A new kind of module that receives callbacks at significant life cycle events of its host libprocess process. Typically the latter is a Mesos slave or master and the life time of the libprocess process coincides with the underlying OS process. h4. 
Motivation and Use Cases We want to add customized and experimental capabilities that concern the lifetime of Mesos components without intruding into Mesos source code and without creating new build process dependencies for everybody. Example use cases: 1. A slave or master life cycle module that gathers fail-over incidents and reports summaries thereof to a remote data sink. 2. A slave module that observes host computer metrics and correlates these with task activity. This can be used to find resource leaks and to prevent or guide oversubscription. 3. Upgrades and provisioning that require shutdown and restart. h4. Specifics The specific life cycle events that we want to get notified about and want to be able to act upon are: - Process is spawning/initializing - Process is terminating/finalizing In both cases, a reference to the process is passed as a parameter, giving the module access for inspection and reaction. h4. Module Classification Unlike other named modules, a life cycle module does not directly replace or provide essential Mesos functionality (as an Isolator module does). Unlike a decorator module it does not directly add or inject data into Mesos core either.",1 MESOS-2337,"__init__.py not getting installed in $PREFIX/lib/pythonX.Y/site-packages/mesos","When doing a {{make install}}, the src/python/native/src/mesos/__init__.py file is not getting installed in {{$PREFIX/lib/pythonX.Y/site-packages/mesos/}}. This makes it impossible to do the following import when {{PYTHONPATH}} is set to the {{site-packages}} directory. {code} import mesos.interface.mesos_pb2 {code} The directories {{$PREFIX/lib/pythonX.Y/site-packages/mesos/interface, native}} do have their corresponding {{__init__.py}} files. 
Reproducing the bug: {code} ../configure --prefix=$HOME/test-install && make install {code}",2 MESOS-2340,"Add ability to decode JSON serialized MasterInfo from ZK","Currently, to discover the master, a client needs the ZK node location and access to the MasterInfo protobuf so it can deserialize the binary blob in the node. I think it would be nice to publish JSON (like Twitter's ServerSets) so clients are not tied to protobuf to do service discovery. This ticket is an intermediate (compatibility) step: we add in {{0.23}} the ability for the {{Detector}} to ""understand"" JSON **alongside** the Protobuf serialized format; this makes it compatible with both earlier versions and a future one (most likely, {{0.24}}) that will write the {{MasterInfo}} information in JSON format.",5 MESOS-2347,"Add ability for schedulers to explicitly acknowledge status updates on the driver.","In order for schedulers to be able to handle status updates in a scalable manner, they need the ability to send acknowledgements through the driver. This enables optimizations in schedulers (e.g. processing status updates asynchronously w/o backing up the driver, processing/acking updates in batch). Without this, an implicit reconciliation can overload a scheduler (hence the motivation for MESOS-2308).",8 MESOS-2349,"Provide a way to execute an arbitrary process in a MesosContainerizer container context","Include a separate binary that, when provided with a container_id, path to an executable, and optional arguments, will find the container context, enter it, and exec the executable. e.g., {noformat} mesos-container-exec --container_id=abc123 [--] /path/to/executable [arg1 ...] 
{noformat} This need only support (initially) containers created with the MesosContainerizer and will support all isolators shipped with Mesos, i.e., it should find and enter the cgroups and namespaces for the running executor of the specified container.",5 MESOS-2350,"Add support for MesosContainerizerLaunch to chroot to a specified path","In preparation for the MesosContainerizer to support a filesystem isolator the MesosContainerizerLauncher must support chrooting. Optionally, it should also configure the chroot environment by (re-)mounting special filesystems such as /proc and /sys and making device nodes such as /dev/zero, etc., such that the chroot environment is functional.",5 MESOS-2353,"Improve performance of the state.json endpoint for large clusters.","The master's state.json endpoint consistently takes a long time to compute the JSON result, for large clusters: {noformat} $ time curl -s -o /dev/null localhost:5050/master/state.json Mon Jan 26 22:38:50 UTC 2015 real 0m13.174s user 0m0.003s sys 0m0.022s {noformat} This can cause the master to get backlogged if there are many state.json requests in flight. Looking at {{perf}} data, it seems most of the time is spent doing memory allocation / de-allocation. This ticket will try to capture any low hanging fruit to speed this up. 
Possibly we can leverage moves if they are not already being used by the compiler.",5 MESOS-2366,"MasterSlaveReconciliationTest.ReconcileLostTask is flaky","https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2746/changes {code} [ RUN ] MasterSlaveReconciliationTest.ReconcileLostTask Using temporary directory '/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_Rgb8FF' I0218 01:53:26.881561 13918 leveldb.cpp:175] Opened db in 2.891605ms I0218 01:53:26.882547 13918 leveldb.cpp:182] Compacted db in 953447ns I0218 01:53:26.882596 13918 leveldb.cpp:197] Created db iterator in 20629ns I0218 01:53:26.882616 13918 leveldb.cpp:203] Seeked to beginning of db in 2370ns I0218 01:53:26.882627 13918 leveldb.cpp:272] Iterated through 0 keys in the db in 348ns I0218 01:53:26.882664 13918 replica.cpp:743] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0218 01:53:26.883124 13947 recover.cpp:448] Starting replica recovery I0218 01:53:26.883625 13941 recover.cpp:474] Replica is in 4 status I0218 01:53:26.884744 13945 replica.cpp:640] Replica in 4 status received a broadcasted recover request I0218 01:53:26.885118 13939 recover.cpp:194] Received a recover response from a replica in 4 status I0218 01:53:26.885565 13933 recover.cpp:565] Updating replica status to 3 I0218 01:53:26.886548 13932 leveldb.cpp:305] Persisting metadata (8 bytes) to leveldb took 733223ns I0218 01:53:26.886574 13932 replica.cpp:322] Persisted replica status to 3 I0218 01:53:26.886714 13943 master.cpp:347] Master 20150218-015326-3142697795-57268-13918 (pomona.apache.org) started on 67.195.81.187:57268 I0218 01:53:26.886760 13943 master.cpp:393] Master only allowing authenticated frameworks to register I0218 01:53:26.886772 13943 master.cpp:398] Master only allowing authenticated slaves to register I0218 01:53:26.886798 13943 credentials.hpp:36] Loading credentials for authentication from 
'/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_Rgb8FF/credentials' I0218 01:53:26.886826 13934 recover.cpp:474] Replica is in 3 status I0218 01:53:26.887151 13943 master.cpp:440] Authorization enabled I0218 01:53:26.887866 13944 replica.cpp:640] Replica in 3 status received a broadcasted recover request I0218 01:53:26.887969 13942 whitelist_watcher.cpp:78] No whitelist given I0218 01:53:26.888021 13940 hierarchical.hpp:286] Initialized hierarchical allocator process I0218 01:53:26.888178 13934 recover.cpp:194] Received a recover response from a replica in 3 status I0218 01:53:26.889114 13943 master.cpp:1354] The newly elected leader is master@67.195.81.187:57268 with id 20150218-015326-3142697795-57268-13918 I0218 01:53:27.064930 13948 process.cpp:2117] Dropped / Lost event for PID: hierarchical-allocator(183)@67.195.81.187:57268 I0218 01:53:27.911870 13943 master.cpp:1367] Elected as the leading master! I0218 01:53:27.911911 13943 master.cpp:1185] Recovering from registrar I0218 01:53:27.912106 13948 process.cpp:2117] Dropped / Lost event for PID: scheduler-93f78006-5b69-498b-b4e3-87cdf8062263@67.195.81.187:57268 I0218 01:53:27.912255 13932 registrar.cpp:312] Recovering registrar I0218 01:53:27.912307 13948 process.cpp:2117] Dropped / Lost event for PID: hierarchical-allocator(179)@67.195.81.187:57268 I0218 01:53:27.912626 13940 hierarchical.hpp:831] No resources available to allocate! 
I0218 01:53:27.912658 13940 hierarchical.hpp:738] Performed allocation for 0 slaves in 60316ns I0218 01:53:27.912838 13947 recover.cpp:565] Updating replica status to 1 I0218 01:53:27.913966 13947 leveldb.cpp:305] Persisting metadata (8 bytes) to leveldb took 921045ns I0218 01:53:27.913998 13947 replica.cpp:322] Persisted replica status to 1 I0218 01:53:27.914106 13932 recover.cpp:579] Successfully joined the Paxos group I0218 01:53:27.914378 13932 recover.cpp:463] Recover process terminated I0218 01:53:27.914916 13939 log.cpp:659] Attempting to start the writer I0218 01:53:27.916374 13937 replica.cpp:476] Replica received implicit promise request with proposal 1 I0218 01:53:27.916941 13937 leveldb.cpp:305] Persisting metadata (8 bytes) to leveldb took 534122ns I0218 01:53:27.916967 13937 replica.cpp:344] Persisted promised to 1 I0218 01:53:27.917795 13936 coordinator.cpp:229] Coordinator attemping to fill missing position I0218 01:53:27.919147 13941 replica.cpp:377] Replica received explicit promise request for position 0 with proposal 2 I0218 01:53:27.919492 13941 leveldb.cpp:342] Persisting action (8 bytes) to leveldb took 306270ns I0218 01:53:27.919517 13941 replica.cpp:678] Persisted action at 0 I0218 01:53:27.920755 13934 replica.cpp:510] Replica received write request for position 0 I0218 01:53:27.920819 13934 leveldb.cpp:437] Reading position from leveldb took 33747ns I0218 01:53:27.921195 13934 leveldb.cpp:342] Persisting action (14 bytes) to leveldb took 340479ns I0218 01:53:27.921221 13934 replica.cpp:678] Persisted action at 0 I0218 01:53:27.921916 13932 replica.cpp:657] Replica received learned notice for position 0 I0218 01:53:27.922339 13932 leveldb.cpp:342] Persisting action (16 bytes) to leveldb took 392653ns I0218 01:53:27.922365 13932 replica.cpp:678] Persisted action at 0 I0218 01:53:27.922386 13932 replica.cpp:663] Replica learned 1 action at position 0 I0218 01:53:27.923009 13945 log.cpp:675] Writer started with ending position 0 I0218 
01:53:27.924167 13937 leveldb.cpp:437] Reading position from leveldb took 29219ns I0218 01:53:27.927683 13932 registrar.cpp:345] Successfully fetched the registry (0B) in 15.376128ms I0218 01:53:27.927789 13932 registrar.cpp:444] Applied 1 operations in 23004ns; attempting to update the 'registry' I0218 01:53:27.929957 13947 log.cpp:683] Attempting to append 139 bytes to the log I0218 01:53:27.930058 13936 coordinator.cpp:339] Coordinator attempting to write 2 action at position 1 I0218 01:53:27.930637 13934 replica.cpp:510] Replica received write request for position 1 I0218 01:53:27.930954 13934 leveldb.cpp:342] Persisting action (158 bytes) to leveldb took 286664ns I0218 01:53:27.930975 13934 replica.cpp:678] Persisted action at 1 I0218 01:53:27.931521 13942 replica.cpp:657] Replica received learned notice for position 1 I0218 01:53:27.931813 13942 leveldb.cpp:342] Persisting action (160 bytes) to leveldb took 267316ns I0218 01:53:27.931833 13942 replica.cpp:678] Persisted action at 1 I0218 01:53:27.931849 13942 replica.cpp:663] Replica learned 2 action at position 1 I0218 01:53:27.932617 13935 registrar.cpp:489] Successfully updated the 'registry' in 4.722944ms I0218 01:53:27.932726 13935 registrar.cpp:375] Successfully recovered registrar I0218 01:53:27.932751 13940 log.cpp:702] Attempting to truncate the log to 1 I0218 01:53:27.932865 13944 coordinator.cpp:339] Coordinator attempting to write 3 action at position 2 I0218 01:53:27.932998 13939 master.cpp:1212] Recovered 0 slaves from the Registry (101B) ; allowing 10mins for slaves to re-register I0218 01:53:27.933732 13936 replica.cpp:510] Replica received write request for position 2 I0218 01:53:27.934146 13936 leveldb.cpp:342] Persisting action (16 bytes) to leveldb took 386584ns I0218 01:53:27.934167 13936 replica.cpp:678] Persisted action at 2 I0218 01:53:27.934708 13935 replica.cpp:657] Replica received learned notice for position 2 I0218 01:53:27.935081 13935 leveldb.cpp:342] Persisting action (18 
bytes) to leveldb took 350891ns I0218 01:53:27.935127 13935 leveldb.cpp:400] Deleting ~1 keys from leveldb took 24983ns I0218 01:53:27.935140 13935 replica.cpp:678] Persisted action at 2 I0218 01:53:27.935158 13935 replica.cpp:663] Replica learned 3 action at position 2 I0218 01:53:27.947561 13918 containerizer.cpp:104] Using isolation: posix/cpu,posix/mem I0218 01:53:27.948971 13941 slave.cpp:173] Slave started on 150)@67.195.81.187:57268 I0218 01:53:27.949003 13941 credentials.hpp:84] Loading credential for authentication from '/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_5No5Rj/credential' I0218 01:53:27.949167 13941 slave.cpp:280] Slave using credential for: test-principal I0218 01:53:27.949465 13941 slave.cpp:298] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0218 01:53:27.949556 13941 slave.cpp:327] Slave hostname: pomona.apache.org I0218 01:53:27.949575 13941 slave.cpp:328] Slave checkpoint: false W0218 01:53:27.949587 13941 slave.cpp:330] Disabling checkpointing is deprecated and the --checkpoint flag will be removed in a future release. 
Please avoid using this flag I0218 01:53:27.950536 13932 state.cpp:34] Recovering state from '/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_5No5Rj/meta' I0218 01:53:27.950783 13940 status_update_manager.cpp:196] Recovering status update manager I0218 01:53:27.953531 13944 containerizer.cpp:301] Recovering containerizer I0218 01:53:27.953944 13918 sched.cpp:151] Version: 0.22.0 I0218 01:53:27.954617 13932 slave.cpp:3611] Finished recovery I0218 01:53:27.954732 13935 sched.cpp:248] New master detected at master@67.195.81.187:57268 I0218 01:53:27.954833 13935 sched.cpp:304] Authenticating with master master@67.195.81.187:57268 I0218 01:53:27.954856 13935 sched.cpp:311] Using default CRAM-MD5 authenticatee I0218 01:53:27.955037 13947 authenticatee.hpp:138] Creating new client SASL connection I0218 01:53:27.955198 13944 status_update_manager.cpp:170] Pausing sending status updates I0218 01:53:27.955195 13941 slave.cpp:623] New master detected at master@67.195.81.187:57268 I0218 01:53:27.955238 13934 master.cpp:3811] Authenticating scheduler-17aa8fa2-195f-43d6-85d7-87b949d4419b@67.195.81.187:57268 I0218 01:53:27.955270 13934 master.cpp:3822] Using default CRAM-MD5 authenticator I0218 01:53:27.955317 13941 slave.cpp:686] Authenticating with master master@67.195.81.187:57268 I0218 01:53:27.955348 13941 slave.cpp:691] Using default CRAM-MD5 authenticatee I0218 01:53:27.955518 13933 authenticator.hpp:169] Creating new server SASL connection I0218 01:53:27.955534 13939 authenticatee.hpp:138] Creating new client SASL connection I0218 01:53:27.955693 13935 authenticatee.hpp:229] Received SASL authentication mechanisms: CRAM-MD5 I0218 01:53:27.955732 13935 authenticatee.hpp:255] Attempting to authenticate with mechanism 'CRAM-MD5' I0218 01:53:27.955844 13932 authenticator.hpp:275] Received SASL authentication start I0218 01:53:27.955905 13932 authenticator.hpp:397] Authentication requires more steps I0218 01:53:27.955999 13935 authenticatee.hpp:275] Received SASL 
authentication step I0218 01:53:27.956120 13932 authenticator.hpp:303] Received SASL authentication step I0218 01:53:27.957321 13941 slave.cpp:659] Detecting new master I0218 01:53:27.957473 13934 master.cpp:3811] Authenticating slave(150)@67.195.81.187:57268 I0218 01:53:28.009866 13948 process.cpp:2117] Dropped / Lost event for PID: slave(146)@67.195.81.187:57268 I0218 01:53:28.592335 13932 auxprop.cpp:98] Request to lookup properties for user: 'test-principal' realm: 'pomona.apache.org' server FQDN: 'pomona.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0218 01:53:28.592350 13934 master.cpp:3822] Using default CRAM-MD5 authenticator I0218 01:53:28.592367 13932 auxprop.cpp:170] Looking up auxiliary property '*userPassword' I0218 01:53:28.592434 13932 auxprop.cpp:170] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0218 01:53:28.592483 13932 auxprop.cpp:98] Request to lookup properties for user: 'test-principal' realm: 'pomona.apache.org' server FQDN: 'pomona.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0218 01:53:28.592501 13932 auxprop.cpp:120] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0218 01:53:28.592510 13932 auxprop.cpp:120] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0218 01:53:28.592530 13932 authenticator.hpp:389] Authentication success I0218 01:53:28.592646 13935 authenticatee.hpp:315] Authentication success I0218 01:53:28.592686 13948 process.cpp:2117] Dropped / Lost event for PID: scheduler-4eee5e93-d6bb-4af4-9795-0aec0916dfa5@67.195.81.187:57268 I0218 01:53:28.592800 13939 authenticator.hpp:169] Creating new server SASL connection I0218 01:53:28.592836 13948 process.cpp:2117] Dropped / Lost event for PID: hierarchical-allocator(180)@67.195.81.187:57268 I0218 01:53:28.592864 13934 master.cpp:3869] Successfully authenticated principal 
'test-principal' at scheduler-17aa8fa2-195f-43d6-85d7-87b949d4419b@67.195.81.187:57268 I0218 01:53:28.592990 13933 authenticatee.hpp:229] Received SASL authentication mechanisms: CRAM-MD5 I0218 01:53:28.593029 13933 authenticatee.hpp:255] Attempting to authenticate with mechanism 'CRAM-MD5' I0218 01:53:28.593245 13933 authenticator.hpp:275] Received SASL authentication start I0218 01:53:28.593364 13933 authenticator.hpp:397] Authentication requires more steps I0218 01:53:28.593490 13941 sched.cpp:392] Successfully authenticated with master master@67.195.81.187:57268 I0218 01:53:28.593519 13941 sched.cpp:515] Sending registration request to master@67.195.81.187:57268 I0218 01:53:28.593531 13945 authenticatee.hpp:275] Received SASL authentication step I0218 01:53:28.593606 13941 sched.cpp:548] Will retry registration in 1.707160316secs if necessary I0218 01:53:28.593720 13933 authenticator.hpp:303] Received SASL authentication step I0218 01:53:28.593731 13939 master.cpp:1572] Received registration request for framework 'default' at scheduler-17aa8fa2-195f-43d6-85d7-87b949d4419b@67.195.81.187:57268 I0218 01:53:28.593757 13933 auxprop.cpp:98] Request to lookup properties for user: 'test-principal' realm: 'pomona.apache.org' server FQDN: 'pomona.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0218 01:53:28.593780 13933 auxprop.cpp:170] Looking up auxiliary property '*userPassword' I0218 01:53:28.593818 13939 master.cpp:1433] Authorizing framework principal 'test-principal' to receive offers for role '*' I0218 01:53:28.593823 13933 auxprop.cpp:170] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0218 01:53:28.593891 13933 auxprop.cpp:98] Request to lookup properties for user: 'test-principal' realm: 'pomona.apache.org' server FQDN: 'pomona.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0218 01:53:28.593909 13933 auxprop.cpp:120] Skipping 
auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0218 01:53:28.593919 13933 auxprop.cpp:120] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0218 01:53:28.593947 13933 authenticator.hpp:389] Authentication success I0218 01:53:28.594048 13945 authenticatee.hpp:315] Authentication success I0218 01:53:28.594140 13946 master.cpp:3869] Successfully authenticated principal 'test-principal' at slave(150)@67.195.81.187:57268 I0218 01:53:28.594383 13947 slave.cpp:757] Successfully authenticated with master master@67.195.81.187:57268 I0218 01:53:28.594571 13947 slave.cpp:1089] Will retry registration in 17.484321ms if necessary I0218 01:53:28.594606 13946 master.cpp:1636] Registering framework 20150218-015326-3142697795-57268-13918-0000 (default) at scheduler-17aa8fa2-195f-43d6-85d7-87b949d4419b@67.195.81.187:57268 I0218 01:53:28.594995 13944 hierarchical.hpp:320] Added framework 20150218-015326-3142697795-57268-13918-0000 I0218 01:53:28.595034 13944 hierarchical.hpp:831] No resources available to allocate! 
I0218 01:53:28.595057 13944 hierarchical.hpp:738] Performed allocation for 0 slaves in 35451ns I0218 01:53:28.595185 13937 sched.cpp:442] Framework registered with 20150218-015326-3142697795-57268-13918-0000 I0218 01:53:28.595232 13937 sched.cpp:456] Scheduler::registered took 22922ns I0218 01:53:28.595273 13946 master.cpp:2936] Registering slave at slave(150)@67.195.81.187:57268 (pomona.apache.org) with id 20150218-015326-3142697795-57268-13918-S0 I0218 01:53:28.595803 13934 registrar.cpp:444] Applied 1 operations in 74798ns; attempting to update the 'registry' I0218 01:53:28.598387 13939 log.cpp:683] Attempting to append 316 bytes to the log I0218 01:53:28.598578 13938 coordinator.cpp:339] Coordinator attempting to write 2 action at position 3 I0218 01:53:28.599488 13932 replica.cpp:510] Replica received write request for position 3 I0218 01:53:28.599758 13932 leveldb.cpp:342] Persisting action (335 bytes) to leveldb took 234907ns I0218 01:53:28.599786 13932 replica.cpp:678] Persisted action at 3 I0218 01:53:28.600777 13939 replica.cpp:657] Replica received learned notice for position 3 I0218 01:53:28.601304 13939 leveldb.cpp:342] Persisting action (337 bytes) to leveldb took 503852ns I0218 01:53:28.601326 13939 replica.cpp:678] Persisted action at 3 I0218 01:53:28.601346 13939 replica.cpp:663] Replica learned 2 action at position 3 I0218 01:53:28.602901 13934 log.cpp:702] Attempting to truncate the log to 3 I0218 01:53:28.603011 13938 coordinator.cpp:339] Coordinator attempting to write 3 action at position 4 I0218 01:53:28.603135 13932 registrar.cpp:489] Successfully updated the 'registry' in 7.035904ms I0218 01:53:28.603687 13932 replica.cpp:510] Replica received write request for position 4 I0218 01:53:28.603844 13934 slave.cpp:2666] Received ping from slave-observer(147)@67.195.81.187:57268 I0218 01:53:28.603945 13941 master.cpp:2993] Registered slave 20150218-015326-3142697795-57268-13918-S0 at slave(150)@67.195.81.187:57268 (pomona.apache.org) with 
cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0218 01:53:28.604046 13933 hierarchical.hpp:452] Added slave 20150218-015326-3142697795-57268-13918-S0 (pomona.apache.org) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] available) I0218 01:53:28.604112 13932 leveldb.cpp:342] Persisting action (16 bytes) to leveldb took 399822ns I0218 01:53:28.604131 13932 replica.cpp:678] Persisted action at 4 I0218 01:53:28.605741 13933 hierarchical.hpp:756] Performed allocation for slave 20150218-015326-3142697795-57268-13918-S0 in 1.649293ms I0218 01:53:28.605836 13934 slave.cpp:791] Registered with master master@67.195.81.187:57268; given slave ID 20150218-015326-3142697795-57268-13918-S0 I0218 01:53:28.606003 13933 replica.cpp:657] Replica received learned notice for position 4 I0218 01:53:28.606037 13947 status_update_manager.cpp:177] Resuming sending status updates I0218 01:53:28.606075 13937 master.cpp:3753] Sending 1 offers to framework 20150218-015326-3142697795-57268-13918-0000 (default) at scheduler-17aa8fa2-195f-43d6-85d7-87b949d4419b@67.195.81.187:57268 I0218 01:53:28.606547 13933 leveldb.cpp:342] Persisting action (18 bytes) to leveldb took 517378ns I0218 01:53:29.008322 13933 leveldb.cpp:400] Deleting ~2 keys from leveldb took 86406ns I0218 01:53:29.008350 13933 replica.cpp:678] Persisted action at 4 I0218 01:53:29.008380 13933 replica.cpp:663] Replica learned 3 action at position 4 I0218 01:53:28.912961 13946 hierarchical.hpp:831] No resources available to allocate! 
I0218 01:53:29.008543 13946 hierarchical.hpp:738] Performed allocation for 1 slaves in 95.683965ms I0218 01:53:29.008621 13944 sched.cpp:605] Scheduler::resourceOffers took 74896ns I0218 01:53:29.009996 13932 master.cpp:2266] Processing ACCEPT call for offers: [ 20150218-015326-3142697795-57268-13918-O0 ] on slave 20150218-015326-3142697795-57268-13918-S0 at slave(150)@67.195.81.187:57268 (pomona.apache.org) for framework 20150218-015326-3142697795-57268-13918-0000 (default) at scheduler-17aa8fa2-195f-43d6-85d7-87b949d4419b@67.195.81.187:57268 I0218 01:53:29.010035 13932 master.cpp:2110] Authorizing framework principal 'test-principal' to launch task 1 as user 'jenkins' W0218 01:53:29.011081 13932 validation.cpp:326] Executor default for task 1 uses less CPUs (None) than the minimum required (0.01). Please update your executor, as this will be...",1
MESOS-2367,"Improve slave resiliency in the face of orphan containers","Right now there's a case where a misbehaving executor can cause a slave process to flap:
{panel:title=Quote From [~jieyu]}
{quote}
1) User tries to kill an instance
2) Slave sends {{KillTaskMessage}} to executor
3) Executor sends kill signals to task processes
4) Executor sends {{TASK_KILLED}} to slave
5) Slave updates container cpu limit to be 0.01 cpus
6) A user-process is still processing the kill signal
7) The task process cannot exit since it has too little cpu share and is throttled
8) Executor itself terminates
9) Slave tries to destroy the container, but cannot because the user-process is stuck in the exit path.
10) Slave restarts, and is constantly flapping because it cannot kill orphan containers
{quote}
{panel}
The slave's orphan container handling should be improved to deal with this case despite ill-behaved users (framework writers).",5
MESOS-2372,"Test script for verifying compatibility between Mesos components","While our current unit/integration test suite catches functional bugs, it doesn't catch compatibility bugs (e.g., MESOS-2371). Catching these is crucial to give operators the ability to do seamless upgrades on live clusters. We should have a test suite / framework (ideally running on CI vetting each review on RB) that tests upgrade paths between master, slave, scheduler and executor.",2
MESOS-2373,"DRFSorter needs to distinguish resources from different slaves.","Currently the {{DRFSorter}} aggregates total and allocated resources across multiple slaves, which only works for scalar resources. We need to distinguish resources from different slaves. Suppose we have 2 slaves and 1 framework. The framework is allocated all resources from both slaves.
{code}
Resources slaveResources = Resources::parse(""cpus:2;mem:512;ports:[31000-32000]"").get();
DRFSorter sorter;
sorter.add(slaveResources); // Add slave1 resources
sorter.add(slaveResources); // Add slave2 resources
// Total resources in sorter at this point is
// cpus(*):4; mem(*):1024; ports(*):[31000-32000].
// The scalar resources get aggregated correctly but ports do not.
sorter.add(""F"");
// The 2 calls to allocated only work because we simply do:
//   allocation[name] += resources;
// without checking that the 'resources' is available in the total.
sorter.allocated(""F"", slaveResources);
sorter.allocated(""F"", slaveResources);
// At this point, sorter.allocation(""F"") is:
// cpus(*):4; mem(*):1024; ports(*):[31000-32000].
{code}
To provide some context, this issue came up while trying to reserve all unreserved resources from every offer.
{code}
for (const Offer& offer : offers) {
  Resources unreserved = offer.resources().unreserved();
  Resources reserved = unreserved.flatten(role, Resource::FRAMEWORK);
  Offer::Operation reserve;
  reserve.set_type(Offer::Operation::RESERVE);
  reserve.mutable_reserve()->mutable_resources()->CopyFrom(reserved);
  driver->acceptOffers({offer.id()}, {reserve});
}
{code}
Suppose the slave resources are the same as above:
{quote}
Slave1: {{cpus(\*):2; mem(\*):512; ports(\*):\[31000-32000\]}}
Slave2: {{cpus(\*):2; mem(\*):512; ports(\*):\[31000-32000\]}}
{quote}
The initial (incorrect) total resources in the DRFSorter are:
{quote}
{{cpus(\*):4; mem(\*):1024; ports(\*):\[31000-32000\]}}
{quote}
We receive 2 offers, 1 from each slave:
{quote}
Offer1: {{cpus(\*):2; mem(\*):512; ports(\*):\[31000-32000\]}}
Offer2: {{cpus(\*):2; mem(\*):512; ports(\*):\[31000-32000\]}}
{quote}
At this point, the resources allocated for the framework are:
{quote}
{{cpus(\*):4; mem(\*):1024; ports(\*):\[31000-32000\]}}
{quote}
After the first {{RESERVE}} operation with Offer1, the allocated resources for the framework become:
{quote}
{{cpus(\*):2; mem(\*):512; cpus(role):2; mem(role):512; ports(role):\[31000-32000\]}}
{quote}
During the second {{RESERVE}} operation with Offer2:
{code:title=HierarchicalAllocatorProcess::updateAllocation}
// ...
FrameworkSorter* frameworkSorter = frameworkSorters[frameworks\[frameworkId\].role];
Resources allocation = frameworkSorter->allocation(frameworkId.value());
// Update the allocated resources.
Try<Resources> updatedAllocation = allocation.apply(operations);
CHECK_SOME(updatedAllocation);
// ...
{code}
{{allocation}} in the above code is:
{quote}
{{cpus(\*):2; mem(\*):512; cpus(role):2; mem(role):512; ports(role):\[31000-32000\]}}
{quote}
We try to {{apply}} a {{RESERVE}} operation and we fail to find {{ports(\*):\[31000-32000\]}}, which leads to the {{CHECK}} failure at {{CHECK_SOME(updatedAllocation);}}",2
MESOS-2382,"replace unsafe ""find | xargs"" with ""find -exec""","The problem exists in
1194:src/Makefile.am
47:src/tests/balloon_framework_test.sh
The current ""find | xargs rm -rf"" in the Makefile could potentially destroy data if the mesos source was in a folder with a space in the name. E.g. if you for some reason check out mesos to ""/ mesos"", the command in src/Makefile.am would turn into ""rm -rf /"". ""find | xargs"" should be NUL-delimited with ""find -print0 | xargs -0"" for safer execution, or can simply be replaced with the find built-in option ""find -exec '{}' \+"", which behaves similarly to xargs. There was a second occurrence of this in a test script, though in that case it would only rmdir empty folders, so it is less critical.
I submitted a PR here: https://github.com/apache/mesos/pull/36 ",1 MESOS-2387,"SlaveTest.TaskLaunchContainerizerUpdateFails is flaky","Observed on internal CI {code} [ RUN ] SlaveTest.TaskLaunchContainerizerUpdateFails Using temporary directory '/tmp/SlaveTest_TaskLaunchContainerizerUpdateFails_tUjtcI' I0222 04:59:56.568491 21813 process.cpp:2117] Dropped / Lost event for PID: slave(52)@192.168.122.68:39461 I0222 04:59:56.595433 21791 leveldb.cpp:175] Opened db in 27.59732ms I0222 04:59:56.603965 21791 leveldb.cpp:182] Compacted db in 8.49192ms I0222 04:59:56.604019 21791 leveldb.cpp:197] Created db iterator in 19206ns I0222 04:59:56.604037 21791 leveldb.cpp:203] Seeked to beginning of db in 1802ns I0222 04:59:56.604046 21791 leveldb.cpp:272] Iterated through 0 keys in the db in 467ns I0222 04:59:56.604081 21791 replica.cpp:743] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0222 04:59:56.607413 21809 recover.cpp:448] Starting replica recovery I0222 04:59:56.607687 21809 recover.cpp:474] Replica is in 4 status I0222 04:59:56.609011 21809 replica.cpp:640] Replica in 4 status received a broadcasted recover request I0222 04:59:56.609262 21809 recover.cpp:194] Received a recover response from a replica in 4 status I0222 04:59:56.609709 21809 recover.cpp:565] Updating replica status to 3 I0222 04:59:56.610749 21811 master.cpp:347] Master 20150222-045956-1148889280-39461-21791 (centos-7) started on 192.168.122.68:39461 I0222 04:59:56.610791 21811 master.cpp:393] Master only allowing authenticated frameworks to register I0222 04:59:56.610802 21811 master.cpp:398] Master only allowing authenticated slaves to register I0222 04:59:56.610821 21811 credentials.hpp:36] Loading credentials for authentication from '/tmp/SlaveTest_TaskLaunchContainerizerUpdateFails_tUjtcI/credentials' I0222 04:59:56.611042 21811 master.cpp:440] Authorization enabled I0222 04:59:56.612329 21811 hierarchical.hpp:286] Initialized hierarchical allocator process I0222 
04:59:56.612416 21811 whitelist_watcher.cpp:78] No whitelist given I0222 04:59:56.613005 21811 master.cpp:1354] The newly elected leader is master@192.168.122.68:39461 with id 20150222-045956-1148889280-39461-21791 I0222 04:59:56.613034 21811 master.cpp:1367] Elected as the leading master! I0222 04:59:56.613050 21811 master.cpp:1185] Recovering from registrar I0222 04:59:56.613229 21811 registrar.cpp:312] Recovering registrar I0222 04:59:56.622866 21809 leveldb.cpp:305] Persisting metadata (8 bytes) to leveldb took 12.988429ms I0222 04:59:56.622913 21809 replica.cpp:322] Persisted replica status to 3 I0222 04:59:56.623118 21809 recover.cpp:474] Replica is in 3 status I0222 04:59:56.624419 21809 replica.cpp:640] Replica in 3 status received a broadcasted recover request I0222 04:59:56.624685 21809 recover.cpp:194] Received a recover response from a replica in 3 status I0222 04:59:56.625200 21809 recover.cpp:565] Updating replica status to 1 I0222 04:59:56.635154 21809 leveldb.cpp:305] Persisting metadata (8 bytes) to leveldb took 9.799671ms I0222 04:59:56.635197 21809 replica.cpp:322] Persisted replica status to 1 I0222 04:59:56.635296 21809 recover.cpp:579] Successfully joined the Paxos group I0222 04:59:56.635426 21809 recover.cpp:463] Recover process terminated I0222 04:59:56.635812 21809 log.cpp:659] Attempting to start the writer I0222 04:59:56.637075 21809 replica.cpp:476] Replica received implicit promise request with proposal 1 I0222 04:59:56.648674 21809 leveldb.cpp:305] Persisting metadata (8 bytes) to leveldb took 11.566146ms I0222 04:59:56.648717 21809 replica.cpp:344] Persisted promised to 1 I0222 04:59:56.649456 21809 coordinator.cpp:229] Coordinator attemping to fill missing position I0222 04:59:56.650800 21809 replica.cpp:377] Replica received explicit promise request for position 0 with proposal 2 I0222 04:59:56.659916 21809 leveldb.cpp:342] Persisting action (8 bytes) to leveldb took 9.078258ms I0222 04:59:56.659981 21809 replica.cpp:678] Persisted 
action at 0 I0222 04:59:56.661075 21809 replica.cpp:510] Replica received write request for position 0 I0222 04:59:56.661129 21809 leveldb.cpp:437] Reading position from leveldb took 26387ns I0222 04:59:56.671227 21809 leveldb.cpp:342] Persisting action (14 bytes) to leveldb took 10.064302ms I0222 04:59:56.671262 21809 replica.cpp:678] Persisted action at 0 I0222 04:59:56.671821 21809 replica.cpp:657] Replica received learned notice for position 0 I0222 04:59:56.684200 21809 leveldb.cpp:342] Persisting action (16 bytes) to leveldb took 12.346897ms I0222 04:59:56.684242 21809 replica.cpp:678] Persisted action at 0 I0222 04:59:56.684262 21809 replica.cpp:663] Replica learned 1 action at position 0 I0222 04:59:56.684875 21809 log.cpp:675] Writer started with ending position 0 I0222 04:59:56.685932 21809 leveldb.cpp:437] Reading position from leveldb took 27308ns I0222 04:59:56.688256 21809 registrar.cpp:345] Successfully fetched the registry (0B) in 74.992128ms I0222 04:59:56.688344 21809 registrar.cpp:444] Applied 1 operations in 19566ns; attempting to update the 'registry' I0222 04:59:56.690690 21809 log.cpp:683] Attempting to append 129 bytes to the log I0222 04:59:56.690848 21809 coordinator.cpp:339] Coordinator attempting to write 2 action at position 1 I0222 04:59:56.691661 21809 replica.cpp:510] Replica received write request for position 1 I0222 04:59:56.701247 21809 leveldb.cpp:342] Persisting action (148 bytes) to leveldb took 9.550768ms I0222 04:59:56.701292 21809 replica.cpp:678] Persisted action at 1 I0222 04:59:56.702066 21809 replica.cpp:657] Replica received learned notice for position 1 I0222 04:59:56.712136 21809 leveldb.cpp:342] Persisting action (150 bytes) to leveldb took 10.041696ms I0222 04:59:56.712175 21809 replica.cpp:678] Persisted action at 1 I0222 04:59:56.712198 21809 replica.cpp:663] Replica learned 2 action at position 1 I0222 04:59:56.713289 21809 registrar.cpp:489] Successfully updated the 'registry' in 24.890112ms I0222 
04:59:56.713397 21809 registrar.cpp:375] Successfully recovered registrar I0222 04:59:56.713537 21809 log.cpp:702] Attempting to truncate the log to 1 I0222 04:59:56.713795 21809 master.cpp:1212] Recovered 0 slaves from the Registry (93B) ; allowing 10mins for slaves to re-register I0222 04:59:56.713871 21809 coordinator.cpp:339] Coordinator attempting to write 3 action at position 2 I0222 04:59:56.714879 21809 replica.cpp:510] Replica received write request for position 2 I0222 04:59:56.725225 21809 leveldb.cpp:342] Persisting action (16 bytes) to leveldb took 10.311704ms I0222 04:59:56.725270 21809 replica.cpp:678] Persisted action at 2 I0222 04:59:56.726066 21809 replica.cpp:657] Replica received learned notice for position 2 I0222 04:59:56.734110 21809 leveldb.cpp:342] Persisting action (18 bytes) to leveldb took 8.012327ms I0222 04:59:56.734180 21809 leveldb.cpp:400] Deleting ~1 keys from leveldb took 36578ns I0222 04:59:56.734201 21809 replica.cpp:678] Persisted action at 2 I0222 04:59:56.734221 21809 replica.cpp:663] Replica learned 3 action at position 2 I0222 04:59:56.747556 21809 slave.cpp:173] Slave started on 53)@192.168.122.68:39461 I0222 04:59:56.747601 21809 credentials.hpp:84] Loading credential for authentication from '/tmp/SlaveTest_TaskLaunchContainerizerUpdateFails_qkhaJP/credential' I0222 04:59:56.747774 21809 slave.cpp:280] Slave using credential for: test-principal I0222 04:59:56.748021 21809 slave.cpp:298] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0222 04:59:56.748682 21809 slave.cpp:327] Slave hostname: centos-7 I0222 04:59:56.748705 21809 slave.cpp:328] Slave checkpoint: false W0222 04:59:56.748714 21809 slave.cpp:330] Disabling checkpointing is deprecated and the --checkpoint flag will be removed in a future release. 
Please avoid using this flag I0222 04:59:56.749826 21809 state.cpp:34] Recovering state from '/tmp/SlaveTest_TaskLaunchContainerizerUpdateFails_qkhaJP/meta' I0222 04:59:56.750191 21809 status_update_manager.cpp:196] Recovering status update manager I0222 04:59:56.750465 21809 slave.cpp:3775] Finished recovery I0222 04:59:56.751260 21809 slave.cpp:623] New master detected at master@192.168.122.68:39461 I0222 04:59:56.751349 21809 slave.cpp:686] Authenticating with master master@192.168.122.68:39461 I0222 04:59:56.751369 21809 slave.cpp:691] Using default CRAM-MD5 authenticatee I0222 04:59:56.751502 21809 slave.cpp:659] Detecting new master I0222 04:59:56.751596 21809 status_update_manager.cpp:170] Pausing sending status updates I0222 04:59:56.751668 21809 authenticatee.hpp:138] Creating new client SASL connection I0222 04:59:56.752781 21809 master.cpp:3811] Authenticating slave(53)@192.168.122.68:39461 I0222 04:59:56.752820 21809 master.cpp:3822] Using default CRAM-MD5 authenticator I0222 04:59:56.753124 21809 authenticator.hpp:169] Creating new server SASL connection I0222 04:59:56.755609 21809 authenticatee.hpp:229] Received SASL authentication mechanisms: CRAM-MD5 I0222 04:59:56.755641 21809 authenticatee.hpp:255] Attempting to authenticate with mechanism 'CRAM-MD5' I0222 04:59:56.755708 21809 authenticator.hpp:275] Received SASL authentication start I0222 04:59:56.755751 21809 authenticator.hpp:397] Authentication requires more steps I0222 04:59:56.755813 21809 authenticatee.hpp:275] Received SASL authentication step I0222 04:59:56.755887 21809 authenticator.hpp:303] Received SASL authentication step I0222 04:59:56.755920 21809 auxprop.cpp:98] Request to lookup properties for user: 'test-principal' realm: 'centos-7' server FQDN: 'centos-7' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0222 04:59:56.755934 21809 auxprop.cpp:170] Looking up auxiliary property '*userPassword' I0222 04:59:56.756005 21809 
auxprop.cpp:170] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0222 04:59:56.756036 21809 auxprop.cpp:98] Request to lookup properties for user: 'test-principal' realm: 'centos-7' server FQDN: 'centos-7' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0222 04:59:56.756047 21809 auxprop.cpp:120] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0222 04:59:56.756054 21809 auxprop.cpp:120] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0222 04:59:56.756068 21809 authenticator.hpp:389] Authentication success I0222 04:59:56.756155 21809 authenticatee.hpp:315] Authentication success I0222 04:59:56.756219 21809 master.cpp:3869] Successfully authenticated principal 'test-principal' at slave(53)@192.168.122.68:39461 I0222 04:59:56.756503 21809 slave.cpp:757] Successfully authenticated with master master@192.168.122.68:39461 I0222 04:59:56.756611 21809 slave.cpp:1089] Will retry registration in 11.221976ms if necessary I0222 04:59:56.756876 21809 master.cpp:2936] Registering slave at slave(53)@192.168.122.68:39461 (centos-7) with id 20150222-045956-1148889280-39461-21791-S0 I0222 04:59:56.757323 21809 registrar.cpp:444] Applied 1 operations in 70787ns; attempting to update the 'registry' I0222 04:59:56.759790 21809 log.cpp:683] Attempting to append 299 bytes to the log I0222 04:59:56.760000 21809 coordinator.cpp:339] Coordinator attempting to write 2 action at position 3 I0222 04:59:56.760920 21809 replica.cpp:510] Replica received write request for position 3 I0222 04:59:56.762037 21791 sched.cpp:154] Version: 0.22.0 I0222 04:59:56.762763 21806 sched.cpp:251] New master detected at master@192.168.122.68:39461 I0222 04:59:56.762835 21806 sched.cpp:307] Authenticating with master master@192.168.122.68:39461 I0222 04:59:56.762856 21806 sched.cpp:314] Using default CRAM-MD5 authenticatee I0222 04:59:56.763082 21806 authenticatee.hpp:138] Creating 
new client SASL connection I0222 04:59:56.763753 21806 master.cpp:3811] Authenticating scheduler-d9c22c4e-8dec-42a6-a350-a98472642891@192.168.122.68:39461 I0222 04:59:56.763784 21806 master.cpp:3822] Using default CRAM-MD5 authenticator I0222 04:59:56.764040 21806 authenticator.hpp:169] Creating new server SASL connection I0222 04:59:56.764624 21806 authenticatee.hpp:229] Received SASL authentication mechanisms: CRAM-MD5 I0222 04:59:56.764653 21806 authenticatee.hpp:255] Attempting to authenticate with mechanism 'CRAM-MD5' I0222 04:59:56.764719 21806 authenticator.hpp:275] Received SASL authentication start I0222 04:59:56.764758 21806 authenticator.hpp:397] Authentication requires more steps I0222 04:59:56.764819 21806 authenticatee.hpp:275] Received SASL authentication step I0222 04:59:56.764889 21806 authenticator.hpp:303] Received SASL authentication step I0222 04:59:56.764911 21806 auxprop.cpp:98] Request to lookup properties for user: 'test-principal' realm: 'centos-7' server FQDN: 'centos-7' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0222 04:59:56.764922 21806 auxprop.cpp:170] Looking up auxiliary property '*userPassword' I0222 04:59:56.764974 21806 auxprop.cpp:170] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0222 04:59:56.765005 21806 auxprop.cpp:98] Request to lookup properties for user: 'test-principal' realm: 'centos-7' server FQDN: 'centos-7' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0222 04:59:56.765017 21806 auxprop.cpp:120] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0222 04:59:56.765023 21806 auxprop.cpp:120] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0222 04:59:56.765036 21806 authenticator.hpp:389] Authentication success I0222 04:59:56.765120 21806 authenticatee.hpp:315] Authentication success I0222 04:59:56.765182 21806 master.cpp:3869] 
Successfully authenticated principal 'test-principal' at scheduler-d9c22c4e-8dec-42a6-a350-a98472642891@192.168.122.68:39461 I0222 04:59:56.765442 21806 sched.cpp:395] Successfully authenticated with master master@192.168.122.68:39461 I0222 04:59:56.765465 21806 sched.cpp:518] Sending registration request to master@192.168.122.68:39461 I0222 04:59:56.765522 21806 sched.cpp:551] Will retry registration in 1.283564292secs if necessary I0222 04:59:56.765637 21806 master.cpp:1572] Received registration request for framework 'default' at scheduler-d9c22c4e-8dec-42a6-a350-a98472642891@192.168.122.68:39461 I0222 04:59:56.765699 21806 master.cpp:1433] Authorizing framework principal 'test-principal' to receive offers for role '*' I0222 04:59:56.766120 21806 master.cpp:1636] Registering framework 20150222-045956-1148889280-39461-21791-0000 (default) at scheduler-d9c22c4e-8dec-42a6-a350-a98472642891@192.168.122.68:39461 I0222 04:59:56.766572 21806 hierarchical.hpp:320] Added framework 20150222-045956-1148889280-39461-21791-0000 I0222 04:59:56.766598 21806 hierarchical.hpp:831] No resources available to allocate! 
I0222 04:59:56.766609 21806 hierarchical.hpp:738] Performed allocation for 0 slaves in 15902ns I0222 04:59:56.766753 21806 sched.cpp:445] Framework registered with 20150222-045956-1148889280-39461-21791-0000 I0222 04:59:56.766790 21806 sched.cpp:459] Scheduler::registered took 15076ns I0222 04:59:56.773710 21806 slave.cpp:1089] Will retry registration in 3.454005ms if necessary I0222 04:59:56.773900 21806 master.cpp:2924] Ignoring register slave message from slave(53)@192.168.122.68:39461 (centos-7) as admission is already in progress I0222 04:59:56.775297 21809 leveldb.cpp:342] Persisting action (318 bytes) to leveldb took 14.319807ms I0222 04:59:56.775344 21809 replica.cpp:678] Persisted action at 3 I0222 04:59:56.776139 21809 replica.cpp:657] Replica received learned notice for position 3 I0222 04:59:56.778630 21806 slave.cpp:1089] Will retry registration in 32.764468ms if necessary I0222 04:59:56.778779 21806 master.cpp:2924] Ignoring register slave message from slave(53)@192.168.122.68:39461 (centos-7) as admission is already in progress I0222 04:59:56.783778 21809 leveldb.cpp:342] Persisting action (320 bytes) to leveldb took 7.609533ms I0222 04:59:56.783828 21809 replica.cpp:678] Persisted action at 3 I0222 04:59:56.783849 21809 replica.cpp:663] Replica learned 2 action at position 3 I0222 04:59:56.785058 21809 registrar.cpp:489] Successfully updated the 'registry' in 27.669248ms I0222 04:59:56.785274 21809 log.cpp:702] Attempting to truncate the log to 3 I0222 04:59:56.785815 21809 master.cpp:2993] Registered slave 20150222-045956-1148889280-39461-21791-S0 at slave(53)@192.168.122.68:39461 (centos-7) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0222 04:59:56.785913 21809 coordinator.cpp:339] Coordinator attempting to write 3 action at position 4 I0222 04:59:56.786267 21809 hierarchical.hpp:452] Added slave 20150222-045956-1148889280-39461-21791-S0 (centos-7) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and 
cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] available) I0222 04:59:56.786600 21809 hierarchical.hpp:756] Performed allocation for slave 20150222-045956-1148889280-39461-21791-S0 in 292298ns I0222 04:59:56.786684 21809 slave.cpp:791] Registered with master master@192.168.122.68:39461; given slave ID 20150222-045956-1148889280-39461-21791-S0 I0222 04:59:56.786792 21809 slave.cpp:2830] Received ping from slave-observer(52)@192.168.122.68:39461 I0222 04:59:56.787230 21809 master.cpp:3753] Sending 1 offers to framework 20150222-045956-1148889280-39461-21791-0000 (default) at scheduler-d9c22c4e-8dec-42a6-a350-a98472642891@192.168.122.68:39461 I0222 04:59:56.787334 21809 status_update_manager.cpp:177] Resuming sending status updates I0222 04:59:56.788156 21809 sched.cpp:608] Scheduler::resourceOffers took 557128ns I0222 04:59:56.788936 21809 master.cpp:2266] Processing ACCEPT call for offers: [ 20150222-045956-1148889280-39461-21791-O0 ] on slave 20150222-045956-1148889280-39461-21791-S0 at slave(53)@192.168.122.68:39461 (centos-7) for framework 20150222-045956-1148889280-39461-21791-0000 (default) at scheduler-d9c22c4e-8dec-42a6-a350-a98472642891@192.168.122.68:39461 I0222 04:59:56.789000 21809 master.cpp:2110] Authorizing framework principal 'test-principal' to launch task 0 as user 'jenkins' W0222 04:59:56.790506 21809 validation.cpp:327] Executor default for task 0 uses less CPUs (None) than the minimum required (0.01). Please update your executor, as this will be mandatory in future releases. W0222 04:59:56.790546 21809 validation.cpp:339] Executor default for task 0 uses less memory (None) than the minimum required (32MB). Please update your executor, as this will be mandatory in future releases. 
I0222 04:59:56.790808 21809 master.hpp:821] Adding task 0 with resources cpus(*):1; mem(*):128 on slave 20150222-045956-1148889280-39461-21791-S0 (centos-7) I0222 04:59:56.790885 21809 master.cpp:2543] Launching task 0 of framework 20150222-045956-1148889280-39461-21791-0000 (default) at scheduler-d9c22c4e-8dec-42a6-a350-a98472642891@192.168.122.68:39461 with resources cpus(*):1; mem(*):128 on slave 20150222-045956-1148889280-39461-21791-S0 at slave(53)@192.168.122.68:39461 (centos-7) I0222 04:59:56.791201 21809 replica.cpp:510] Replica received write request for position 4 I0222 04:59:56.791610 21806 slave.cpp:1120] Got assigned task 0 for framework 20150222-045956-1148889280-39461-21791-0000 I0222 04:59:56.792140 21806 slave.cpp:1230] Launching task 0 for framework 20150222-045956-1148889280-39461-21791-0000 I0222 04:59:56.794872 21806 slave.cpp:4177] Launching executor default of framework 20150222-045956-1148889280-39461-21791-0000 in work directory '/tmp/SlaveTest_TaskLaunchContainerizerUpdateFails_qkhaJP/slaves/20150222-045956-1148889280-39461-21791-S0/frameworks/20150222-045956-1148889280-39461-21791-0000/executors/default/runs/753232b5-43ff-4fbf-b29a-0f76161132ab' I0222 04:59:56.796846 21806 exec.cpp:130] Version: 0.22.0 I0222 04:59:56.797173 21806 slave.cpp:1377] Queuing task '0' for execut...",1 MESOS-2388,"GroupTest.LabelledGroup segfaults","Observed this on internal CI. Not sure if it is due to ""GroupTest.LabelledGroup"" or an earlier test. {code} I0219 01:04:17.980598 27766 zookeeper_test_server.cpp:117] Shutting down ZooKeeperTestServer on port 39597 [ OK ] GroupTest.RetryableErrors (30150 ms) [ RUN ] GroupTest.LabelledGroup Makefile:6656: recipe for target 'check-local' failed make[3]: *** [check-local] Segmentation fault (core dumped) {code}",2 MESOS-2391,"Provide user doc for the new posix disk isolator in Mesos containerizer","We introduced a posix disk isolator for Mesos containerizer in 0.22.0. 
This isolator allows us to get container disk usage as well as enforce container disk quota. It's based on 'du'. We need to document this feature.",2 MESOS-2392,"Rate limit slave removals during master recovery.","Much like we rate limit slave removals in the common path (MESOS-1148), we need to rate limit slave removals that occur during master recovery. When a master recovers and is using a strict registry, slaves that do not re-register within a timeout will be removed. Currently there is a safeguard in place to abort when too many slaves have not re-registered. However, in the case of a transient partition, we don't want to remove large sections of slaves without rate limiting.",3 MESOS-2394,"Create styleguide for documentation","As of right now, different pages in our documentation use quite different styles. Consider for example the different emphasis for NOTE: * {noformat}> NOTE: http://mesos.apache.org/documentation/latest/slave-recovery/{noformat} * {noformat}*NOTE*: http://mesos.apache.org/documentation/latest/upgrades/ {noformat} Would be great to establish a common style for the documentation!",2 MESOS-2400,"Improve NsTest.ROOT_setns","- Use symbol NAME directly to launch the subprocess instead of the hard-coded string. - Replace the static string with char[]. ",1 MESOS-2401,"MasterTest.ShutdownFrameworkWhileTaskRunning is flaky","Looks like the executorShutdownTimeout() was called immediately after executorShutdown() was called! 
{code} [ RUN ] MasterTest.ShutdownFrameworkWhileTaskRunning Using temporary directory '/tmp/MasterTest_ShutdownFrameworkWhileTaskRunning_sBd6vK' I0224 18:51:17.385068 30213 leveldb.cpp:176] Opened db in 1.262442ms I0224 18:51:17.386360 30213 leveldb.cpp:183] Compacted db in 985102ns I0224 18:51:17.387025 30213 leveldb.cpp:198] Created db iterator in 78043ns I0224 18:51:17.387420 30213 leveldb.cpp:204] Seeked to beginning of db in 25814ns I0224 18:51:17.387804 30213 leveldb.cpp:273] Iterated through 0 keys in the db in 25025ns I0224 18:51:17.388270 30213 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0224 18:51:17.389760 30227 recover.cpp:449] Starting replica recovery I0224 18:51:17.395699 30227 recover.cpp:475] Replica is in 4 status I0224 18:51:17.398294 30227 replica.cpp:641] Replica in 4 status received a broadcasted recover request I0224 18:51:17.398816 30227 recover.cpp:195] Received a recover response from a replica in 4 status I0224 18:51:17.402415 30230 recover.cpp:566] Updating replica status to 3 I0224 18:51:17.403473 30229 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 273857ns I0224 18:51:17.404093 30229 replica.cpp:323] Persisted replica status to 3 I0224 18:51:17.404930 30229 recover.cpp:475] Replica is in 3 status I0224 18:51:17.407995 30233 replica.cpp:641] Replica in 3 status received a broadcasted recover request I0224 18:51:17.410697 30231 recover.cpp:195] Received a recover response from a replica in 3 status I0224 18:51:17.415710 30230 recover.cpp:566] Updating replica status to 1 I0224 18:51:17.416987 30227 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 221966ns I0224 18:51:17.417579 30227 replica.cpp:323] Persisted replica status to 1 I0224 18:51:17.418803 30234 recover.cpp:580] Successfully joined the Paxos group I0224 18:51:17.419699 30227 recover.cpp:464] Recover process terminated I0224 18:51:17.430594 30234 master.cpp:349] Master 
20150224-185117-2272962752-44950-30213 (fedora-19) started on 192.168.122.135:44950 I0224 18:51:17.431082 30234 master.cpp:395] Master only allowing authenticated frameworks to register I0224 18:51:17.431453 30234 master.cpp:400] Master only allowing authenticated slaves to register I0224 18:51:17.431828 30234 credentials.hpp:37] Loading credentials for authentication from '/tmp/MasterTest_ShutdownFrameworkWhileTaskRunning_sBd6vK/credentials' I0224 18:51:17.432740 30234 master.cpp:442] Authorization enabled I0224 18:51:17.434224 30229 hierarchical.hpp:287] Initialized hierarchical allocator process I0224 18:51:17.434994 30233 whitelist_watcher.cpp:79] No whitelist given I0224 18:51:17.440687 30234 master.cpp:1356] The newly elected leader is master@192.168.122.135:44950 with id 20150224-185117-2272962752-44950-30213 I0224 18:51:17.441764 30234 master.cpp:1369] Elected as the leading master! I0224 18:51:17.442430 30234 master.cpp:1187] Recovering from registrar I0224 18:51:17.443053 30229 registrar.cpp:313] Recovering registrar I0224 18:51:17.445468 30228 log.cpp:660] Attempting to start the writer I0224 18:51:17.449970 30233 replica.cpp:477] Replica received implicit promise request with proposal 1 I0224 18:51:17.451359 30233 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 339488ns I0224 18:51:17.451949 30233 replica.cpp:345] Persisted promised to 1 I0224 18:51:17.456845 30235 process.cpp:2117] Dropped / Lost event for PID: hierarchical-allocator(154)@192.168.122.135:44950 I0224 18:51:17.461741 30231 coordinator.cpp:230] Coordinator attemping to fill missing position I0224 18:51:17.464686 30228 replica.cpp:378] Replica received explicit promise request for position 0 with proposal 2 I0224 18:51:17.465515 30228 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 170261ns I0224 18:51:17.465991 30228 replica.cpp:679] Persisted action at 0 I0224 18:51:17.470512 30229 replica.cpp:511] Replica received write request for position 0 I0224 
18:51:17.471437 30229 leveldb.cpp:438] Reading position from leveldb took 139178ns I0224 18:51:17.472129 30229 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 141560ns I0224 18:51:17.472705 30229 replica.cpp:679] Persisted action at 0 I0224 18:51:17.476305 30228 replica.cpp:658] Replica received learned notice for position 0 I0224 18:51:17.477991 30228 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 208112ns I0224 18:51:17.478574 30228 replica.cpp:679] Persisted action at 0 I0224 18:51:17.479044 30228 replica.cpp:664] Replica learned 1 action at position 0 I0224 18:51:17.484371 30233 log.cpp:676] Writer started with ending position 0 I0224 18:51:17.487396 30233 leveldb.cpp:438] Reading position from leveldb took 96498ns I0224 18:51:17.498906 30233 registrar.cpp:346] Successfully fetched the registry (0B) in 55.234048ms I0224 18:51:17.499781 30233 registrar.cpp:445] Applied 1 operations in 97308ns; attempting to update the 'registry' I0224 18:51:17.503955 30231 log.cpp:684] Attempting to append 131 bytes to the log I0224 18:51:17.505009 30231 coordinator.cpp:340] Coordinator attempting to write 2 action at position 1 I0224 18:51:17.507428 30228 replica.cpp:511] Replica received write request for position 1 I0224 18:51:17.508517 30228 leveldb.cpp:343] Persisting action (150 bytes) to leveldb took 316570ns I0224 18:51:17.508985 30228 replica.cpp:679] Persisted action at 1 I0224 18:51:17.512902 30229 replica.cpp:658] Replica received learned notice for position 1 I0224 18:51:17.517261 30229 leveldb.cpp:343] Persisting action (152 bytes) to leveldb took 427860ns I0224 18:51:17.517470 30229 replica.cpp:679] Persisted action at 1 I0224 18:51:17.517796 30229 replica.cpp:664] Replica learned 2 action at position 1 I0224 18:51:17.532624 30232 registrar.cpp:490] Successfully updated the 'registry' in 32.31104ms I0224 18:51:17.533957 30228 log.cpp:703] Attempting to truncate the log to 1 I0224 18:51:17.534366 30228 coordinator.cpp:340] Coordinator 
attempting to write 3 action at position 2 I0224 18:51:17.536684 30227 replica.cpp:511] Replica received write request for position 2 I0224 18:51:17.537406 30227 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 196455ns I0224 18:51:17.537946 30227 replica.cpp:679] Persisted action at 2 I0224 18:51:17.537695 30232 registrar.cpp:376] Successfully recovered registrar I0224 18:51:17.544136 30231 master.cpp:1214] Recovered 0 slaves from the Registry (95B) ; allowing 10mins for slaves to re-register I0224 18:51:17.546041 30227 replica.cpp:658] Replica received learned notice for position 2 I0224 18:51:17.546728 30227 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 192442ns I0224 18:51:17.547058 30227 leveldb.cpp:401] Deleting ~1 keys from leveldb took 61064ns I0224 18:51:17.547363 30227 replica.cpp:679] Persisted action at 2 I0224 18:51:17.547669 30227 replica.cpp:664] Replica learned 3 action at position 2 I0224 18:51:17.565460 30234 slave.cpp:174] Slave started on 138)@192.168.122.135:44950 I0224 18:51:17.566038 30234 credentials.hpp:85] Loading credential for authentication from '/tmp/MasterTest_ShutdownFrameworkWhileTaskRunning_lRugms/credential' I0224 18:51:17.566584 30234 slave.cpp:281] Slave using credential for: test-principal I0224 18:51:17.567198 30234 slave.cpp:299] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0224 18:51:17.567930 30234 slave.cpp:328] Slave hostname: fedora-19 I0224 18:51:17.568172 30234 slave.cpp:329] Slave checkpoint: false W0224 18:51:17.568435 30234 slave.cpp:331] Disabling checkpointing is deprecated and the --checkpoint flag will be removed in a future release. 
Please avoid using this flag I0224 18:51:17.570539 30227 state.cpp:35] Recovering state from '/tmp/MasterTest_ShutdownFrameworkWhileTaskRunning_lRugms/meta' I0224 18:51:17.573499 30232 status_update_manager.cpp:197] Recovering status update manager I0224 18:51:17.574209 30234 slave.cpp:3775] Finished recovery I0224 18:51:17.576277 30229 status_update_manager.cpp:171] Pausing sending status updates I0224 18:51:17.576680 30234 slave.cpp:624] New master detected at master@192.168.122.135:44950 I0224 18:51:17.577131 30234 slave.cpp:687] Authenticating with master master@192.168.122.135:44950 I0224 18:51:17.577385 30234 slave.cpp:692] Using default CRAM-MD5 authenticatee I0224 18:51:17.577945 30228 authenticatee.hpp:139] Creating new client SASL connection I0224 18:51:17.578837 30234 slave.cpp:660] Detecting new master I0224 18:51:17.579270 30228 master.cpp:3813] Authenticating slave(138)@192.168.122.135:44950 I0224 18:51:17.579900 30228 master.cpp:3824] Using default CRAM-MD5 authenticator I0224 18:51:17.580572 30228 authenticator.hpp:170] Creating new server SASL connection I0224 18:51:17.581501 30231 authenticatee.hpp:230] Received SASL authentication mechanisms: CRAM-MD5 I0224 18:51:17.581805 30231 authenticatee.hpp:256] Attempting to authenticate with mechanism 'CRAM-MD5' I0224 18:51:17.582222 30228 authenticator.hpp:276] Received SASL authentication start I0224 18:51:17.582531 30228 authenticator.hpp:398] Authentication requires more steps I0224 18:51:17.582945 30230 authenticatee.hpp:276] Received SASL authentication step I0224 18:51:17.583351 30228 authenticator.hpp:304] Received SASL authentication step I0224 18:51:17.583643 30228 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0224 18:51:17.583911 30228 auxprop.cpp:171] Looking up auxiliary property '*userPassword' I0224 18:51:17.584241 30228 
auxprop.cpp:171] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0224 18:51:17.584517 30228 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0224 18:51:17.584787 30228 auxprop.cpp:121] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0224 18:51:17.585075 30228 auxprop.cpp:121] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0224 18:51:17.585358 30228 authenticator.hpp:390] Authentication success I0224 18:51:17.585750 30233 authenticatee.hpp:316] Authentication success I0224 18:51:17.586354 30232 master.cpp:3871] Successfully authenticated principal 'test-principal' at slave(138)@192.168.122.135:44950 I0224 18:51:17.590953 30234 slave.cpp:758] Successfully authenticated with master master@192.168.122.135:44950 I0224 18:51:17.591686 30233 master.cpp:2938] Registering slave at slave(138)@192.168.122.135:44950 (fedora-19) with id 20150224-185117-2272962752-44950-30213-S0 I0224 18:51:17.592718 30233 registrar.cpp:445] Applied 1 operations in 100358ns; attempting to update the 'registry' I0224 18:51:17.595989 30227 log.cpp:684] Attempting to append 302 bytes to the log I0224 18:51:17.596757 30227 coordinator.cpp:340] Coordinator attempting to write 2 action at position 3 I0224 18:51:17.599280 30227 replica.cpp:511] Replica received write request for position 3 I0224 18:51:17.599481 30234 slave.cpp:1090] Will retry registration in 12.331173ms if necessary I0224 18:51:17.601940 30227 leveldb.cpp:343] Persisting action (321 bytes) to leveldb took 999045ns I0224 18:51:17.602339 30227 replica.cpp:679] Persisted action at 3 I0224 18:51:17.612349 30229 replica.cpp:658] Replica received learned notice for position 3 I0224 18:51:17.612934 30229 leveldb.cpp:343] Persisting action (323 bytes) to leveldb took 152139ns I0224 18:51:17.613471 
30229 replica.cpp:679] Persisted action at 3 I0224 18:51:17.613796 30229 replica.cpp:664] Replica learned 2 action at position 3 I0224 18:51:17.615980 30229 master.cpp:2926] Ignoring register slave message from slave(138)@192.168.122.135:44950 (fedora-19) as admission is already in progress I0224 18:51:17.614302 30233 slave.cpp:1090] Will retry registration in 11.014835ms if necessary I0224 18:51:17.617490 30234 registrar.cpp:490] Successfully updated the 'registry' in 24.179968ms I0224 18:51:17.618989 30234 master.cpp:2995] Registered slave 20150224-185117-2272962752-44950-30213-S0 at slave(138)@192.168.122.135:44950 (fedora-19) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0224 18:51:17.619567 30233 hierarchical.hpp:455] Added slave 20150224-185117-2272962752-44950-30213-S0 (fedora-19) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] available) I0224 18:51:17.621080 30233 hierarchical.hpp:834] No resources available to allocate! 
I0224 18:51:17.621441 30233 hierarchical.hpp:759] Performed allocation for slave 20150224-185117-2272962752-44950-30213-S0 in 544608ns I0224 18:51:17.619704 30229 slave.cpp:792] Registered with master master@192.168.122.135:44950; given slave ID 20150224-185117-2272962752-44950-30213-S0 I0224 18:51:17.622195 30229 slave.cpp:2830] Received ping from slave-observer(125)@192.168.122.135:44950 I0224 18:51:17.622385 30227 status_update_manager.cpp:178] Resuming sending status updates I0224 18:51:17.620266 30232 log.cpp:703] Attempting to truncate the log to 3 I0224 18:51:17.623522 30232 coordinator.cpp:340] Coordinator attempting to write 3 action at position 4 I0224 18:51:17.624835 30229 replica.cpp:511] Replica received write request for position 4 I0224 18:51:17.625727 30229 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 259831ns I0224 18:51:17.626122 30229 replica.cpp:679] Persisted action at 4 I0224 18:51:17.627686 30227 replica.cpp:658] Replica received learned notice for position 4 I0224 18:51:17.628228 30227 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 93777ns I0224 18:51:17.628785 30227 leveldb.cpp:401] Deleting ~2 keys from leveldb took 57660ns I0224 18:51:17.629176 30227 replica.cpp:679] Persisted action at 4 I0224 18:51:17.629443 30227 replica.cpp:664] Replica learned 3 action at position 4 I0224 18:51:17.636715 30213 sched.cpp:157] Version: 0.23.0 I0224 18:51:17.638003 30229 sched.cpp:254] New master detected at master@192.168.122.135:44950 I0224 18:51:17.638602 30229 sched.cpp:310] Authenticating with master master@192.168.122.135:44950 I0224 18:51:17.639024 30229 sched.cpp:317] Using default CRAM-MD5 authenticatee I0224 18:51:17.639580 30228 authenticatee.hpp:139] Creating new client SASL connection I0224 18:51:17.640455 30235 process.cpp:2117] Dropped / Lost event for PID: scheduler-11bb6bcb-cd51-4927-a28b-dbca9d63772f@192.168.122.135:44950 I0224 18:51:17.641150 30228 master.cpp:3813] Authenticating 
scheduler-fc72e828-0783-41b6-9892-ffc961e8567e@192.168.122.135:44950 I0224 18:51:17.641597 30228 master.cpp:3824] Using default CRAM-MD5 authenticator I0224 18:51:17.642643 30228 authenticator.hpp:170] Creating new server SASL connection I0224 18:51:17.643698 30234 authenticatee.hpp:230] Received SASL authentication mechanisms: CRAM-MD5 I0224 18:51:17.644296 30234 authenticatee.hpp:256] Attempting to authenticate with mechanism 'CRAM-MD5' I0224 18:51:17.644739 30228 authenticator.hpp:276] Received SASL authentication start I0224 18:51:17.645143 30228 authenticator.hpp:398] Authentication requires more steps I0224 18:51:17.645654 30230 authenticatee.hpp:276] Received SASL authentication step I0224 18:51:17.646122 30228 authenticator.hpp:304] Received SASL authentication step I0224 18:51:17.646421 30228 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0224 18:51:17.646746 30228 auxprop.cpp:171] Looking up auxiliary property '*userPassword' I0224 18:51:17.647203 30228 auxprop.cpp:171] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0224 18:51:17.647644 30228 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0224 18:51:17.648454 30228 auxprop.cpp:121] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0224 18:51:17.648788 30228 auxprop.cpp:121] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0224 18:51:17.649210 30228 authenticator.hpp:390] Authentication success I0224 18:51:17.649705 30231 authenticatee.hpp:316] Authentication success I0224 18:51:17.653314 30231 sched.cpp:398] Successfully authenticated with master master@192.168.122.135:44950 I0224 18:51:17.653766 
30232 master.cpp:3871] Successfully authenticated principal 'test-principal' at scheduler-fc72e828-0783-41b6-9892-ffc961e8567e@192.168.122.135:44950 I0224 18:51:17.654683 30231 sched.cpp:521] Sending registration request to master@192.168.122.135:44950 I0224 18:51:17.655138 30231 sched.cpp:554] Will retry registration in 1.028970132secs if necessary I0224 18:51:17.657112 30232 master.cpp:1574] Received registration request for framework 'default' at scheduler-fc72e828-0783-41b6-9892-ffc961e8567e@192.168.122.135:44950 I0224 18:51:17.658509 30232 master.cpp:1435] Authorizing framework principal 'test-principal' to receive offers for role '*' I0224 18:51:17.659765 30232 master.cpp:1638] Registering framework 20150224-185117-2272962752-44950-30213-0000 (default) at scheduler-fc72e828-0783-41b6-9892-ffc961e8567e@192.168.122.135:44950 I0224 18:51:17.660727 30233 hierarchical.hpp:321] Added framework 20150224-185117-2272962752-44950-30213-0000 I0224 18:51:17.661730 30233 hierarchical.hpp:741] Performed allocation for 1 slaves in 529369ns I0224 18:51:17.662911 30229 sched.cpp:448] Framework registered with 20150224-185117-2272962752-44950-30213-0000 I0224 18:51:17.663374 30229 sched.cpp:462] Scheduler::registered took 35637ns I0224 18:51:17.664552 30232 master.cpp:3755] Sending 1 offers to framework 20150224-185117-2272962752-44950-30213-0000 (default) at scheduler-fc72e828-0783-41b6-9892-ffc961e8567e@192.168.122.135:44950 I0224 18:51:17.668009 30234 sched.cpp:611] Scheduler::resourceOffers took 2.574292ms I0224 18:51:17.671038 30232 master.cpp:2268] Processing ACCEPT call for offers: [ 20150224-185117-2272962752-44950-30213-O0 ] on slave 20150224-185117-2272962752-44950-30213-S0 at slave(138)@192.168.122.135:44950 (fedora-19) for framework 20150224-185117-2272962752-44950-30213-0000 (default) at scheduler-fc72e828-0783-41b6-9892-ffc961e8567e@192.168.122.135:44950 I0224 18:51:17.672071 30232 master.cpp:2112] Authorizing framework principal 'test-principal' to launch task 1 
as user 'jenkins' W0224 18:51:17.674675 30232 validation.cpp:326] Executor default for task 1 uses less CPUs (None) than the minimum required (0.01). Please update your executor, as this will be mandatory in future releases. W0224 18:51:17.675395 30232 validation.cpp:338] Executor default for task 1 uses less memory (None) than the minimum required (32MB). Please update your executor, as this will be mandatory in future releases. I0224 18:51:17.676460 30232 master.hpp:822] Adding task 1 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20150224-185117-2272962752-44950-30213-S0 (fedora-19) I0224 18:51:17.677078 30232 master.cpp:2545] Launching task 1 of framework 20150224-185117-2272962752-44950-30213-0000 (default) at scheduler-fc72e828-0783-41b6-9892-ffc961e8567e@192.168.122.135:44950 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20150224-185117-2272962752-44950-30213-S0 at slave(138)@192.168.122.135:44950 (fedora-19) I0224 18:51:17.678084 30230 slave.cpp:1121] Got ass...",1 MESOS-2402,"MesosContainerizerDestroyTest.LauncherDestroyFailure is flaky","""Failed to os::execvpe in childMain"". Never seen this one before. 
{code} [ RUN ] MesosContainerizerDestroyTest.LauncherDestroyFailure Using temporary directory '/tmp/MesosContainerizerDestroyTest_LauncherDestroyFailure_QpjQEn' I0224 18:55:49.326912 21391 containerizer.cpp:461] Starting container 'test_container' for executor 'executor' of framework '' I0224 18:55:49.332252 21391 launcher.cpp:130] Forked child with pid '23496' for container 'test_container' ABORT: (src/subprocess.cpp:165): Failed to os::execvpe in childMain *** Aborted at 1424832949 (unix time) try ""date -d @1424832949"" if you are using GNU date *** PC: @ 0x2b178c5db0d5 (unknown) I0224 18:55:49.340955 21392 process.cpp:2117] Dropped / Lost event for PID: scheduler-509d37ac-296f-4429-b101-af433c1800e9@127.0.1.1:39647 I0224 18:55:49.342300 21386 containerizer.cpp:911] Destroying container 'test_container' *** SIGABRT (@0x3e800005bc8) received by PID 23496 (TID 0x2b178f9f0700) from PID 23496; stack trace: *** @ 0x2b178c397cb0 (unknown) @ 0x2b178c5db0d5 (unknown) @ 0x2b178c5de83b (unknown) @ 0x87a945 _Abort() @ 0x2b1789f610b9 process::childMain() I0224 18:55:49.391793 21386 containerizer.cpp:1120] Executor for container 'test_container' has exited I0224 18:55:49.400478 21391 process.cpp:2770] Handling HTTP event for process 'metrics' with path: '/metrics/snapshot' tests/containerizer_tests.cpp:485: Failure Value of: metrics.values[""containerizer/mesos/container_destroy_errors""] Actual: 16-byte object <02-00 00-00 17-2B 00-00 E0-86 0E-04 00-00 00-00> Expected: 1u Which is: 1 [ FAILED ] MesosContainerizerDestroyTest.LauncherDestroyFailure (89 ms) {code}",2 MESOS-2403,"MasterAllocatorTest/0.FrameworkReregistersFirst is flaky","{code} [ RUN ] MasterAllocatorTest/0.FrameworkReregistersFirst Using temporary directory '/tmp/MasterAllocatorTest_0_FrameworkReregistersFirst_Vy5Nml' I0224 23:22:31.681670 30589 leveldb.cpp:176] Opened db in 2.943518ms I0224 23:22:31.682152 30619 process.cpp:2117] Dropped / Lost event for PID: slave(65)@67.195.81.187:38391 I0224 
23:22:31.682732 30589 leveldb.cpp:183] Compacted db in 1.029469ms I0224 23:22:31.682777 30589 leveldb.cpp:198] Created db iterator in 15460ns I0224 23:22:31.682792 30589 leveldb.cpp:204] Seeked to beginning of db in 1832ns I0224 23:22:31.682802 30589 leveldb.cpp:273] Iterated through 0 keys in the db in 319ns I0224 23:22:31.682833 30589 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0224 23:22:31.683228 30605 recover.cpp:449] Starting replica recovery I0224 23:22:31.683537 30605 recover.cpp:475] Replica is in 4 status I0224 23:22:31.684624 30615 replica.cpp:641] Replica in 4 status received a broadcasted recover request I0224 23:22:31.684978 30616 recover.cpp:195] Received a recover response from a replica in 4 status I0224 23:22:31.685405 30610 recover.cpp:566] Updating replica status to 3 I0224 23:22:31.686249 30609 master.cpp:349] Master 20150224-232231-3142697795-38391-30589 (pomona.apache.org) started on 67.195.81.187:38391 I0224 23:22:31.686265 30617 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 717897ns I0224 23:22:31.686319 30617 replica.cpp:323] Persisted replica status to 3 I0224 23:22:31.686336 30609 master.cpp:395] Master only allowing authenticated frameworks to register I0224 23:22:31.686357 30609 master.cpp:400] Master only allowing authenticated slaves to register I0224 23:22:31.686390 30609 credentials.hpp:37] Loading credentials for authentication from '/tmp/MasterAllocatorTest_0_FrameworkReregistersFirst_Vy5Nml/credentials' I0224 23:22:31.686511 30606 recover.cpp:475] Replica is in 3 status I0224 23:22:31.686563 30609 master.cpp:442] Authorization enabled I0224 23:22:31.686929 30607 whitelist_watcher.cpp:79] No whitelist given I0224 23:22:31.686954 30603 hierarchical.hpp:287] Initialized hierarchical allocator process I0224 23:22:31.687134 30605 replica.cpp:641] Replica in 3 status received a broadcasted recover request I0224 23:22:31.687731 30609 master.cpp:1356] The newly elected 
leader is master@67.195.81.187:38391 with id 20150224-232231-3142697795-38391-30589 I0224 23:22:31.839818 30609 master.cpp:1369] Elected as the leading master! I0224 23:22:31.839834 30609 master.cpp:1187] Recovering from registrar I0224 23:22:31.839926 30605 registrar.cpp:313] Recovering registrar I0224 23:22:31.840000 30613 recover.cpp:195] Received a recover response from a replica in 3 status I0224 23:22:31.840504 30606 recover.cpp:566] Updating replica status to 1 I0224 23:22:31.841599 30611 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 990330ns I0224 23:22:31.841627 30611 replica.cpp:323] Persisted replica status to 1 I0224 23:22:31.841743 30611 recover.cpp:580] Successfully joined the Paxos group I0224 23:22:31.841904 30611 recover.cpp:464] Recover process terminated I0224 23:22:31.842366 30608 log.cpp:660] Attempting to start the writer I0224 23:22:31.843557 30607 replica.cpp:477] Replica received implicit promise request with proposal 1 I0224 23:22:31.844312 30607 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 722368ns I0224 23:22:31.844337 30607 replica.cpp:345] Persisted promised to 1 I0224 23:22:31.844889 30615 coordinator.cpp:230] Coordinator attemping to fill missing position I0224 23:22:31.846043 30614 replica.cpp:378] Replica received explicit promise request for position 0 with proposal 2 I0224 23:22:31.846729 30614 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 660024ns I0224 23:22:31.846746 30614 replica.cpp:679] Persisted action at 0 I0224 23:22:31.847671 30611 replica.cpp:511] Replica received write request for position 0 I0224 23:22:31.847723 30611 leveldb.cpp:438] Reading position from leveldb took 27349ns I0224 23:22:31.848429 30611 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 671461ns I0224 23:22:31.848454 30611 replica.cpp:679] Persisted action at 0 I0224 23:22:31.849041 30615 replica.cpp:658] Replica received learned notice for position 0 I0224 23:22:31.849762 30615 
leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 690386ns I0224 23:22:31.849787 30615 replica.cpp:679] Persisted action at 0 I0224 23:22:31.849808 30615 replica.cpp:664] Replica learned 1 action at position 0 I0224 23:22:31.850416 30612 log.cpp:676] Writer started with ending position 0 I0224 23:22:31.851490 30615 leveldb.cpp:438] Reading position from leveldb took 30659ns I0224 23:22:31.854452 30610 registrar.cpp:346] Successfully fetched the registry (0B) in 14.491136ms I0224 23:22:31.854543 30610 registrar.cpp:445] Applied 1 operations in 18024ns; attempting to update the 'registry' I0224 23:22:31.857095 30604 log.cpp:684] Attempting to append 139 bytes to the log I0224 23:22:31.857208 30608 coordinator.cpp:340] Coordinator attempting to write 2 action at position 1 I0224 23:22:31.858073 30609 replica.cpp:511] Replica received write request for position 1 I0224 23:22:31.858808 30609 leveldb.cpp:343] Persisting action (158 bytes) to leveldb took 701708ns I0224 23:22:31.858835 30609 replica.cpp:679] Persisted action at 1 I0224 23:22:31.859508 30618 replica.cpp:658] Replica received learned notice for position 1 I0224 23:22:31.860267 30618 leveldb.cpp:343] Persisting action (160 bytes) to leveldb took 731035ns I0224 23:22:31.860309 30618 replica.cpp:679] Persisted action at 1 I0224 23:22:31.860332 30618 replica.cpp:664] Replica learned 2 action at position 1 I0224 23:22:31.860983 30609 registrar.cpp:490] Successfully updated the 'registry' in 6.39616ms I0224 23:22:31.861071 30609 registrar.cpp:376] Successfully recovered registrar I0224 23:22:31.861126 30608 log.cpp:703] Attempting to truncate the log to 1 I0224 23:22:31.861249 30603 coordinator.cpp:340] Coordinator attempting to write 3 action at position 2 I0224 23:22:31.861248 30617 master.cpp:1214] Recovered 0 slaves from the Registry (101B) ; allowing 10mins for slaves to re-register I0224 23:22:31.861831 30613 replica.cpp:511] Replica received write request for position 2 I0224 23:22:31.862504 
30613 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 648125ns I0224 23:22:31.862531 30613 replica.cpp:679] Persisted action at 2 I0224 23:22:31.863067 30603 replica.cpp:658] Replica received learned notice for position 2 I0224 23:22:31.863689 30603 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 602784ns I0224 23:22:31.863737 30603 leveldb.cpp:401] Deleting ~1 keys from leveldb took 28697ns I0224 23:22:31.863751 30603 replica.cpp:679] Persisted action at 2 I0224 23:22:31.863767 30603 replica.cpp:664] Replica learned 3 action at position 2 I0224 23:22:31.875962 30610 slave.cpp:174] Slave started on 66)@67.195.81.187:38391 I0224 23:22:31.876008 30610 credentials.hpp:85] Loading credential for authentication from '/tmp/MasterAllocatorTest_0_FrameworkReregistersFirst_ikVXQM/credential' I0224 23:22:31.876144 30610 slave.cpp:281] Slave using credential for: test-principal I0224 23:22:31.876404 30610 slave.cpp:299] Slave resources: cpus(*):2; mem(*):1024; disk(*):3.70122e+06; ports(*):[31000-32000] I0224 23:22:31.876489 30610 slave.cpp:328] Slave hostname: pomona.apache.org I0224 23:22:31.876502 30610 slave.cpp:329] Slave checkpoint: false W0224 23:22:31.876507 30610 slave.cpp:331] Disabling checkpointing is deprecated and the --checkpoint flag will be removed in a future release. 
Please avoid using this flag I0224 23:22:31.877014 30603 state.cpp:35] Recovering state from '/tmp/MasterAllocatorTest_0_FrameworkReregistersFirst_ikVXQM/meta' I0224 23:22:31.877230 30610 status_update_manager.cpp:197] Recovering status update manager I0224 23:22:31.877495 30609 slave.cpp:3776] Finished recovery I0224 23:22:31.877879 30607 status_update_manager.cpp:171] Pausing sending status updates I0224 23:22:31.877879 30604 slave.cpp:624] New master detected at master@67.195.81.187:38391 I0224 23:22:31.877959 30604 slave.cpp:687] Authenticating with master master@67.195.81.187:38391 I0224 23:22:31.877975 30604 slave.cpp:692] Using default CRAM-MD5 authenticatee I0224 23:22:31.878069 30604 slave.cpp:660] Detecting new master I0224 23:22:31.878093 30608 authenticatee.hpp:139] Creating new client SASL connection I0224 23:22:31.878223 30604 master.cpp:3813] Authenticating slave(66)@67.195.81.187:38391 I0224 23:22:31.878244 30604 master.cpp:3824] Using default CRAM-MD5 authenticator I0224 23:22:31.878412 30613 authenticator.hpp:170] Creating new server SASL connection I0224 23:22:31.878525 30603 authenticatee.hpp:230] Received SASL authentication mechanisms: CRAM-MD5 I0224 23:22:31.878551 30603 authenticatee.hpp:256] Attempting to authenticate with mechanism 'CRAM-MD5' I0224 23:22:31.878625 30617 authenticator.hpp:276] Received SASL authentication start I0224 23:22:31.878662 30617 authenticator.hpp:398] Authentication requires more steps I0224 23:22:31.878727 30603 authenticatee.hpp:276] Received SASL authentication step I0224 23:22:31.878815 30617 authenticator.hpp:304] Received SASL authentication step I0224 23:22:31.878839 30617 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'pomona.apache.org' server FQDN: 'pomona.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0224 23:22:31.878847 30617 auxprop.cpp:171] Looking up auxiliary property '*userPassword' I0224 
23:22:31.878875 30617 auxprop.cpp:171] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0224 23:22:31.878891 30617 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'pomona.apache.org' server FQDN: 'pomona.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0224 23:22:31.878900 30617 auxprop.cpp:121] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0224 23:22:31.878906 30617 auxprop.cpp:121] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0224 23:22:31.878916 30617 authenticator.hpp:390] Authentication success I0224 23:22:31.880717 30589 sched.cpp:157] Version: 0.23.0 I0224 23:22:32.017823 30611 authenticatee.hpp:316] Authentication success I0224 23:22:32.017901 30618 master.cpp:3871] Successfully authenticated principal 'test-principal' at slave(66)@67.195.81.187:38391 I0224 23:22:32.018156 30615 sched.cpp:254] New master detected at master@67.195.81.187:38391 I0224 23:22:32.018240 30615 sched.cpp:310] Authenticating with master master@67.195.81.187:38391 I0224 23:22:32.018263 30615 sched.cpp:317] Using default CRAM-MD5 authenticatee I0224 23:22:32.018496 30613 slave.cpp:758] Successfully authenticated with master master@67.195.81.187:38391 I0224 23:22:32.018579 30611 authenticatee.hpp:139] Creating new client SASL connection I0224 23:22:32.018620 30613 slave.cpp:1090] Will retry registration in 363167ns if necessary I0224 23:22:32.018811 30615 master.cpp:2938] Registering slave at slave(66)@67.195.81.187:38391 (pomona.apache.org) with id 20150224-232231-3142697795-38391-30589-S0 I0224 23:22:32.019122 30615 master.cpp:3813] Authenticating scheduler-9a3224cc-aef0-49a7-a240-4b85b913ff44@67.195.81.187:38391 I0224 23:22:32.019156 30615 master.cpp:3824] Using default CRAM-MD5 authenticator I0224 23:22:32.019232 30612 registrar.cpp:445] Applied 1 operations in 57599ns; attempting to update the 'registry' 
I0224 23:22:32.019394 30603 authenticator.hpp:170] Creating new server SASL connection I0224 23:22:32.019541 30611 authenticatee.hpp:230] Received SASL authentication mechanisms: CRAM-MD5 I0224 23:22:32.019568 30611 authenticatee.hpp:256] Attempting to authenticate with mechanism 'CRAM-MD5' I0224 23:22:32.019666 30605 authenticator.hpp:276] Received SASL authentication start I0224 23:22:32.019717 30605 authenticator.hpp:398] Authentication requires more steps I0224 23:22:32.019805 30615 authenticatee.hpp:276] Received SASL authentication step I0224 23:22:32.019942 30605 authenticator.hpp:304] Received SASL authentication step I0224 23:22:32.019979 30605 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'pomona.apache.org' server FQDN: 'pomona.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0224 23:22:32.019994 30605 auxprop.cpp:171] Looking up auxiliary property '*userPassword' I0224 23:22:32.020025 30605 auxprop.cpp:171] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0224 23:22:32.020036 30610 slave.cpp:1090] Will retry registration in 10.850555ms if necessary I0224 23:22:32.020053 30605 auxprop.cpp:99] Request to lookup properties for user: 'test-principal' realm: 'pomona.apache.org' server FQDN: 'pomona.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0224 23:22:32.020102 30605 auxprop.cpp:121] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0224 23:22:32.020117 30605 auxprop.cpp:121] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0224 23:22:32.020133 30605 authenticator.hpp:390] Authentication success I0224 23:22:32.020151 30611 master.cpp:2926] Ignoring register slave message from slave(66)@67.195.81.187:38391 (pomona.apache.org) as admission is already in progress I0224 23:22:32.020226 30603 authenticatee.hpp:316] Authentication 
success I0224 23:22:32.020256 30611 master.cpp:3871] Successfully authenticated principal 'test-principal' at scheduler-9a3224cc-aef0-49a7-a240-4b85b913ff44@67.195.81.187:38391 I0224 23:22:32.020534 30615 sched.cpp:398] Successfully authenticated with master master@67.195.81.187:38391 I0224 23:22:32.020561 30615 sched.cpp:521] Sending registration request to master@67.195.81.187:38391 I0224 23:22:32.020635 30615 sched.cpp:554] Will retry registration in 490.035142ms if necessary I0224 23:22:32.020720 30613 master.cpp:1574] Received registration request for framework 'default' at scheduler-9a3224cc-aef0-49a7-a240-4b85b913ff44@67.195.81.187:38391 I0224 23:22:32.020787 30613 master.cpp:1435] Authorizing framework principal 'test-principal' to receive offers for role '*' I0224 23:22:32.021122 30607 master.cpp:1638] Registering framework 20150224-232231-3142697795-38391-30589-0000 (default) at scheduler-9a3224cc-aef0-49a7-a240-4b85b913ff44@67.195.81.187:38391 I0224 23:22:32.021502 30611 hierarchical.hpp:321] Added framework 20150224-232231-3142697795-38391-30589-0000 I0224 23:22:32.021531 30611 hierarchical.hpp:834] No resources available to allocate! 
I0224 23:22:32.021543 30611 hierarchical.hpp:741] Performed allocation for 0 slaves in 18915ns I0224 23:22:32.021618 30609 sched.cpp:448] Framework registered with 20150224-232231-3142697795-38391-30589-0000 I0224 23:22:32.021673 30609 sched.cpp:462] Scheduler::registered took 26310ns I0224 23:22:32.022400 30613 log.cpp:684] Attempting to append 316 bytes to the log I0224 23:22:32.022523 30608 coordinator.cpp:340] Coordinator attempting to write 2 action at position 3 I0224 23:22:32.023232 30607 replica.cpp:511] Replica received write request for position 3 I0224 23:22:32.024055 30607 leveldb.cpp:343] Persisting action (335 bytes) to leveldb took 798548ns I0224 23:22:32.024073 30607 replica.cpp:679] Persisted action at 3 I0224 23:22:32.024651 30610 replica.cpp:658] Replica received learned notice for position 3 I0224 23:22:32.025252 30610 leveldb.cpp:343] Persisting action (337 bytes) to leveldb took 580525ns I0224 23:22:32.025271 30610 replica.cpp:679] Persisted action at 3 I0224 23:22:32.025297 30610 replica.cpp:664] Replica learned 2 action at position 3 I0224 23:22:32.025995 30618 registrar.cpp:490] Successfully updated the 'registry' in 6.586112ms I0224 23:22:32.026228 30604 log.cpp:703] Attempting to truncate the log to 3 I0224 23:22:32.026360 30609 coordinator.cpp:340] Coordinator attempting to write 3 action at position 4 I0224 23:22:32.026669 30609 slave.cpp:2831] Received ping from slave-observer(66)@67.195.81.187:38391 I0224 23:22:32.026772 30609 slave.cpp:792] Registered with master master@67.195.81.187:38391; given slave ID 20150224-232231-3142697795-38391-30589-S0 I0224 23:22:32.026737 30603 master.cpp:2995] Registered slave 20150224-232231-3142697795-38391-30589-S0 at slave(66)@67.195.81.187:38391 (pomona.apache.org) with cpus(*):2; mem(*):1024; disk(*):3.70122e+06; ports(*):[31000-32000] I0224 23:22:32.026867 30603 status_update_manager.cpp:178] Resuming sending status updates I0224 23:22:32.026868 30617 hierarchical.hpp:455] Added slave 
20150224-232231-3142697795-38391-30589-S0 (pomona.apache.org) with cpus(*):2; mem(*):1024; disk(*):3.70122e+06; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):3.70122e+06; ports(*):[31000-32000] available) I0224 23:22:32.026921 30615 replica.cpp:511] Replica received write request for position 4 I0224 23:22:32.027276 30617 hierarchical.hpp:759] Performed allocation for slave 20150224-232231-3142697795-38391-30589-S0 in 351257ns I0224 23:22:32.027580 30615 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 624249ns I0224 23:22:32.027604 30615 replica.cpp:679] Persisted action at 4 I0224 23:22:32.027642 30618 master.cpp:3755] Sending 1 offers to framework 20150224-232231-3142697795-38391-30589-0000 (default) at scheduler-9a3224cc-aef0-49a7-a240-4b85b913ff44@67.195.81.187:38391 I0224 23:22:32.028223 30617 replica.cpp:658] Replica received learned notice for position 4 I0224 23:22:32.028621 30607 sched.cpp:611] Scheduler::resourceOffers took 648326ns I0224 23:22:32.028916 30617 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 662416ns I0224 23:22:32.028991 30617 leveldb.cpp:401] Deleting ~2 keys from leveldb took 47386ns I0224 23:22:32.029021 30617 replica.cpp:679] Persisted action at 4 I0224 23:22:32.029044 30617 replica.cpp:664] Replica learned 3 action at position 4 I0224 23:22:32.029534 30613 master.cpp:2268] Processing ACCEPT call for offers: [ 20150224-232231-3142697795-38391-30589-O0 ] on slave 20150224-232231-3142697795-38391-30589-S0 at slave(66)@67.195.81.187:38391 (pomona.apache.org) for framework 20150224-232231-3142697795-38391-30589-0000 (default) at scheduler-9a3224cc-aef0-49a7-a240-4b85b913ff44@67.195.81.187:38391 I0224 23:22:32.190521 30613 master.cpp:2112] Authorizing framework principal 'test-principal' to launch task 0 as user 'jenkins' W0224 23:22:32.191864 30604 validation.cpp:328] Executor default for task 0 uses less CPUs (None) than the minimum required (0.01). 
Please update your executor, as this will be mandatory in future releases. W0224 23:22:32.191905 30604 validation.cpp:340] Executor default for task 0 uses less memory (None) than the minimum required (32MB). Please update your executor, as this will be mandatory in future releases. I0224 23:22:32.192206 30604 master.hpp:822] Adding task 0 with resources cpus(*):1; mem(*):500 on slave 20150224-232231-3142697795-38391-30589-S0 (pomona.apache.org) I0224 23:22:32.192318 30604 master.cpp:2545] Launching task 0 of framework 20150224-232231-3142697795-38391-30589-0000 (default) at scheduler-9a3224cc-aef0-49a7-a240-4b85b913ff44@67.195.81.187:38391 with resources cpus(*):1; mem(*):500 on slave 20150224-232231-3142697795-38391-30589-S0 at slave(66)@67.195.81.187:38391 (pomona.apache.org) I0224 23:22:32.192659 30611 slave.cpp:1121] Got assigned task 0 for framework 20150224-232231-3142697795-38391-30589-0000 I0224 23:22:32.192847 30609 hierarchical.hpp:648] Recovered cpus(*):1; mem(*):524; disk(*):3.70122e+06; ports(*):[31000-32000] (total allocatable: cpus(*):1; mem(*):524; ...",2 MESOS-2404,"Add an example framework to test persistent volumes.","This serves two purposes: 1) testing the new persistence feature 2) served as an example for others to use the new feature",3 MESOS-2405,"Add user doc for using persistent volumes.",NULL,2 MESOS-2408,"Slave should garbage collect released persistent volumes.","This is tricky in the case when a persistence id is re-used. When a persistent volume is destroyed explicitly by the framework, master deletes all information about this volume. That mean the master no longer has the ability to check if the persistence id is re-used (and reject the later attempt). On the slave side, we'll use some GC policy to remove directories associated with deleted persistent volumes (similar to how we GC sandboxes). That means the persistent volume directory won't be deleted immediately when the volume is destroyed by the framework explicitly. 
When the same persistence id is reused, we'll see the persistent volume still exists and we need to cancel the GC of that directory (similar to how we cancel the GC for meta directories during runTask).",5 MESOS-2422,"Use fq_codel qdisc for egress network traffic isolation",NULL,8 MESOS-2427,"Add Java binding for the acceptOffers API.","We introduced the new acceptOffers API in the C++ driver. We need to provide a Java binding for this API as well.",2 MESOS-2428,"Add Python bindings for the acceptOffers API.","We introduced the new acceptOffers API in the C++ driver. We need to provide a Python binding for this API as well.",2 MESOS-2438,"Improve support for streaming HTTP Responses in libprocess.","Currently libprocess' HTTP::Response supports a PIPE construct for doing streaming responses: {code} struct Response { ... // Either provide a ""body"", an absolute ""path"" to a file, or a // ""pipe"" for streaming a response. Distinguish between the cases // using 'type' below. // // BODY: Uses 'body' as the body of the response. These may be // encoded using gzip for efficiency, if 'Content-Encoding' is not // already specified. // // PATH: Attempts to perform a 'sendfile' operation on the file // found at 'path'. // // PIPE: Splices data from 'pipe' using 'Transfer-Encoding=chunked'. // Note that the read end of the pipe will be closed by libprocess // either after the write end has been closed or if the socket the // data is being spliced to has been closed (i.e., nobody is // listening any longer). This can cause writes to the pipe to // generate a SIGPIPE (which will terminate your program unless you // explicitly ignore them or handle them). // // In all cases (BODY, PATH, PIPE), you are expected to properly // specify the 'Content-Type' header, but the 'Content-Length' and // or 'Transfer-Encoding' headers will be filled in for you. enum { NONE, BODY, PATH, PIPE } type; ... 
}; {code} This interface is too low level and difficult to program against: * Connection closure is signaled with SIGPIPE, which is difficult for callers to deal with (must suppress SIGPIPE locally or globally in order to get EPIPE instead). * Pipes are generally for inter-process communication, and the pipe has finite size. With a blocking pipe the caller must deal with blocking when the pipe's buffer limit is exceeded. With a non-blocking pipe, the caller must deal with retrying the write. We'll want to consider a few use cases: # Sending an HTTP::Response with streaming data. # Making a request with http::get and http::post in which the data is returned in a streaming manner. # Making a request in which the request content is streaming. This ticket will focus on 1 as it is required for the HTTP API.",8 MESOS-2447,"Mesos replicated log does not log the Action type name.","This is a regression introduced during the internal namespace refactor. 0.21.0 master: {noformat} I0224 02:43:29.806895 50982 replica.cpp:661] Replica learned APPEND action at position 1655 {noformat} 0.22.0 master: {noformat} I0303 21:45:39.406929 1302 replica.cpp:664] Replica learned 2 action at position 2079 {noformat}",1 MESOS-2452,"The recovered executor directory points to the meta directory.","The bug was introduced in this review: https://reviews.apache.org/r/29687 RunState.directory points to the metadata directory. This would cause the PosixDiskIsolator to report incorrect disk usages after slave recovery. We also need a test to test the slave recovery path for the PosixDiskIsolator.",2 MESOS-2454,"Add support for /proc/self/mountinfo on Linux","/proc/self/mountinfo provides mount information specific to the calling process. This includes information on optional fields describing mount propagation, e.g., shared/slave mounts. 
Initially, add this to linux/fs then perhaps move existing users of MountTable to use the mountinfo, deprecating and removing the mostly (but not entirely) redundant code.",3 MESOS-2455,"Add operator endpoints to create/destroy persistent volumes.","Persistent volumes will not be released automatically. So we probably need an endpoint for operators to forcefully release persistent volumes. We probably need to add principal to Persistence struct and use ACLs to control who can release what. Additionally, it would be useful to have an endpoint for operators to create persistent volumes.",3 MESOS-2461,"Slave should provide details on processes running in its cgroups","The slave can optionally be put into its own cgroups for a list of subsystems, e.g., for monitoring of memory and cpu. See the slave flag: --slave_subsystems It currently refuses to start if there are any processes in its cgroups - this could be another slave or some subprocess started by a previous slave - and only logs the pids of those processes. Improve this to log details about the processes: suggest at least the process command, uid running it, and perhaps its start time.",1 MESOS-2462,"Add option for Subprocess to set a death signal for the forked child","Currently, children forked by the slave, including those through Subprocess, will continue running if the slave exits. For some processes, including helper processes like the fetcher, du, or perf, we'd like them to be terminated when the slave exits. Add support to Subprocess to optionally set a DEATHSIG for the child, e.g., setting SIGTERM would mean the child would get SIGTERM when the slave terminates. This can be done (*after forking*) with PR_SET_DEATHSIG. See ""man prctl"". It is preserved through an exec call.",3 MESOS-2464,"Authentication failure may lead to slave crash","When slave authentication fails, the following attempt to transmit a {{UnregisterSlaveMessage}} may cause a crash within the slave. 
{noformat} E0309 01:08:34.819758 336699392 slave.cpp:740] Master master@192.168.178.20:5050 refused authentication I0309 01:08:34.819787 336699392 slave.cpp:538] Master refused authentication; unregistering and shutting down [libprotobuf FATAL google/protobuf/message_lite.cc:273] CHECK failed: IsInitialized(): Can't serialize message of type ""mesos.internal.UnregisterSlaveMessage"" because it is missing required fields: slave_id.value libprocess: slave(1)@192.168.178.20:5051 terminating due to CHECK failed: IsInitialized(): Can't serialize message of type ""mesos.internal.UnregisterSlaveMessage"" because it is missing required fields: slave_id.value {noformat} The problem here is the following code: {noformat} UnregisterSlaveMessage message_; message_.mutable_slave_id()->MergeFrom(info.id()); {noformat} Authentication happens before registration. {{info.id}} is an optional member (of {{SlaveInfo}}) and not known yet. It is set later, while registering. So {{slave_id}} will remain unset.",1 MESOS-2466,"Write documentation for all the LIBPROCESS_* environment variables.","libprocess uses a set of environment variables to modify its behaviour; however, these variables are not documented anywhere, nor it is defined where the documentation should be. What would be needed is a decision whether the environment variables should be documented (a new doc file or reusing an existing one), and then add the documentation there. After searching in the code, these are the variables which need to be documented: # {{LIBPROCESS_IP}} # {{LIBPROCESS_PORT}} # {{LIBPROCESS_ADVERTISE_IP}} # {{LIBPROCESS_ADVERTISE_PORT}}",2 MESOS-2467,"Allow --resources flag to take JSON.","Currently, we used a customized format for --resources flag. As we introduce more and more stuffs (e.g., persistence, reservation) in Resource object, we need a more generic way to specify --resources. For backward compatibility, we can scan the first character. If it is '[', then we invoke the JSON parser. 
Otherwise, we use the existing parser.",3 MESOS-2469,"Mesos master/slave should be able to bind to 127.0.0.1 if explicitly requested","With the current refactoring to IP it looks like master and slave can no longer bind to 127.0.0.1 even if explicitly requested via ""--ip"" flag. Among other things, this breaks the balloon framework test which uses this flag.",1 MESOS-2475,"Add the Resource::ReservationInfo protobuf message","The {{Resource::ReservationInfo}} protobuf message encapsulates information needed to keep track of reservations. It's named {{ReservationInfo}} rather than {{Reservation}} to keep consistency with {{Resource::DiskInfo}}. Here's what it will look like: {code} message ReservationInfo { // Indicates the principal of the operator or framework that created the // reservation. This is used to determine whether this resource can be // unreserved by an operator or a framework by checking the // ""unreserve"" ACL. required string principal; } // If this is set, this resource was dynamically reserved by an // operator or a framework. Otherwise, this resource was // statically configured by an operator via the --resources flag. optional ReservationInfo reservation; {code}",2 MESOS-2476,"Enable Resources to handle Resource::ReservationInfo","After [MESOS-2475|https://issues.apache.org/jira/browse/MESOS-2475], our C++ {{Resources}} class needs to know how to handle {{Resource}} protobuf messages that have the {{reservation}} field set.",2 MESOS-2477,"Enable Resources::apply to handle reservation operations.","{{Resources::apply}} currently only handles {{Create}} and {{Destroy}} operations which exist for persistent volumes. We need to handle the {{Reserve}} and {{Unreserve}} operations for dynamic reservations as well.",3 MESOS-2485,"Add ability to distinguish slave removals metrics by reason.","Currently we only expose a single removal metric ({{""master/slave_removals""}}) which makes it difficult to distinguish between removal reasons in the alerting. 
Currently, a slave can be removed for the following reasons: # Health checks failed. # Slave unregistered. # Slave was replaced by a new slave (on the same endpoint). In the case of (2), we expect this to be due to maintenance and don't want to be notified as strongly as with health check failures.",3 MESOS-2489,"Enable a framework to perform reservation operations.","h3. Goal This is the first step to supporting dynamic reservations. The goal of this task is to enable a framework to reply to a resource offer with *Reserve* and *Unreserve* offer operations as defined by {{Offer::Operation}} in {{mesos.proto}}. h3. Overview It's divided into a few subtasks so that it's clear what the small chunks to be addressed are. In summary, we need to introduce the {{Resource::ReservationInfo}} protobuf message to encapsulate the reservation information, enable the C++ {{Resources}} class to handle it, and then enable the master to handle reservation operations. h3. Expected Outcome * The framework will be able to send back reservation operations to (un)reserve resources. * The reservations are kept only in the master since we don't send the {{CheckpointResources}} message to checkpoint the reservations on the slave yet. * The reservations are considered to be reserved for the framework's role.",4 MESOS-2491,"Persist the reservation state on the slave","h3. Goal The goal for this task is to persist the reservation state stored on the master on the corresponding slave. The {{needCheckpointing}} predicate is used to capture the condition for which a resource needs to be checkpointed. Currently the only condition is {{isPersistentVolume}}. We'll update this to include dynamically reserved resources. h3. Expected Outcome * The dynamically reserved resources will be persisted on the slave.",5 MESOS-2497,"Create synchronous validations for Calls","The /call endpoint will return a 202 Accepted code, but it has to perform some basic validation first. If validation fails, it will return a 4xx code. 
We have to create a mechanism that will validate the 'request' and send back the appropriate code.",8 MESOS-2500,"Doxygen setup for libprocess","Goals: - Initial doxygen setup. - Enable interested developers to generate already available doxygen content locally in their workspace and view it. - Form the basis for future contributions of more doxygen content. 1. Devise a way to use Doxygen with Mesos source code. (For example, solve this by adding optional brew/apt-get installation to the ""Getting Started"" doc.) 2. Create a make target for libprocess documentation that can be manually triggered. 3. Create initial library top level documentation. 4. Enhance one header file with Doxygen. Make sure the generated output has all necessary links to navigate from the lib to the file and back, etc. ",2 MESOS-2501,"Doxygen style for libprocess","Create a description of the Doxygen style to use for libprocess documentation. It is expected that this will later also become the Doxygen style for stout and Mesos, but we are working on libprocess only for now. Possible outcome: a file named docs/doxygen-style.md We hope for much input and expect a lot of discussion! ",1 MESOS-2507,"Performance issue in the master when a large number of slaves are registering.","For large clusters, when a lot of slaves are registering, the master gets backlogged processing registration requests. {{perf}} revealed the following: {code} Events: 14K cycles 25.44% libmesos-0.22.0-x.so [.] mesos::internal::master::Master::registerSlave(process::UPID const&, mesos::SlaveInfo const&, std::vector > cons 11.18% libmesos-0.22.0-x.so [.] pipecb 5.88% libc-2.5.so [.] malloc_consolidate 5.33% libc-2.5.so [.] _int_free 5.25% libc-2.5.so [.] malloc 5.23% libc-2.5.so [.] _int_malloc 4.11% libstdc++.so.6.0.8 [.] std::string::assign(std::string const&) 3.22% libmesos-0.22.0-x.so [.] mesos::Resource::SharedDtor() 3.10% [kernel] [k] _raw_spin_lock 1.97% libmesos-0.22.0-x.so [.] 
mesos::Attribute::SharedDtor() 1.28% libc-2.5.so [.] memcmp 1.08% libc-2.5.so [.] free {code} This is likely because we loop over all the slaves for each registration: {code} void Master::registerSlave( const UPID& from, const SlaveInfo& slaveInfo, const vector& checkpointedResources, const string& version) { // ... // Check if this slave is already registered (because it retries). foreachvalue (Slave* slave, slaves.registered) { if (slave->pid == from) { // ... } } // ... } {code}",5 MESOS-2512,"FetcherTest.ExtractNotExecutable is flaky","Observed in our internal CI. {code} [ RUN ] FetcherTest.ExtractNotExecutable Using temporary directory '/tmp/FetcherTest_ExtractNotExecutable_R5R7Cn' tar: Removing leading `/' from member names I0316 18:55:48.509306 14678 fetcher.cpp:155] Starting to fetch URIs for container: de1e5165-82b4-434b-9149-8667cf652c64, directory: /tmp/FetcherTest_ExtractNotExecutable_R5R7Cn I0316 18:55:48.509845 14678 fetcher.cpp:238] Fetching URIs using command '/var/jenkins/workspace/mesos-fedora-20-gcc/src/mesos-fetcher' I0316 18:55:48.568611 15028 logging.cpp:177] Logging to STDERR I0316 18:55:48.574928 15028 fetcher.cpp:214] Fetching URI '/tmp/DIjmjV.tar.gz' I0316 18:55:48.575166 15028 fetcher.cpp:194] Copying resource from '/tmp/DIjmjV.tar.gz' to '/tmp/FetcherTest_ExtractNotExecutable_R5R7Cn' tar: This does not look like a tar archive tar: Exiting with failure status due to previous errors Failed to extract /tmp/FetcherTest_ExtractNotExecutable_R5R7Cn/DIjmjV.tar.gz:Failed to extract: command tar -C '/tmp/FetcherTest_ExtractNotExecutable_R5R7Cn' -xf '/tmp/FetcherTest_ExtractNotExecutable_R5R7Cn/DIjmjV.tar.gz' exited with status: 512 tests/fetcher_tests.cpp:686: Failure (fetch).failure(): Failed to fetch URIs for container 'de1e5165-82b4-434b-9149-8667cf652c64'with exit status: 256 [ FAILED ] FetcherTest.ExtractNotExecutable (208 ms) {code}",2 MESOS-2514,"Change the default leaf qdisc to fq_codel inside containers","When we enable bandwidth cap, 
htb is used on egress side inside containers, however, the default leaf qdisc for a htb class is still pfifo_fast, which is known to have buffer bloat. Change the default leaf qdisc to fq_codel too: `tc qd add dev eth0 parent 1:1 fq_codel` I can no longer see packet drops after this change.",1 MESOS-2519,"Log IP addresses from HTTP requests","Querying /master/state.json is an expensive operation when a cluster is large, and it's possible to DOS the master via frequent and repeated queries (which is a separate problem). Querying the endpoint results in a log entry being written, but the entry lacks useful information, such as an IP address, response code and response size. These details are useful for tracking down who/what is querying the endpoint. Consider adding these details to the log entry, or even writing a separate [access|https://httpd.apache.org/docs/trunk/logs.html#accesslog] [log|https://httpd.apache.org/docs/trunk/logs.html#common]. Also consider writing log entries for _all_ HTTP requests (/metrics/snapshot produces no log entries). {noformat:title=sample log entry} I0319 18:06:18.824846 10521 http.cpp:478] HTTP request for '/master/state.json' {noformat}",3 MESOS-2528,"Symlink the namespace handle with ContainerID for the port mapping isolator.","This serves two purposes: 1) Allows us to enter the network namespace using container ID (instead of pid): ""ip netns exec [commands] [args]"". 2) Allows us to get container ID for orphan containers during recovery. This will be helpful for solving MESOS-2367. The challenge here is to solve it in a backward compatible way. I propose to create symlinks under /var/run/netns. 
For example: /var/run/netns/containeridxxxx --> /var/run/netns/12345 (12345 is the pid) The old code will only remove the bind mounts and leave the symlinks, which I think is fine since containerid is globally unique (uuid).",3 MESOS-2533,"Support HTTP checks in Mesos health check program","Currently, only commands are supported but our health check protobuf enables users to encode HTTP checks as well. We should wire up this in the health check program or remove the http field from the protobuf.",8 MESOS-2534,"PerfTest.ROOT_SampleInit test fails.","From MESOS-2300 as well, it looks like this test is not reliable: {code} [ RUN ] PerfTest.ROOT_SampleInit ../../src/tests/perf_tests.cpp:147: Failure Expected: (0u) < (statistics.get().cycles()), actual: 0 vs 0 ../../src/tests/perf_tests.cpp:150: Failure Expected: (0.0) < (statistics.get().task_clock()), {code} It looks like this test samples PID 1, which is either {{init}} or {{systemd}}. Per a chat with [~idownes] this should probably sample something that is guaranteed to be consuming cycles.",2 MESOS-2538,"Remove unnecessary default flags from PortMappingMesosTest.","As all the explicitly set flags are defaults, we can remove them and simplify the code. MESOS-2375 removed other occurrences of these default flags.",1 MESOS-2545,"Developer guide for libprocess","Create a developer guide for libprocess that explains the philosophy behind it and explains the most important features as well as the prevalent use patterns in Mesos with examples. This could be similar to stout/README.md. ",2 MESOS-2547,"Cleanup stale bind mounts for port mapping isolator during slave recovery.","Leaked bind mount under /var/run/netns for port mapping isolator is a known issue. There are many ways it can get leaked. For example, if the slave crashes after creating the bind mount but before creating the veth, the bind mount will be leaked. 
Also, if the detached unmount does not finish in time and the subsequent os::rm fails, the bind mount will be leaked as well. Since leaked bind mount is inevitable, we need to clean them up during startup (slave recovery).",2 MESOS-2548,"new `make distcheck` failures inside a docker container","After the commits: {code} Change #21 Category None Changed by Jie Yu Changed at Wed 25 Mar 2015 00:12:14 Repository https://git-wip-us.apache.org/repos/asf/mesos.git Branch master Revision 6c6473febac40be1e01c9ab005cca20ad2a48e18 Comments Disallowed multiple cgroups base hierarchies in tests. Review: https://reviews.apache.org/r/32452 Changed files src/tests/mesos.cpp Change #22 Category None Changed by Jie Yu Changed at Wed 25 Mar 2015 00:15:37 Repository https://git-wip-us.apache.org/repos/asf/mesos.git Branch master Revision 212b88c4d20a89dcd9f319b3be984f5646a47499 Comments Allowed MesosContainerizer to take empty isolation flag. Review: https://reviews.apache.org/r/32467 {code} Numerous tests inside our internal CI started failing: {code} [ RUN ] SlaveRecoveryTest/0.RecoverSlaveState ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. 
------------------------------------------------------------- [ FAILED ] SlaveRecoveryTest/0.RecoverSlaveState, where TypeParam = mesos::internal::slave::MesosContainerizer (26 ms) [ RUN ] SlaveRecoveryTest/0.RecoverStatusUpdateManager ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. ------------------------------------------------------------- [ FAILED ] SlaveRecoveryTest/0.RecoverStatusUpdateManager, where TypeParam = mesos::internal::slave::MesosContainerizer (26 ms) [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. ------------------------------------------------------------- [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (26 ms) [ RUN ] SlaveRecoveryTest/0.RecoverUnregisteredExecutor ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. 
------------------------------------------------------------- [ FAILED ] SlaveRecoveryTest/0.RecoverUnregisteredExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (24 ms) [ RUN ] SlaveRecoveryTest/0.RecoverTerminatedExecutor ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. ------------------------------------------------------------- [ FAILED ] SlaveRecoveryTest/0.RecoverTerminatedExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (24 ms) [ RUN ] SlaveRecoveryTest/0.RecoverCompletedExecutor ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. ------------------------------------------------------------- [ FAILED ] SlaveRecoveryTest/0.RecoverCompletedExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (23 ms) [ RUN ] SlaveRecoveryTest/0.CleanupExecutor ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. 
------------------------------------------------------------- [ FAILED ] SlaveRecoveryTest/0.CleanupExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (24 ms) [ RUN ] SlaveRecoveryTest/0.RemoveNonCheckpointingFramework ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. ------------------------------------------------------------- [ FAILED ] SlaveRecoveryTest/0.RemoveNonCheckpointingFramework, where TypeParam = mesos::internal::slave::MesosContainerizer (25 ms) [ RUN ] SlaveRecoveryTest/0.NonCheckpointingFramework ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. ------------------------------------------------------------- [ FAILED ] SlaveRecoveryTest/0.NonCheckpointingFramework, where TypeParam = mesos::internal::slave::MesosContainerizer (24 ms) [ RUN ] SlaveRecoveryTest/0.KillTask ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. 
------------------------------------------------------------- [ FAILED ] SlaveRecoveryTest/0.KillTask, where TypeParam = mesos::internal::slave::MesosContainerizer (24 ms) [ RUN ] SlaveRecoveryTest/0.Reboot 2015-03-25 00:32:56,830:40596(0x7f7cbf4f4700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:32810] zk retcode=-4, errno=111(Connection refused): server refused to accept the client ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. ------------------------------------------------------------- [ FAILED ] SlaveRecoveryTest/0.Reboot, where TypeParam = mesos::internal::slave::MesosContainerizer (24 ms) [ RUN ] SlaveRecoveryTest/0.GCExecutor ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. ------------------------------------------------------------- [ FAILED ] SlaveRecoveryTest/0.GCExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (24 ms) [ RUN ] SlaveRecoveryTest/0.ShutdownSlave ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. 
Please unmount the corresponding (or all) subsystems. ------------------------------------------------------------- [ FAILED ] SlaveRecoveryTest/0.ShutdownSlave, where TypeParam = mesos::internal::slave::MesosContainerizer (24 ms) [ RUN ] SlaveRecoveryTest/0.ShutdownSlaveSIGUSR1 ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. ------------------------------------------------------------- [ FAILED ] SlaveRecoveryTest/0.ShutdownSlaveSIGUSR1, where TypeParam = mesos::internal::slave::MesosContainerizer (24 ms) [ RUN ] SlaveRecoveryTest/0.RegisterDisconnectedSlave ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. ------------------------------------------------------------- [ FAILED ] SlaveRecoveryTest/0.RegisterDisconnectedSlave, where TypeParam = mesos::internal::slave::MesosContainerizer (25 ms) [ RUN ] SlaveRecoveryTest/0.ReconcileKillTask ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. 
------------------------------------------------------------- [ FAILED ] SlaveRecoveryTest/0.ReconcileKillTask, where TypeParam = mesos::internal::slave::MesosContainerizer (24 ms) [ RUN ] SlaveRecoveryTest/0.ReconcileShutdownFramework ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. ------------------------------------------------------------- [ FAILED ] SlaveRecoveryTest/0.ReconcileShutdownFramework, where TypeParam = mesos::internal::slave::MesosContainerizer (23 ms) [ RUN ] SlaveRecoveryTest/0.ReconcileTasksMissingFromSlave ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. ------------------------------------------------------------- [ FAILED ] SlaveRecoveryTest/0.ReconcileTasksMissingFromSlave, where TypeParam = mesos::internal::slave::MesosContainerizer (25 ms) [ RUN ] SlaveRecoveryTest/0.SchedulerFailover ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. 
------------------------------------------------------------- [ FAILED ] SlaveRecoveryTest/0.SchedulerFailover, where TypeParam = mesos::internal::slave::MesosContainerizer (26 ms) [ RUN ] SlaveRecoveryTest/0.PartitionedSlave ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. ------------------------------------------------------------- [ FAILED ] SlaveRecoveryTest/0.PartitionedSlave, where TypeParam = mesos::internal::slave::MesosContainerizer (26 ms) [ RUN ] SlaveRecoveryTest/0.MasterFailover ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. ------------------------------------------------------------- [ FAILED ] SlaveRecoveryTest/0.MasterFailover, where TypeParam = mesos::internal::slave::MesosContainerizer (26 ms) [ RUN ] SlaveRecoveryTest/0.MultipleFrameworks ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. 
------------------------------------------------------------- [ FAILED ] SlaveRecoveryTest/0.MultipleFrameworks, where TypeParam = mesos::internal::slave::MesosContainerizer (26 ms) [ RUN ] SlaveRecoveryTest/0.MultipleSlaves ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. ------------------------------------------------------------- [ FAILED ] SlaveRecoveryTest/0.MultipleSlaves, where TypeParam = mesos::internal::slave::MesosContainerizer (26 ms) [ RUN ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. 
------------------------------------------------------------- [ FAILED ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch, where TypeParam = mesos::internal::slave::MesosContainerizer (26 ms) [----------] 24 tests from SlaveRecoveryTest/0 (596 ms total) [----------] 4 tests from MesosContainerizerSlaveRecoveryTest [ RUN ] MesosContainerizerSlaveRecoveryTest.ResourceStatistics ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. ------------------------------------------------------------- [ FAILED ] MesosContainerizerSlaveRecoveryTest.ResourceStatistics (25 ms) [ RUN ] MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PerfRollForward ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. ------------------------------------------------------------- [ FAILED ] MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PerfRollForward (24 ms) [ RUN ] MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. 
Please unmount the corresponding (or all) subsystems. ------------------------------------------------------------- [ FAILED ] MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward (25 ms) [ RUN ] MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward ../../src/tests/mesos.cpp:555: Failure Value of: _baseHierarchy Actual: ""/sys/fs/cgroup/cpu,"" Expected: baseHierarchy Which is: ""/sys/fs/cgroup/"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/sys/fs/cgroup/' '/sys/fs/cgroup/cpu,' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. ------------------------------------------------------------- [ FAILED ] MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward (24 ms) [----------] 4 tests from MesosContainerizerSlaveRecoveryTest (98 ms total) {code} {code} [ FAILED ] 28 tests, listed below: [ FAILED ] SlaveRecoveryTest/0.RecoverSlaveState, where TypeParam = mesos::internal::slave::MesosContainerizer [ FAILED ] SlaveRecoveryTest/0.RecoverStatusUpdateManager, wh...",1 MESOS-2551,"C++ Scheduler library should send Call messages to Master","Currently, the C++ library sends different messages to Master instead of a single Call message. To vet the new Call API it should send Call messages. 
Master should be updated to handle all types of Calls.",8 MESOS-2552,"C++ Scheduler library should send HTTP Calls to master","Once the scheduler library sends Call messages, we should update it to send Calls as HTTP requests to the ""/call"" endpoint on master.",3 MESOS-2555,"Document issue with slave recovery when using systemd.","As the problem encountered in MESOS-2419 is a common problem with the default systemd configuration, it would make sense to document this in the upgrade guide or somewhere else in the documentation.",1 MESOS-2559,"Do not use RunTaskMessage.framework_id.","Assume that FrameworkInfo.id is always set and so there is no need to read/set RunTaskMessage.framework_id. This should land after https://issues.apache.org/jira/browse/MESOS-2558 has been shipped.",1 MESOS-2562,"0.24.0 release","The main feature of this release is going to be the v1 (beta) release of the HTTP scheduler API (part of MESOS-2288 epic). Unresolved issues tracker: https://issues.apache.org/jira/issues/?jql=project%20%3D%20MESOS%20AND%20status%20!%3D%20Resolved%20AND%20%22Target%20Version%2Fs%22%20%3D%200.24.0%20ORDER%20BY%20status%20DESC",5 MESOS-2571,"Expose Memory Pressure in MemIsolator",NULL,3 MESOS-2572,"Add memory statistics tests.",NULL,5 MESOS-2573,"Use Memory Test Helper to improve some test code.",NULL,2 MESOS-2574,"Namespace handle symlinks in port_mapping isolator should not be under /var/run/netns","Consider putting symlinks under /var/run/mesos/netns. This is because the 'ip' command assumes all files under /var/run/netns are valid namespaces without duplication and it has commands like: ip -all netns exec ip link to list all links for each network namespace.",3 MESOS-2578,"Add '{' on newline for function declarations in style checker","Similar to MESOS-2577; another common style mistake is to not move curly braces onto a newline for function and class declarations: {code} class Foo { void bar() { ... } }; {code} vs {code} class Foo { void bar() { ... 
} }; {code} This should be easy to check with our style checker too.",1 MESOS-2579,"0.22.1 release",NULL,1 MESOS-2581,"Document tips, best practices, guidelines for doing code reviews.","We currently have a [""Committers Guide""|https://github.com/apache/mesos/blob/0.22.0/docs/committers-guide.md], however most of this information is relevant to all contributors looking to be participating in the code review process. I'm proposing we extract much of this information into a more general ""Code Reviewing"" document, and include additional tips, best practices, lessons learned from members of the community. This would be a great pre-requisite for on-boarding more committers and adding [MAINTAINERS|http://mail-archives.apache.org/mod_mbox/mesos-dev/201502.mbox/%3CCA+8RcoReugMVqoOpsnB8WGYBELa5fHwPA=J=YHJE22iwZvsbeQ@mail.gmail.com%3E]. The committers guide can be more specific to our expectations of committers, so we may want to make this into a ""committership"" document to help set expectations for contributors looking to become committers.",3 MESOS-2582,"Create optional release step: update PyPi repositories","One of the build artifacts for a release is the python package `mesos.interface`. That needs to be uploaded to PyPi along with a release to allow for users of python frameworks to use that version of mesos.",2 MESOS-2590,"Let the slave control the duration of the perf sampler instead of relying on a sleep command.","Right now, we use a sleep command to control the duration of perf sampling: {noformat} sudo perf stat -a -x, --log-fd 1 --pid 10940 -- sleep 10 {noformat} This causes an additional process (i.e., the sleep process) to be forked and causes troubles for us to terminate the perf sampler once the slave exits (See MESOS-2462). Seems that the additional sleep process is not necessary. The slave can just monitor the duration and send a SIGINT to the perf process when duration elapsed. 
This will cause the perf process to output the stats and terminate.",3 MESOS-2591,"Refactor launchHelper and statisticsHelper in port_mapping_tests to allow reuse","Refactor launchHelper and statisticsHelper in port_mapping_tests to allow reuse",2 MESOS-2595,"Create docker executor","Currently we're reusing the command executor to wait on the progress of the docker executor, but this has the following drawbacks: - We need to launch a separate docker log process just to forward logs, whereas we could simply reattach stdout/stderr if we created a specific executor for docker - In general, the Mesos slave assumes that the executor is the one starting the actual task. But with the current docker containerizer, the containerizer actually starts the docker container first and then launches the command executor to wait on it. This can cause problems if the container failed before the command executor was able to launch, as the slave will try to update the limits of the containerizer on executor registration but then the docker containerizer will fail to do so since the container failed. Overall it's much simpler to tie the container lifecycle to the executor, which simplifies logic and log management.",8 MESOS-2596,"Update allocator docs","Once the Allocator interface changes, so does the way of writing new allocators. This should be reflected in the Mesos docs. The modules doc should mention how to write and use allocator modules. The configuration doc should mention the new {{--allocator}} flag.",2 MESOS-2598,"Slave state.json frameworks.executors.queued_tasks wrong format?","queued_tasks.executor_id is expected to be a string and not a complete json object. It should have the very same format as the tasks array on the same level. Example, directly taken from slave {noformat} .... 
""queued_tasks"": [ { ""data"": """", ""executor_id"": { ""command"": { ""argv"": [], ""uris"": [ { ""executable"": false, ""value"": ""http://downloads.foo.io/orchestra/storm-mesos/0.9.2-incubating-47-ovh.bb373df1c/storm-mesos-0.9.2-incubating.tgz"" } ], ""value"": ""cd storm-mesos* && python bin/storm supervisor storm.mesos.MesosSupervisor"" }, ""data"": ""{\""assignmentid\"":\""srv4.hw.ca1.foo.com\"",\""supervisorid\"":\""srv4.hw.ca1.foo.com-stage-ingestion-stats-slave-111-1428421145\""}"", ""executor_id"": ""stage-ingestion-stats-slave-111-1428421145"", ""framework_id"": ""20150401-160104-251662508-5050-2197-0002"", ""name"": """", ""resources"": { ""cpus"": 0.5, ""disk"": 0, ""mem"": 1000 } }, ""id"": ""srv4.hw.ca1.foo.com-31708"", ""name"": ""worker srv4.hw.ca1.foo.com:31708"", ""resources"": { ""cpus"": 1, ""disk"": 0, ""mem"": 5120, ""ports"": ""[31708-31708]"" }, ""slave_id"": ""20150327-025553-218108076-5050-4122-S0"" }, ... ] {noformat}",3 MESOS-2600,"Add /reserve and /unreserve endpoints on the master for dynamic reservation","Enable operators to manage dynamic reservations by Introducing the {{/reserve}} and {{/unreserve}} HTTP endpoints on the master.",5 MESOS-2607,"Notify dev / user mailing list of the upcoming mem stat renames in 0.23.0 ",NULL,2 MESOS-2613,"Change docker rm command","Right now it seems Mesos is using „docker rm –f ID“ to delete containers so bind mounts are not deleted. This means thousands of dirs in /var/lib/docker/vfs/dir I would like to have the option to change it to „docker rm –f –v ID“ This deletes bind mounts but not persistant volumes. 
Best, Mike",2 MESOS-2615,"Pipe 'updateFramework' path from master to Allocator to support framework re-registration","Pipe the 'updateFramework' call from the master through the allocator, as described in the design doc in the epic: MESOS-703",1 MESOS-2622,"Document the semantic change in decorator return values","In order to enable decorator modules to _remove_ metadata (environment variables or labels), we changed the meaning of the return value for decorator hooks. The Result return values means: ||State||Before||After|| |Error|Error is propagated to the call-site|No change| |None|The result of the decorator is not applied|No change| |Some|The result of the decorator is *appended*|The result of the decorator *overwrites* the final labels/environment object|",1 MESOS-2627,"ExamplesTest.PersistentVolumeFramework is flaky","This just failed for the first time on our OS X Bot (Far less frequent flaky than the other ExamplesTest, but still flaky) while compiling master at commit f6620f851f635b3346c6ebf878152f38b3932ad9. There weren't any commits which touched / changed anything in the test in the set. {code} [ RUN ] ExamplesTest.PersistentVolumeFramework ../../src/tests/script.cpp:83: Failure Failed persistent_volume_framework_test.sh terminated with signal Abort trap: 6 [ FAILED ] ExamplesTest.PersistentVolumeFramework (7865 ms) {code}",1 MESOS-2629,"Update style guide to disallow capture by reference of temporaries","We modify the style guide to disallow constant references to temporaries as a whole. This means disallowing both (1) and (2) below. h3. Background 1. Constant references to simple expression temporaries do extend the lifetime of the temporary till end of function scope: * Temporary returned by function: {code} // See full example below. T f(const char* s) { return T(s); } { const T& good = f(""Ok""); // use of good is ok. } {code} * Temporary constructed as simple expression: {code} // See full example below. 
{ const T& good = T(""Ok""); // use of good is ok. } {code} 2. Constant references to expressions that result in a reference to a temporary do not extend the lifetime of the temporary: * Temporary returned by function: {code} // See full example below. T f(const char* s) { return T(s); } { const T& bad = f(""Bad!"").Member(); // use of bad is invalid. } {code} * Temporary constructed as simple expression: {code} // See full example below. { const T& bad = T(""Bad!"").Member(); // use of bad is invalid. } {code} h3. Mesos Case - In Mesos we use Future a lot. Many of our functions return Futures by value: {code} class Socket { Future accept(); Future recv(char* data, size_t size); ... } {code} - Sometimes we capture these Futures: {code} { const Future& accepted = socket.accept(); // Valid c++, propose we disallow. } {code} - Sometimes we chain these Futures: {code} { socket.accept().then(lambda::bind(_accepted)); // Temporary will be valid during 'then' expression evaluation. } {code} - Sometimes we do both: {code} { const Future& accepted = socket.accept().then(lambda::bind(_accepted)); // Dangerous! 'accepted' lifetime will not be valid till end of scope. Disallow! } {code} h3. Reasoning - Although (1) is ok, and considered a [feature|http://herbsutter.com/2008/01/01/gotw-88-a-candidate-for-the-most-important-const/], (2) is extremely dangerous and leads to hard to track bugs. - If we explicitly allow (1), but disallow (2), then my worry is that someone coming along to maintain the code later on may accidentally turn (1) into (2), without recognizing the severity of this mistake. For example: {code} // Original code: const T& val = T(); std::cout << val << std::endl; // New code: const T& val = T().removeWhiteSpace(); std::cout << val << std::endl; // val could be corrupted since the destructor has been invoked and T's memory freed. 
{code} - If we disallow both cases: it will be easier to catch these mistakes early on in code reviews (and avoid these painful bugs), at the cost of introducing a new style guide rule. h3. Performance Implications - BenH suggests c++ developers are commonly taught to capture by constant reference to hint to the compiler that the copy can be elided. - Modern compilers use a Data Flow Graph to make optimizations such as - *In-place-construction*: leveraged by RVO and NRVO to construct the object in place on the stack. Similar to ""*Placement new*"": http://en.wikipedia.org/wiki/Placement_syntax - *RVO* (Return Value Optimization): http://en.wikipedia.org/wiki/Return_value_optimization - *NRVO* (Named Return Value Optimization): https://msdn.microsoft.com/en-us/library/ms364057%28v=vs.80%29.aspx - Since modern compilers perform these optimizations, we no longer need to 'hint' to the compiler that the copies can be elided. h3. Example program {code} #include <stdio.h> class T { public: T(const char* str) : Str(str) { printf(""+ T(%s)\n"", Str); } ~T() { printf(""- T(%s)\n"", Str); } const T& Member() const { return *this; } private: const char* Str; }; T f(const char* s) { return T(s); } int main() { const T& good = T(""Ok""); const T& good_f = f(""Ok function""); const T& bad = T(""Bad!"").Member(); const T& bad_f = f(""Bad function!"").Member(); printf(""End of function scope...\n""); } {code} Output: {code} + T(Ok) + T(Ok function) + T(Bad!) - T(Bad!) + T(Bad function!) - T(Bad function!) End of function scope... 
- T(Ok function) - T(Ok) {code}",1 MESOS-2630,"Remove capture by reference of temporaries in Stout",NULL,1 MESOS-2631,"Remove capture by reference of temporaries in libprocess",NULL,1 MESOS-2633,"Move implementations of Framework struct functions out of master.hpp","To help reduce compile time and keep the header easy to read, let's move the implementations of the Framework struct functions out of master.hpp",1 MESOS-2636,"Segfault in inline Try getIP(const std::string& hostname, int family)","We saw a segfault in production. Attaching the coredump, we see: Core was generated by `/usr/local/sbin/mesos-slave --port=5051 --resources=cpus:23;mem:70298;ports:[31'. Program terminated with signal 11, Segmentation fault. #0 0x00007f639867c77e in free () from /lib64/libc.so.6 (gdb) bt #0 0x00007f639867c77e in free () from /lib64/libc.so.6 #1 0x00007f63986c25d0 in freeaddrinfo () from /lib64/libc.so.6 #2 0x00007f6399deeafa in net::getIP (hostname="""", family=2) at ./3rdparty/stout/include/stout/net.hpp:201 #3 0x00007f6399e1f273 in process::initialize (delegate=Unhandled dwarf expression opcode 0xf3 ) at src/process.cpp:837 #4 0x000000000042342f in main ()",1 MESOS-2637,"Consolidate 'foo', 'bar', ... string constants in test and example code","We are using 'foo', 'bar', ... string constants and pairs in src/tests/master_tests.cpp, src/tests/slave_tests.cpp, src/tests/hook_tests.cpp and src/examples/test_hook_module.cpp for label and hooks tests. These values should be stored in local variables to avoid the possibility of assignment getting out of sync with checking for that same value.",2 MESOS-2645,"Design doc for resource oversubscription",NULL,13 MESOS-2646,"Update Master to send revocable resources in separate offers","Master will send separate offers for revocable and non-revocable/regular resources. 
This allows master to rescind revocable offers (e.g, when a new oversubscribed resources estimate comes from the slave) without impacting regular offers.",3 MESOS-2647,"Slave should validate tasks using oversubscribed resources","The latest oversubscribed resource estimate might render a revocable task launch invalid. Slave should check this and send TASK_LOST with appropriate REASON. We need to add a new REASON for this (REASON_RESOURCE_OVERSUBSCRIBED?).",5 MESOS-2648,"Update Resource Monitor to return resource usage","Add usage() API call to return usage of all containers",3 MESOS-2649,"Implement Resource Estimator","Resource estimator is the component in the slave that estimates the amount of oversubscribable resources. This needs to be integrated with the slave and resource monitor.",5 MESOS-2650,"Modularize the Resource Estimator","Modularizing the resource estimator opens up the door for org specific implementations. Test the estimator module.",3 MESOS-2651,"Implement QoS controller","This is a component of the slave that informs the slave about the possible ""corrections"" that need to be performed (e.g., shutdown container using recoverable resources). This needs to be integrated with the resource monitor. Need to figure out the metrics used for sending corrections (e.g., scheduling latency, usage, informed by executor/scheduler) We also need to figure out the feedback loop between the QoS controller and the Resource Estimator. {code} class QoSController { public: QoSController(ResourceMonitor* monitor); process::Queue correction(); }; {code} ",3 MESOS-2652,"Update Mesos containerizer to understand revocable cpu resources","The CPU isolator needs to properly set limits for revocable and non-revocable containers. The proposed strategy is to use a two-way split of the cpu cgroup hierarchy -- normal (non-revocable) and low priority (revocable) subtrees -- and to use a biased split of CFS cpu.shares across the subtrees, e.g., a 20:1 split (TBD). 
Containers would be present in only one of the subtrees. CFS quotas will *not* be set on subtree roots, only cpu.shares. Each container would set CFS quota and shares as done currently. ",5 MESOS-2653,"Slave should act on correction events from QoS controller","Slave might want to kill revocable tasks based on correction events from the QoS controller. The QoS controller uses a stream (or process::Queue) to tell the slave which corrections it needs to carry out, in order to mitigate interference with production tasks. The correction is communicated through a message: {code} message QoSCorrection { enum CorrectionType { KillExecutor = 1; // KillTask = 2 // Resize, throttle task } optional string reason = X; optional ExecutorID executor_id = X; // optional TaskID task_id = X; } {code} And the slave will set up a handler to process these events. Initially, only executor termination is supported, which causes the slave to issue 'containerizer->destroy()'.",8 MESOS-2654,"Update FrameworkInfo to opt in to revocable resources","Add a new field to FrameworkInfo that lets the frameworks explicitly choose revocable offers (for backwards compatibility).",1 MESOS-2655,"Implement a stand alone test framework that uses revocable cpu resources","Ideally this would be an example framework (or stand alone binary like load generator framework) that helps us evaluate oversubscription in a real cluster. We need to come up with metrics that need to be exposed by this framework for evaluation (e.g., how many revocable offers, rescinds, preemptions etc).",5 MESOS-2660,"ROOT_CGROUPS_Listen and ROOT_IncreaseRSS tests are flaky","[==========] Running 1 test from 1 test case. [----------] Global test environment set-up. 
[----------] 1 test from CgroupsAnyHierarchyWithCpuMemoryTest [ RUN ] CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen Failed to allocate RSS memory: Failed to lock memory, mlock: Resource temporarily unavailable../../../mesos/src/tests/cgroups_tests.cpp:571: Failure Failed to wait 15secs for future [ FAILED ] CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen (15121 ms) [----------] 1 test from CgroupsAnyHierarchyWithCpuMemoryTest (15121 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (15174 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen",3 MESOS-2665,"Fix queuing discipline wrapper in linux/routing/queueing ","qdisc search function is dependent on matching a single hard coded handle and does not correctly test for interface, making the implementation fragile. Additionally, the current setup scripts (using dynamically created shell commands) do not match the hard coded handles. ",5 MESOS-2671,"Port mapping isolator causes SIGABRT during slave recovery.","There is a bug in the code. 
If there are namespaces created by another party (say ip netns), the slave recovery will abort.",1 MESOS-2672,"ContainerizerTest.ROOT_CGROUPS_BalloonFramework flaky","{noformat} I0429 00:58:35.267629 2086 slave.cpp:3210] Executor 'default' of framework 20150429-005830-16777343-5432-2023-0000 terminated with signal Aborted I0429 00:58:35.270761 2086 slave.cpp:2512] Handling status update TASK_LOST (UUID: f969e350-6f91-4fa9-980e-1852554bd704) for task 1 of framework 20150429-005830-16777343-5432-2023-0000 from @0.0.0.0:0 I0429 00:58:35.270983 2086 slave.cpp:4604] Terminating task 1 W0429 00:58:35.271574 2080 containerizer.cpp:903] Ignoring update for unknown container: 1298549a-a3d2-46ff-aad0-9dbc777affcc I0429 00:58:35.272541 2074 status_update_manager.cpp:317] Received status update TASK_LOST (UUID: f969e350-6f91-4fa9-980e-1852554bd704) for task 1 of framework 20150429-005830-16777343-5432-2023-0000 I0429 00:58:35.272624 2074 status_update_manager.cpp:494] Creating StatusUpdate stream for task 1 of framework 20150429-005830-16777343-5432-2023-0000 I0429 00:58:35.273217 2053 master.cpp:3493] Executor default of framework 20150429-005830-16777343-5432-2023-0000 on slave 20150429-005830-16777343-5432-2023-S0 at slave(1)@10.35.12.124:5051 (smfd-aki-27-sr1.devel.twitter.com): terminated with signal Aborted {noformat} which is from {code} // We use mlock and memset here to make sure that the memory // actually gets paged in and thus accounted for. if (mlock(buffer, chunk) != 0) { perror(""Failed to lock memory, mlock""); abort(); } if (memset(buffer, 1, chunk) != buffer) { perror(""Failed to fill memory, memset""); abort(); } {code} This is the same as MESOS-2660: I've confirmed that swapping the order of the mlock and memset calls fixed it. 
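For illustration, here is a minimal sketch of the swapped order, assuming the swap refers to the mlock and memset calls in the snippet above. The helper name fillThenLock and its int return code are hypothetical and not taken from the Mesos tree; the real test code calls perror and abort instead of returning an error.

```cpp
#include <sys/mman.h>

#include <cstddef>
#include <cstring>

// Hypothetical helper (an assumption for illustration, not Mesos code):
// touch the pages first, then try to lock them.
// Returns 0 on success and -1 on failure.
int fillThenLock(void* buffer, size_t chunk)
{
  // Write to every byte first so the pages get touched before locking.
  // memset returns its first argument, so this check mirrors the original.
  if (memset(buffer, 1, chunk) != buffer) {
    return -1;
  }

  // Lock only after the pages have been written. A failure is reported
  // to the caller here rather than aborting the process.
  if (mlock(buffer, chunk) != 0) {
    return -1;
  }

  return 0;
}
```

Whether this ordering removes the flakiness in every environment is not something the sketch can show; it only makes the reordering concrete.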
",1 MESOS-2673,"Follow Google Style Guide for header file include order completely.","The header include order for Mesos actually follows the Google Styleguide but omits step 1 without mentioning this exception in the Mesos styleguide. This proposal suggests to adapt to the include order explained in the Google Styleguide i.e. include the direct headers first in the .cpp files implementing them. A gist of the proposal can be found here: https://gist.github.com/joerg84/65cb9611d24b2e35b69b The corresponding Review Board review can be found here: https://reviews.apache.org/r/33646/ ",5 MESOS-2680,"Update modules doc with hook usage example","Modules doc states the possibility of using hooks, but doesn't refer to necessary flags and usage example.",1 MESOS-2687,"Add a slave flag to enable oversubscription","Slave sends oversubscribable resources to master only when the flag is enabled.",2 MESOS-2688,"Slave should kill revocable tasks if oversubscription is disabled","If oversubscription is disabled on a restarted slave (that had it previously enabled), it should kill revocable tasks. Slave knows this information from the Resources of a container that it checkpoints and recovers. Add a new reason OVERSUBSCRIPTION_DISABLED.",3 MESOS-2689,"Slave should forward oversubscribable resources to the master","Slave simply forwards resource estimates from ResourceEstimator to the master. Use a new message and handler on the master. A slave flag for the interval between the messages. ",5 MESOS-2691,"Update Resource message to include revocable resources","Need to update Resource message with a new subtype that indicates that the resource is revocable. It might also need to specify ""why"" it is revocable (e.g., oversubscribed). 
Also need to make sure all the operations on Resource(s) take this new message into account.",3 MESOS-2693,"Printing a resource should show information about reservation, disk etc","While new fields like DiskInfo and ReservationInfo have been added to the Resource protobuf, the output stream operator hasn't been updated to show these. This is valuable information to have in the logs during debugging.",1 MESOS-2695,"Add master flag to enable/disable oversubscription","This flag lets an operator control cluster level oversubscription. The master should send revocable offers to a framework if this flag is enabled and the framework opts in to receive them. Master should ignore revocable resources from slaves if the flag is disabled. Need tests for all these scenarios.",5 MESOS-2696,"Explore exposing stats from kernel","Exploratory work. Additional tickets to follow.",5 MESOS-2697,"Add a /teardown endpoint on master to teardown a framework","We plan to rename the ""/shutdown"" endpoint to ""/teardown"" to be compatible with the new API. ""/shutdown"" will be deprecated in 0.23.0 or later.",2 MESOS-2700,"Determine CFS behavior with biased cpu.shares subtrees","See this [ticket|https://issues.apache.org/jira/browse/MESOS-2652] for context. * Understand the relationship between cpu.shares and CFS quota. * Determine range of possible bias splits * Determine how to achieve bias, e.g., should 20:1 be 20480:1024 or ~1024:50 * Rigorous testing of behavior with varying loads, particularly the combination of latency sensitive loads for high biased tasks (non-revocable), and cpu intensive loads for the low biased tasks (revocable). * Discover any performance edge cases?",13 MESOS-2701,"Implement bi-level cpu.shares subtrees in cgroups/cpu isolator.","See this [ticket|https://issues.apache.org/jira/browse/MESOS-2652] for context. 
# Configurable bias # Change cgroup layout ** Implement roll-forward migration path in isolator recover ** Document roll-back migration path",8 MESOS-2702,"Compare split/flattened cgroup hierarchy for CPU oversubscription","Investigate if a flat hierarchy is sufficient for oversubscription of CPU or if a two-way split is necessary/preferred.",3 MESOS-2703,"Modularize the QoS Controller","Modularize the QoS controller to enable custom correction policies",3 MESOS-2704,"Add tests for QoS controller corrections",NULL,5 MESOS-2705,"Add correct format template declarations to the styleguide","The general rule to format templates is to declare them as: {code} template <typename T> // notice the space between template and < class Foo { … }; {code} However, the style is not documented anywhere nor is it inherited from the Google style guide.",1 MESOS-2707,"Incorrect zh:// URI scheme causes Slave to SegFault","I have 4 slave nodes with the same hardware, operating system and mesos configuration. A few minutes ago, all 4 nodes were functioning well. I tried to change the config of *master* from _10.172.230.69:5050_ to _zh://10.172.230.69:2181/mesos_ and restarted them in turn. The other three started normally but the last one got a segmentation fault as you can see below. 
{code} [root@iZ25to7d407Z ~]# mesos-slave --master=zh://10.172.230.69:2181/mesos --hostname=123.57.42.237 --containerizers=docker,mesos --quiet & [1] 1216 [root@iZ25to7d407Z ~]# *** Aborted at 1431085131 (unix time) try ""date -d @1431085131"" if you are using GNU date *** PC: @ 0x3aede7b53c (unknown) *** SIGSEGV (@0x0) received by PID 1216 (TID 0x7f12f984b820) from PID 0; stack trace: *** @ 0x3aee20f710 (unknown) @ 0x3aede7b53c (unknown) @ 0x3aedecf630 (unknown) @ 0x7f12fce1593f net::getIP() @ 0x7f12fce507ae process::operator>>() @ 0x7f12fce50107 process::UPID::UPID() @ 0x7f12fc52af71 mesos::internal::MasterDetector::create() @ 0x4b1290 main @ 0x3aede1ed5d (unknown) @ 0x4b00b9 (unknown) [1]+ Segmentation fault mesos-slave --master=zh://10.172.230.69:2181/mesos --hostname=123.57.42.237 --containerizers=docker,mesos --quiet {code}",2 MESOS-2708,"Design doc for the Executor HTTP API","This tracks the design of the Executor HTTP API. ",2 MESOS-2709,"Design Master discovery functionality for HTTP-only clients","When building clients that do not bind to {{libmesos}} and only use the HTTP API (via ""pure"" language bindings - eg, Java-only) there is no simple way to discover the Master's IP address to connect to. Rather than relying on 'out-of-band' configuration mechanisms, we would like to make it possible to interrogate the ZooKeeper ensemble to discover the Master's IP address (and, possibly, other information) to which HTTP API requests can be addressed.",3 MESOS-2719,"Deprecating '.json' extension in master endpoints urls","Add an endpoint for each master endpoint with a '.json' extension such as `/master/stats.json` so it becomes `/master/stats` after a deprecation cycle.",1 MESOS-2720,"Implement protobufs for master operator endpoints","We should define protobufs for master operator endpoints so as to provide a structure we can refer to for each possible return from an endpoint. 
",2 MESOS-2721,"Architecture document for per-container IP assignment, enforcement and isolation","There are many ways in which we can go around wiring up per-container IPs in Mesos. As there are multiple underlying mechanisms and systems for keeping track of IP pools, we probably need to aim for a very flexible architecture, similar to the oversubscription project. There are a couple of folks, companies and vendors interested in getting this capability into Mesos asap to provide a stronger networking story (https://www.mail-archive.com/dev@mesos.apache.org/msg32353.html). So let's start discussing and architecting this.",13 MESOS-2722,"Create access to the Mesos ""state abstraction"" that does not require linking with libmesos","See ""src/state/state.hpp"" and ""src/java/src/org/apache/mesos/state/*.java"" for what the ""state abstraction"" is. With the new HTTP API (see MESOS-2288, MESOS-2289), there will be no need to link to libmesos to a framework for it to communicate with a Mesos master. However, if a framework uses the Mesos ""state abstraction"", either directly in C++ or through other language bindings (e.g., Java), it still needs to link with libmesos. So, in order to achieve libmesos-free frameworks that can leverage all APIs Mesos has to offer, we need a different way to access the ""state abstraction"". --- One approach is to provide an HTTP API for state queries that get routed through the Mesos master, which relays them by making calls into libmesos. Details TBD, including how separate this will be from the general HTTP API. ",13 MESOS-2726,"Add support for enabling network namespace without enabling the network isolator","Following the discussion Kapil started, it is currently not possible to enable the linux network namespace for a container without enabling the network isolator (which requires certain kernel capabilities and dependencies). Following the pattern of enabling pid namespaces (--isolation=""namespaces/pid""). 
One possible solution could be to add another one for network i.e. ""namespaces/network"". ",13 MESOS-2729,"Update DRF sorter to update total resources","DRF sorter currently keeps track of allocated resources and total resources, but there is no way to update the total resources. For oversubscription, we need the ability to update total resources because total oversubscribed resources change over time.",2 MESOS-2730,"Add a new API call to the allocator to update oversubscribed resources","This tracks just the work of adding the API call to the allocator interface. Master makes this call on the allocator whenever it gets a new oversubscribed resources estimate from the slave.",2 MESOS-2733,"Update master to handle oversubscribed resource estimate from the slave","Whenever the master gets a new oversubscribed resources estimate from the slave, it should rescind any outstanding revocable offers (with oversubscribed resources) from that slave. It should then call the allocator to update the oversubscribed resources.",3 MESOS-2734,"Update allocator to allocate revocable resources","The simplest way to add support for oversubscribed resources to the allocator is to simply add them to the already existing 'Slave.total' and 'Slave.available' variables. It is easy to distinguish the revocable resources by doing a .revocable() filter. ",5 MESOS-2735,"Change the interaction between the slave and the resource estimator from polling to pushing ","This will make the semantics more clear. The resource estimator can control the speed of sending resource estimates to the slave. To avoid cyclic dependency, slave will register a callback with the resource estimator and the resource estimator will simply invoke that callback when there's a new estimate ready. The callback will be a defer to the slave's main event queue.",3 MESOS-2736,"Upgrade the design of MasterInfo","Currently, the {{MasterInfo}} PB only supports an {{ip}} field as an {{int32}}. 
Beyond making it harder (and opaque; open to subtle bugs) for languages other than C/C++ to decode into IPv4 octets, this does not allow Mesos to support IPv6 Master nodes. We should consider ways to upgrade it in ways that permit us to support both IPv4 / IPv6 nodes, and, possibly, in a way that makes it easy for languages such as Java/Python that already have PB support to deserialize this information. See also MESOS-2709 for more info.",3 MESOS-2737,"Add documentation for maintainers.","In order to scale the number of committers in the project, we proposed the concept of maintainers here: http://markmail.org/thread/cjmdn3d7qfzbxhpm To follow up on that proposal, we'll need some documentation to capture the concept of maintainers. It should cover both how contributors can benefit from maintainer feedback and the expectations of ""maintainer-ship"". In order to not enforce an excessive amount of process, maintainers will initially only serve as an encouraged means to help contributors find reviewers and get meaningful feedback.",3 MESOS-2738,"Reported used resources for tasks in frameworks do not match slave tally","[~rcorral] recently observed that according to the master's and the slave's state.json summing up the resources allocated to tasks from different frameworks on a slave does not always match the total that is reported for the slave. The latter number is sometimes higher. It would be desirable for tools that display allocation statistics to find balanced tallies. ",3 MESOS-2741,"Exposing Resources along with ResourceStatistics from resource monitor","Right now, the resource monitor returns a Usage which contains ContainerId, ExecutorInfo and ResourceStatistics. In order for resource estimator/qos controller to calculate usage slack, or tell if a container is using revocable resources or not, we need to expose the Resources that are currently assigned to the container. 
This requires us to change the containerizer interface to get the Resources as well while calling 'usage()'.",5 MESOS-2742,"Architecture doc on global resources",NULL,3 MESOS-2743,"Include ExecutorInfos for custom executors in master/state.json","The slave/state.json already reports executorInfos: https://github.com/apache/mesos/blob/0.22.1/src/slave/http.cpp#L215-219 Would be great to see this in the master/state.json as well, so external tools don't have to query each slave to find out executor resources, sandbox directories, etc.",3 MESOS-2746,"As a Framework User I want to be able to discover my Task's IP","The information exposed by the Framework via the {{WebUIUrl}} does not always resolve to a routable endpoint (eg, when the {{hostname}} is not publicly resolvable, or resolvable at all). In order to facilitate service discovery (via, eg, Marathon UI) we want to add the information in {{FrameworksPid}} via the {{/state-summary}} endpoint.",3 MESOS-2748,"/help generated links point to wrong URLs","As reported by Michael Lunøe (see also MESOS-329 and MESOS-913 for background): {quote} In {{mesos/3rdparty/libprocess/src/help.cpp}} a markdown file is created, which is then converted to html through a javascript library. All endpoints point to {{/help/...}}, they need to work dynamically for reverse proxy to do its thing. {{/mesos/help}} works, and displays the endpoints, but they each need to go to their respective {{/help/...}} endpoint. Note that this needs to work both for master, and for slaves. I think the route to slaves help is something like this: {{/mesos/slaves/20150518-210216-1695027628-5050-1366-S0/help}}, but please double check this. 
{quote} The fix appears to be not too complex (as it would simply require manipulating the generated URL) but a quick skim of the code would suggest that something more substantial may be desirable too.",2 MESOS-2750,"Extend queueing discipline wrappers to expose network isolator statistics","Export Traffic Control statistics in queueing library to enable reporting out impact of network bandwidth statistics.",3 MESOS-2752,"Add HTB queueing discipline wrapper class","Network isolator uses a Hierarchical Token Bucket (HTB) traffic control discipline on the egress filter inside each container as the root for adding traffic filters. An HTB wrapper is needed to access the network statistics for this interface.",3 MESOS-2753,"Master should validate tasks using oversubscribed resources","Current implementation out for [review|https://reviews.apache.org/r/34310] only supports setting the priority of containers with revocable CPU if it's specified in the initial executor info resources. This should be enforced at the master. Also master should make sure that oversubscribed resources used by the task are valid.",3 MESOS-2754,"Reduce multiple use of string literals","We have several instances of string literals (e.g. ""mesos-containerizer"", ""net_tcp_rtt_microseconds_p50"") being used in multiple locations where mismatches would result in correctness issues. We should replace these with a single definition to reduce the risk.",1 MESOS-2756,"Update style guide: Avoid object slicing","In order to improve the safety of our code base, let's augment the style guide to: ""Disallow public construction of base classes"" so that we can avoid the object slicing problem. 
This is a good pattern to follow in general as it prevents subtle semantic bugs like the following: {code:title=ObjectSlicing.cpp|borderStyle=solid} #include <stdio.h> #include <vector> class Base { public: Base(int _v) : v(_v) {} virtual int get() const { return v; } protected: int v; }; class Derived : public Base { public: Derived(int _v) : Base(_v) {} virtual int get() const { return v + 1; } }; int main() { Base b(5); Derived d(5); std::vector<Base> vec; vec.push_back(b); vec.push_back(d); for (const auto& v : vec) { printf(""[%d]\n"", v.get()); } } {code}",1 MESOS-2757,"Add -> operator for Option, Try, Result, Future.","Let's add operator overloads to Option<T> to allow access to the underlying T using the `->` operator. ",3 MESOS-2758,"Reflect in documentation that isolator flags are only relevant for Mesos Containerizer","The isolator flags are only relevant when using the Mesos Containerizer. We should reflect this in the flag description to avoid confusion.",1 MESOS-2760,"Add correction message to inform slave about QoS Controller actions","The QoS controller informs the slave about correcting actions (kill, resize, throttle best-effort containers, tasks, and so forth) through a protobuf message, called a QoSCorrection. This ticket tracks designing and creating this message. For example: {code} message QoSCorrection { // NOTE: In future we can define more actions like // resize or freeze, but for now we have: // 1) kill - terminate the executor or task enum Type { KILL = 1; } //Kill action which will be performed on an executor message Kill { optional ExecutorID executor_id = 1; } required Type action = 1; optional string reason = 2; optional double timestamp = 3; optional Kill kill = 4; } {code}",1 MESOS-2761,"Delegating constructors are not allowed by styleguide","As of right now the styleguide does not allow delegating constructors (being a c++ 11 feature). They are already used in the code base (e.g. 
stout/option.hpp), are supported by all relevant compiler (GCC 4.7+ and Clang 3.0+), and enhance readability. Therefore we should officially whitelist them in the styleguide.",1 MESOS-2762,"Explicitly-defaulted functions are not allowed by styleguide","As of right now the styleguide does not allow explicitly defaulted functions (being a c++ 11 feature). They enhance readability, are supported by all relevant compiler (GCC 4.4+ and Clang 3.0+), and are introduced by some patches (e.g. https://reviews.apache.org/r/34277/). Therefore we should officially whitelist them in the styleguide.",1 MESOS-2763,"Consolidate functionality in stout/net and process/http","stout/net.hpp and process/http.hpp offer overlapping functionality that could be consolidated in one place, presumably the latter, since it is more elaborate to begin with. This would also remove the dependency of the former on libcurl. While we are at it, we could then turn net::contentLength() into a generalized, asynchronous process::http::head() call. (Prerequisite: MESOS-2247, with the suggestion to enhance process::http, not stout, see a comment in that JIRA.) ",8 MESOS-2764,"Allow Resource Estimator to get Resource Usage information.","This includes two things: 1) We need to expose ResourceMonitor::Usage so that module writers can access it. We could define a protobuf message for that. 2) We need to allow ResourceEstimator to call 'ResourceMonitor::usages()'. We could either expose the ResourceMonitor, or pass in a lambda to the resources estimator.",5 MESOS-2766,"Add validation behavior to FlagsBase","In every ""launcher"" file (ie, those containing some variation on {{main()}}) there is a minor variation on: {code} if (flags.help) { cout << flags.usage() << endl; // arguably this is not an error: the user asked for help, // and she got it: // the program execution ought to be // considered successful. 
return EXIT_SUCCESS; } {code} As this is default behavior, and we've added support for the {{--help}} flag in the {{FlagsBase}} class, we should add this too there and remove it from everywhere else. Additionally, a recurring behavior is checking for the presence of a {{required}} flag: {code} if (flags.master.isNone()) { EXIT(EXIT_FAILURE) << flags.usage(""--master is required""); } {code} or some variation thereof: we should add automatic validation for required flags during parsing. This follows the DRY principle.",1 MESOS-2769,"Metric for cpu scheduling latency from all components","The metric will provide statistics on the scheduling latency for processes/threads in a container, i.e., statistics on the delay before application code can run. This will be the aggregate effect of the normal scheduling period, contention from other threads/processes, both in the container and on the system, and any effects from the CFS bandwidth control (if enabled) or other CPU isolation strategies.",8 MESOS-2770,"Slave should forward total amount of oversubscribed resources to the master","In addition to the unallocated oversubscribed resources, the slave should also send the oversubscribed resources that are already allocated. This is needed by the master/allocator to accurately calculate the available oversubscribed resources to offer.",3 MESOS-2771,"SIGSEGV received during ResourceMonitorProcess::usage()","Observed in production. 
{noformat:title=slave log} I0523 17:03:59.830229 56587 port_mapping.cpp:2616] Freed ephemeral ports [33792,34816) for container with pid 47791 I0523 17:03:59.849773 56587 port_mapping.cpp:2764] Successfully performed cleanup for pid 47791 *** Aborted at 1432400641 (unix time) try ""date -d @1432400641"" if you are using GNU date *** PC: @ 0x7f100fcbfd85 _ZNSt17_Function_handlerIFvRKSsEZNK7process6FutureIN5mesos8internal5slave15ResourceMonitor5UsageEE8onFailedIZNS7_22ResourceMonitorProcess5usageENS5_11ContainerIDEEUlS1_E_vEERKSA_OT_NSA_6PreferEEUlS1_E_E9_M_invokeERKSt9_Any_dataS1_ I0523 17:03:59.898959 56587 slave.cpp:3246] Executor 'thermos-1432400210944-mesos-test-exhaust_diskspace-5-4744d0fb-e0a1-4e40-bb22-56bd5cbd9524' of framework 201103282247-0000000019-0000 terminated with signal Killed I0523 17:04:03.419869 56587 slave.cpp:2547] Handling status update TASK_FAILED (UUID: 3be19404-f737-4a70-a330-d1d924a85dbb) for task 1432400210944-mesos-test-exhaust_diskspace-5-4744d0fb-e0a1-4e40-bb22-56bd5cbd9524 of framework 201103282247-0000000019-0000 from @0.0.0.0:0 I0523 17:04:03.773061 56587 slave.cpp:4077] Received a new estimation of the oversubscribable resources I0523 17:04:03.773907 56587 slave.cpp:4077] Received a new estimation of the oversubscribable resources I0523 17:04:03.774683 56587 slave.cpp:4077] Received a new estimation of the oversubscribable resources I0523 17:04:03.776345 56587 slave.cpp:4077] Received a new estimation of the oversubscribable resources *** SIGSEGV (@0x0) received by PID 56573 (TID 0x7f100a190940) from PID 0; stack trace: *** @ 0x7f100f181ca0 (unknown) @ 0x7f100fcbfd85 _ZNSt17_Function_handlerIFvRKSsEZNK7process6FutureIN5mesos8internal5slave15ResourceMonitor5UsageEE8onFailedIZNS7_22ResourceMonitorProcess5usageENS5_11ContainerIDEEUlS1_E_vEERKSA_OT_NSA_6PreferEEUlS1_E_E9_M_invokeERKSt9_Any_dataS1_ @ 0x7f100fb01506 process::internal::run<>() @ 0x7f100fcc701b process::Future<>::fail() @ 0x7f100fccfbde process::internal::thenf<>() @ 
0x7f100fd64bee _ZN7process8internal3runISt8functionIFvRKNS_6FutureIN5mesos18ResourceStatisticsEEEEEJRS6_EEEvRKSt6vectorIT_SaISD_EEDpOT0_ @ 0x7f100fd656dd process::Future<>::fail() @ 0x7f100fd6c332 process::Promise<>::associate() @ 0x7f100fe2777e _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos18ResourceStatisticsENS5_8internal5slave25MesosContainerizerProcessERKNS5_11ContainerIDESA_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSH_FSF_T1_ET2_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ @ 0x7f101015561a process::ProcessManager::resume() @ 0x7f10101558dc process::schedule() @ 0x7f100f17983d start_thread @ 0x7f100e96bfcd clone /usr/local/bin/mesos-slave.sh: line 102: 56573 Segmentation fault (core dumped) $debug /usr/local/sbin/mesos-slave ""${MESOS_FLAGS[@]}"" Slave Exit Status: 139 {noformat} {noformat:title=gdb core dump} Thread 20 (Thread 0x7f100a190940 (LWP 56574)): #0 _M_data (__functor=Unhandled dwarf expression opcode 0xf3 ) at /opt/rh/devtoolset-2/root/usr/include/c++/4.8.2/bits/basic_string.h:293 #1 _M_rep (__functor=Unhandled dwarf expression opcode 0xf3 ) at /opt/rh/devtoolset-2/root/usr/include/c++/4.8.2/bits/basic_string.h:301 #2 size (__functor=Unhandled dwarf expression opcode 0xf3 ) at /opt/rh/devtoolset-2/root/usr/include/c++/4.8.2/bits/basic_string.h:716 #3 operator<< , std::allocator > (__functor=Unhandled dwarf expression opcode 0xf3 ) at /opt/rh/devtoolset-2/root/usr/include/c++/4.8.2/bits/basic_string.h:2758 #4 operator<< (__functor=Unhandled dwarf expression opcode 0xf3 ) at ../include/mesos/type_utils.hpp:267 #5 operator() (__functor=Unhandled dwarf expression opcode 0xf3 ) at slave/monitor.cpp:129 #6 operator() (__functor=Unhandled dwarf expression opcode 0xf3 ) at ../3rdparty/libprocess/include/process/future.hpp:220 #7 std::_Function_handler, std::allocator >&), process::Future::onFailed(F&&, process::Future::Prefer) const [with F = mesos::internal::slave::ResourceMonitorProcess::usage(mesos::ContainerID)::__lambda180; = void; T = 
mesos::internal::slave::ResourceMonitor::Usage]::__lambda2>::_M_invoke(const std::_Any_data &, const std::basic_string, std::allocator > &) (__functor=Unhandled dwarf expression opcode 0xf3 ) at /opt/rh/devtoolset-2/root/usr/include/c++/4.8.2/functional:2071 #8 0x00007f100fb01506 in process::internal::run&)>, std::basic_string, std::allocator >&>(const std::vector, std::allocator >&)>, std::allocator, std::allocator >&)> > > &) (callbacks=std::vector of length 1, capacity 1 = {...}) at ../3rdparty/libprocess/include/process/future.hpp:420 #9 0x00007f100fcc701b in process::Future::fail (this=0x7f0ffc185ca8, _message=""Unknown container: c0ab6cd3-fe4f-49bd-8dd6-32b388fcfab2"") at ../3rdparty/libprocess/include/process/future.hpp:1406 #10 0x00007f100fccfbde in fail (f=Unhandled dwarf expression opcode 0xf3 ) at ../3rdparty/libprocess/include/process/future.hpp:649 #11 process::internal::thenf(const std::function(const mesos::ResourceStatistics&)> &, const std::shared_ptr > &, const process::Future &) (f=Unhandled dwarf expression opcode 0xf3 ) at ../3rdparty/libprocess/include/process/future.hpp:1193 #12 0x00007f100fd64bee in operator() (callbacks=std::vector of length 1, capacity 1 = {...}) at /opt/rh/devtoolset-2/root/usr/include/c++/4.8.2/functional:2464 #13 process::internal::run&)>, process::Future&>(const std::vector&)>, std::allocator&)> > > &) (callbacks=std::vector of length 1, capacity 1 = {...}) at ../3rdparty/libprocess/include/process/future.hpp:420 #14 0x00007f100fd656dd in process::Future::fail (this=0x7f0ff8046230, _message=""Unknown container: c0ab6cd3-fe4f-49bd-8dd6-32b388fcfab2"") at ../3rdparty/libprocess/include/process/future.hpp:1407 #15 0x00007f100fd6c332 in onFailed (this=Unhandled dwarf expression opcode 0xf3 ) at ../3rdparty/libprocess/include/process/future.hpp:1121 #16 onFailed::*)(const std::basic_string&)>(process::Future, std::_Placeholder<1>)>, bool> (this=Unhandled dwarf expression opcode 0xf3 ) at 
../3rdparty/libprocess/include/process/future.hpp:221 #17 onFailed::*)(const std::basic_string&)>(process::Future, std::_Placeholder<1>)> > (this=Unhandled dwarf expression opcode 0xf3 ) at ../3rdparty/libprocess/include/process/future.hpp:270 #18 process::Promise::associate (this=Unhandled dwarf expression opcode 0xf3 ) at ../3rdparty/libprocess/include/process/future.hpp:635 #19 0x00007f100fe2777e in operator() (__functor=Unhandled dwarf expression opcode 0xf3 ) at ../3rdparty/libprocess/include/process/dispatch.hpp:239 #20 std::_Function_handler&, process::Future (T::*)(P0), A0) [with R = mesos::ResourceStatistics; T = mesos::internal::slave::MesosContainerizerProcess; P0 = const mesos::ContainerID&; A0 = mesos::ContainerID]::__lambda21>::_M_invoke(const std::_Any_data &, process::ProcessBase *) (__functor=Unhandled dwarf expression opcode 0xf3 ) at /opt/rh/devtoolset-2/root/usr/include/c++/4.8.2/functional:2071 #21 0x00007f101015561a in process::ProcessManager::resume (this=0xc24d20, process=0x7f0ffc0169b0) at src/process.cpp:2172 #22 0x00007f10101558dc in process::schedule (arg=Unhandled dwarf expression opcode 0xf3 ) at src/process.cpp:602 #23 0x00007f100f17983d in start_thread () from /lib64/libpthread.so.0 #24 0x00007f100e96bfcd in clone () from /lib64/libc.so.6 {noformat}",1 MESOS-2772,"Define protobuf for ResourceMonitor::Usage.","We need to expose ResourceMonitor::Usage so that module writers can access it. We will define a protobuf message for that.",1 MESOS-2775,"Slave should expose metrics about oversubscribed resources","metrics/snapshot should expose metrics on oversubscribed resources (allocated and available). ",2 MESOS-2776,"Master should expose metrics about oversubscribed resources","metrics/snapshot should expose metrics on oversubscribed resources (allocated and available). 
",5 MESOS-2778,"Non-POD static variables used in fq_codel and ingress.","We declare const non-POD static variables for the following: fq_codel::HANDLE ingress::ROOT ingress::HANDLE We can eliminate the risk of indeterminate initialization by converting to C++11 constexpr",1 MESOS-2781,"getQdisc function in routing::queueing::internal.cpp returns incorrect qdisc","The getQdisc function ignores the passed link parameter and returns the first qdisc of the required type from any available interface.",1 MESOS-2783,"document the fetcher","For framework developers specifically, Mesos provides a fetcher to move binaries. This needs MVP documentation. - What is it - How does it help - What protocols or schemas are supported - Can it be extended This is important to get framework developers over the hump of learning to code against Mesos and grow the ecosystem.",5 MESOS-2784,"Added constexpr to C++11 whitelist.","constexpr is currently used to eliminate initialization dependency issues for non-POD objects. We should add it to the whitelist of acceptable c++11 features in the style guide.",1 MESOS-2791,"Create a FixedResourceEstimator to return fixed amount of oversubscribable resources.","This will be useful for testing oversubscription in a real environment. Also, it will be useful for people who has a prior knowledge about the amount of resources that can be safely oversubscribed on each slave.",5 MESOS-2792,"Remove duplicate literals in ingress & fq_codel queueing disciplines","fq_codel and ingress queueing disciplines include multiple uses of the string literals ""ingress"" and ""fq_codel"". Any mismatch in these would cause runtime errors which can be prevented at compile time.",1 MESOS-2793,"Add support for container rootfs to Mesos isolators","Mesos containers can have a different rootfs to the host. Update Isolator interface to pass rootfs during Isolator::prepare(). 
Update Isolators where necessary.",1 MESOS-2794,"Implement filesystem isolators","Move persistent volume support from Mesos containerizer to separate filesystem isolators, including support for container rootfs, where possible. Use symlinks for posix systems without container rootfs. Use bind mounts for Linux with/without container rootfs.",13 MESOS-2795,"Introduce filesystem provisioner abstraction","Optional filesystem provisioner component for the Mesos containerizer that can provision per-container filesystems. This is different from a filesystem isolator because it just provisions a root filesystem for a container and doesn't actually do any isolation (e.g., through a mount namespace + pivot or chroot).",5 MESOS-2796,"Implement AppC image provisioner.","Implement a filesystem provisioner that can provision container images compliant with the Application Container Image (aci) [specification|https://github.com/appc/spec].",5 MESOS-2798,"Export statistics on ""unevictable"" memory",NULL,1 MESOS-2800,"Rename Option::get(const T& _t) to getOrElse() and refactor the original function","As suggested, if we want to change the name then we should refactor the original function as opposed to having 2 copies. If we did have 2 versions of the same function, would it make more sense to delegate one of them to the other? As of today, there is only one file that needs to be refactored: 3rdparty/libprocess/3rdparty/stout/include/stout/os/osx.hpp at line 151, 161",3 MESOS-2801,"Remove dynamic allocation from Future","Remove the dynamic allocation of `T*` inside `Future::Data`",3 MESOS-2804,"Log framework capabilities in the master.","Now that {{Capabilities}} has been added to FrameworkInfo, we should log these in the master when a framework (re-)registers (i.e. which capabilities are enabled and disabled). This would make debugging easier for framework developers. Ideally, we would fold in the old {{checkpoint}} capability and log that as well.
In the past, the fact that {{checkpoint}} defaults to false has tripped up a lot of developers.",1 MESOS-2805,"Make synchronized as primary form of synchronization.","Re-organize Synchronized to allow {{synchronized(m)}} to work on: 1. {{std::mutex}} 2. {{std::recursive_mutex}} 3. {{std::atomic_flag}} Move synchronized.hpp into stout, so that developers don't think it's part of the utility suite for actors in libprocess. Remove references to internal.hpp and replace them with {{std::atomic_flag}} synchronization.",8 MESOS-2806,"Jira workflow appears inconsistent","See attached screenshot - the story is in the {{Accepted}} state, so it should now have a {{Start Progress}} button, but it has a {{Stop Progress}} one instead. Also, when in the {{In Progress}} it has an {{Accept}} button (I think) or something similar; also other states appear inconsistent. This Story is about first looking at the workflow; ensuring the stories and their status(es) are consistent; that button in the UI are consistently applied and then correct any issues that may have been identified. The assumption here is that the workflow is: {noformat} Open >> Accepted >> Progress >> Reviewable >> Resolved >> Closed Accept Start Ready Resolve Close {noformat} and, at each stage, it can be moved ""back by one"" ({{Unaccept}}, {{Stop Progress}}, {{Unresolve}}) and that, at any stage, it can be moved to {{Closed}} (for whatever reason).",2 MESOS-2807,"As a developer I need an easy way to convert MasterInfo protobuf to/from JSON","As a preliminary to MESOS-2340, this requires the implementation of a simple (de)serialization mechanism to JSON from/to {{MasterInfo}} protobuf.",3 MESOS-2808,"Slave should call into resource estimator whenever it wants to forward oversubscribed resources","Currently, the polling of resource estimator is decoupled from the loop in the slave that forwards oversubscribed resources. 
Now that the slave only sends updates when there is a change from the previous estimate, it can just poll the resource estimator whenever it wants to send an estimate. One advantage with this is that if the estimator is slow to respond, the slave doesn't keep forwarding estimates with the stale 'oversubscribable' value causing more revocable tasks to be unintentionally launched.",3 MESOS-2814,"os::read should have one implementation","In master there are currently three implementations of the function: https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/os/read.hpp#L42 https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/os/read.hpp#L82 https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/os/read.hpp#L42 All of them have fairly radically different implementations (One uses C read(), one uses c++ ifstream, one uses c fopen) The read() based one does an excess / unnecessary copy / buffer allocation (it is going to read into one temporary buffer, then copy into the result string. Would be more efficient to do a .reserve() on the result string, and then fill the result buffer). The ifstream/ifstreambuf_iterator ignores that you can have an error partially through reading a file / doesn't find the error or propagate it up. The fopen() variant reads one newline separated line at a time. This could produce interesting / unexpected reading in the context of a binary file. It also causes glibc to insert null bytes at the end of the buffer it reads (excess computation). result isn't pre-allocated to be the right length, meaning that most of the continually read lines will result in realloc() and a lot of memory copies which will be inefficient on large files.",3 MESOS-2815,"Flaky test: FetcherCacheHttpTest.HttpCachedSerialized","FetcherCacheHttpTest.HttpCachedSerialized has been observed to fail (once so far), but normally works fine. 
Here is the failure output: [ RUN ] FetcherCacheHttpTest.HttpCachedSerialized GMOCK WARNING: Uninteresting mock function call - returning directly. Function call: resourceOffers(0x3cca8e0, @0x2b1053422b20 { 128-byte object }) Stack trace: F0604 13:08:16.377907 6813 fetcher_cache_tests.cpp:354] CHECK_READY(offers): is PENDING Failed to wait for resource offers *** Check failure stack trace: *** @ 0x2b10488ff6c0 google::LogMessage::Fail() @ 0x2b10488ff60c google::LogMessage::SendToLog() @ 0x2b10488ff00e google::LogMessage::Flush() @ 0x2b1048901f22 google::LogMessageFatal::~LogMessageFatal() @ 0x9721e4 _CheckFatal::~_CheckFatal() @ 0xb4da86 mesos::internal::tests::FetcherCacheTest::launchTask() @ 0xb53f8d mesos::internal::tests::FetcherCacheHttpTest_HttpCachedSerialized_Test::TestBody() @ 0x116ac21 testing::internal::HandleSehExceptionsInMethodIfSupported<>() @ 0x1165e1e testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x114e1df testing::Test::Run() @ 0x114e902 testing::TestInfo::Run() @ 0x114ee8a testing::TestCase::Run() @ 0x1153b54 testing::internal::UnitTestImpl::RunAllTests() @ 0x116ba93 testing::internal::HandleSehExceptionsInMethodIfSupported<>() @ 0x1166b0f testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x1152a60 testing::UnitTest::Run() @ 0xcbc50f main @ 0x2b104af78ec5 (unknown) @ 0x867559 (unknown) make[4]: *** [check-local] Aborted make[4]: Leaving directory `/home/jenkins/jenkins-slave/workspace/mesos-reviewbot/mesos-0.23.0/_build/src' make[3]: *** [check-am] Error 2 make[3]: Leaving directory `/home/jenkins/jenkins-slave/workspace/mesos-reviewbot/mesos-0.23.0/_build/src' make[2]: *** [check] Error 2 make[2]: Leaving directory `/home/jenkins/jenkins-slave/workspace/mesos-reviewbot/mesos-0.23.0/_build/src' make[1]: *** [check-recursive] Error 1 make[1]: Leaving directory `/home/jenkins/jenkins-slave/workspace/mesos-reviewbot/mesos-0.23.0/_build' make: *** [distcheck] Error 1 ",2 MESOS-2817,"Support revocable/non-revocable CPU 
updates in Mesos containerizer","MESOS-2652 provided preliminary support for revocable cpu resources only when specified in the initial resources for a container. Improve this to support updates to/from revocable cpu. Note, *any* revocable cpu will result in the entire container's cpu being treated as revocable at the cpu isolator level. Higher level logic is responsible for adding/removing based on some policy.",3 MESOS-2818,"Pass 'allocated' resources for each executor to the resource estimator.","Resource estimator obviously need this information to calculate, say the usage slack. Now the question is how. There are two approaches: 1) Pass in the allocated resources for each executor through the 'oversubscribable()' interface. 2) Let containerizer return total resources allocated for each container when 'usages()' are invoked. I would suggest to take route (1) for several reasons: 1) Eventually, we'll need to pass in slave's total resources to the resource estimator (so that RE can calculate allocation slack). There is no way that we can get that from containerizer. The slave's total resources keep changing due to dynamic reservation. So we cannot pass in the slave total resources during initialization. 2) The current implementation of usages() might skip some containers if it fails to get statistics for that container (not an error). This will cause in-complete information to the RE. 3) We may want to calculate 'unallocated = total - allocated' so that we can send allocation slack as well. Getting 'total' and 'allocated' from two different components might result in inconsistent value. Remember that 'total' keeps changing due to dynamic reservation.",3 MESOS-2821,"Document and consolidate qdisc handles","The structure of traffic control qdiscs and filters in non-trivial with the knowledge of which handles are the parents of which filters or qdiscs are in the create and recovery functions and will be needed to collect statistics on the links. 
Let's pull out the constants and document them.",1 MESOS-2822,"Add `EXPECT_NO_FUTURE_DISPATCHES` macro for tests.","We already have {{EXPECT_NO_FUTURE_MESSAGES}}, {{EXPECT_NO_FUTURE_DISPATCHES}} should be done the same way. We already have a use case for it: https://github.com/apache/mesos/blob/master/src/tests/master_contender_detector_tests.cpp#L251",1 MESOS-2823,"Pass callback to the QoS Controller to retrieve ResourceUsage from Resource Monitor on demand.","We need to allow QoS Controller to call 'ResourceMonitor::usages()'. We will pass it in a lambda. ",2 MESOS-2824,"Support pre-fetching images","Default container images can be specified with the --default_container_info flag to the slave. This may be a large image that will take a long time to initially fetch/hash/extract when the first container is provisioned. Add optional support to start fetching the image when the slave starts and consider not registering until the fetch is complete. To extend that, we should support an operator endpoint so that operators can specify images to pre-fetch.",5 MESOS-2830,"Add an endpoint to slaves to allow launching system administration tasks","As a System Administrator I often need to run an organization-mandated task on every machine in the cluster. Ideally I could do this within the framework of mesos resources if it is a ""cleanup"" or auditing task, but sometimes I just have to run something, and run it now, regardless of whether a machine has unaccounted resources (e.g., adding/removing a user). Currently to do this I have to completely bypass Mesos and SSH to the box.
Ideally I could tell a mesos slave (With proper authentication) to run a container with the limited special permissions needed to get the task done.",8 MESOS-2832,"Enable configuring Mesos with environment variables without having them leak to tasks launched","Currently if mesos is configured with environment variables (MESOS_MODULES), those show up in every task which is launched unless the executor explicitly cleans them up. If the task being launched happens to be something libprocess / mesos based, this can often prevent the task from starting up (A scheduler has issues loading a module intended for the slave). There are also cases where it would be nice to be able to change what the PATH is that tasks launch with (the host may have more in the path than tasks are supposed to / allowed to depend upon).",8 MESOS-2834,"Support different perf output formats","The output format of perf changes in 3.14 (inserting an additional field) and in again in 4.1 (appending additional) fields. See kernel commits: 410136f5dd96b6013fe6d1011b523b1c247e1ccb d73515c03c6a2706e088094ff6095a3abefd398b Update the perf::parse() function to understand all these formats.",3 MESOS-2836,"Report per-container metrics for network bandwidth throttling to the slave","Report per-container metrics for network bandwidth throttling to the slave in the output of mesos-network-helper.",1 MESOS-2837,"Decode network statistics from mesos-network-helper","Decode network statistics from mesos-network-helper and output to slave statistics.json",1 MESOS-2838,"In Resources JSON model() resources of the same name overwrite each other.","As shown here: https://github.com/apache/mesos/blob/8559d7b7356ec91795e564767588c6f4519653a5/src/common/http.cpp#L50 So if there are two ""cpus"" of different roles, whichever comes later will overwrite the previous. We should instead aggregate different resources of the same name. 
However, in the presence of revocable resources, in order to maintain backwards compatibility we should exclude revocable resources.",2 MESOS-2841,"FrameworkInfo should include a Labels field to support arbitrary, lightweight metadata","A framework instance may offer specific capabilities to the cluster: storage, smartly-balanced request handling across deployed tasks, access to 3rd party services outside of the cluster, etc. These capabilities may or may not be utilized by all, or even most mesos clusters. However, it should be possible for processes running in the cluster to discover capabilities or features of frameworks in order to achieve a higher level of functionality and a more seamless integration experience across the cluster. A rich discovery API attached to the FrameworkInfo could result in some form of early lock-in: there are probably many ways to realize cross-framework integration and external services integration that we haven't considered yet. Rather than over-specify a discovery info message type at the framework level I think FrameworkInfo should expose a **very generic** way to supply metadata for interested consumers (other processes, tasks, etc). Adding a Labels field to FrameworkInfo reuses an existing message type and seems to fit well with the overall intent: attaching generic metadata to a framework instance. These labels should be visible when querying a mesos master's state.json endpoint.",8 MESOS-2844,"Add and document new labels field to framework info","Add and document new labels field to framework info: {code} message FrameworkInfo { // Used to determine the Unix user that an executor or task should // be launched as. If the user field is set to an empty string Mesos // will automagically set it to the current user. required string user = 1; // Name of the framework that shows up in the Mesos Web UI. 
required string name = 2; // Note that 'id' is only available after a framework has // registered, however, it is included here in order to facilitate // scheduler failover (i.e., if it is set then the // MesosSchedulerDriver expects the scheduler is performing // failover). optional FrameworkID id = 3; ... // This field allows a framework to advertise its set of // capabilities (e.g., ability to receive offers for revocable // resources). repeated Capability capabilities = 10; optional Labels labels = 11; } {code}",1 MESOS-2848,"Local filesystem docker image discovery","Given a docker image name and the local directory where images can be found, creates a URI with a path to the corresponding image. Done when system successfully checks for the image, untars the image if necessary, and returns the proper URI to the image.",2 MESOS-2849,"Implement Docker local image store","Given a local Docker image name and path to the image or image tarball, fetches the image's dependent layers, untarring if necessary. It will also parse the image layers' configuration json and place the layers and image into persistent store. Done when a Docker image can be successfully stored and retrieved using 'put' and 'get' methods. ",5 MESOS-2850,"Implement Docker image provisioner","Provisions a Docker image (provisions all its dependent layers), fetch an image from persistent store, and also destroy an image. Done when tested for local discovery and copy backend. ",3 MESOS-2851,"Add Docker Image Type to protobuf API",NULL,1 MESOS-2853,"Report per-container metrics from host egress filter","Export in statistics.json the fq_codel flow statistics for each container.",1 MESOS-2854,"Resources::parse(...) allows different resources of the same name to have different types.","So code like this doesn't raise Error. {code} Resources::parse(""foo(role1):1;foo(role2):[0-1]"") {code} Doesn't look like allowing this adds value and this complicates resource maths/validation/reporting. 
We should disallow this.",2 MESOS-2857,"FetcherCacheTest.LocalCachedExtract is flaky.","From jenkins: {noformat} [ RUN ] FetcherCacheTest.LocalCachedExtract Using temporary directory '/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj' I0610 20:04:48.591573 24561 leveldb.cpp:176] Opened db in 3.512525ms I0610 20:04:48.592456 24561 leveldb.cpp:183] Compacted db in 828630ns I0610 20:04:48.592512 24561 leveldb.cpp:198] Created db iterator in 32992ns I0610 20:04:48.592531 24561 leveldb.cpp:204] Seeked to beginning of db in 8967ns I0610 20:04:48.592545 24561 leveldb.cpp:273] Iterated through 0 keys in the db in 7762ns I0610 20:04:48.592604 24561 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0610 20:04:48.593438 24587 recover.cpp:449] Starting replica recovery I0610 20:04:48.593698 24587 recover.cpp:475] Replica is in EMPTY status I0610 20:04:48.595641 24580 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request I0610 20:04:48.596086 24590 recover.cpp:195] Received a recover response from a replica in EMPTY status I0610 20:04:48.596607 24590 recover.cpp:566] Updating replica status to STARTING I0610 20:04:48.597507 24590 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 717888ns I0610 20:04:48.597535 24590 replica.cpp:323] Persisted replica status to STARTING I0610 20:04:48.597697 24590 recover.cpp:475] Replica is in STARTING status I0610 20:04:48.599165 24584 replica.cpp:641] Replica in STARTING status received a broadcasted recover request I0610 20:04:48.599434 24584 recover.cpp:195] Received a recover response from a replica in STARTING status I0610 20:04:48.599915 24590 recover.cpp:566] Updating replica status to VOTING I0610 20:04:48.600545 24590 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 432335ns I0610 20:04:48.600574 24590 replica.cpp:323] Persisted replica status to VOTING I0610 20:04:48.600659 24590 recover.cpp:580] Successfully joined the Paxos group I0610 
20:04:48.600797 24590 recover.cpp:464] Recover process terminated I0610 20:04:48.602905 24594 master.cpp:363] Master 20150610-200448-3875541420-32907-24561 (dbade881e927) started on 172.17.0.231:32907 I0610 20:04:48.602957 24594 master.cpp:365] Flags at startup: --acls="""" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate=""true"" --authenticate_slaves=""true"" --authenticators=""crammd5"" --credentials=""/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/credentials"" --framework_sorter=""drf"" --help=""false"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --quiet=""false"" --recovery_slave_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""25secs"" --registry_strict=""true"" --root_submissions=""true"" --slave_reregister_timeout=""10mins"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/mesos/mesos-0.23.0/_inst/share/mesos/webui"" --work_dir=""/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/master"" --zk_session_timeout=""10secs"" I0610 20:04:48.603374 24594 master.cpp:410] Master only allowing authenticated frameworks to register I0610 20:04:48.603392 24594 master.cpp:415] Master only allowing authenticated slaves to register I0610 20:04:48.603404 24594 credentials.hpp:37] Loading credentials for authentication from '/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/credentials' I0610 20:04:48.603751 24594 master.cpp:454] Using default 'crammd5' authenticator I0610 20:04:48.604928 24594 master.cpp:491] Authorization enabled I0610 20:04:48.606034 24593 hierarchical.hpp:309] Initialized hierarchical allocator process I0610 20:04:48.606106 24593 whitelist_watcher.cpp:79] No whitelist given I0610 20:04:48.607430 24594 master.cpp:1476] The newly elected leader is master@172.17.0.231:32907 with id 20150610-200448-3875541420-32907-24561 I0610 20:04:48.607466 24594 master.cpp:1489] Elected as the leading 
master! I0610 20:04:48.607481 24594 master.cpp:1259] Recovering from registrar I0610 20:04:48.607712 24594 registrar.cpp:313] Recovering registrar I0610 20:04:48.608543 24588 log.cpp:661] Attempting to start the writer I0610 20:04:48.610231 24588 replica.cpp:477] Replica received implicit promise request with proposal 1 I0610 20:04:48.611335 24588 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 1.086439ms I0610 20:04:48.611382 24588 replica.cpp:345] Persisted promised to 1 I0610 20:04:48.612303 24588 coordinator.cpp:230] Coordinator attemping to fill missing position I0610 20:04:48.613883 24593 replica.cpp:378] Replica received explicit promise request for position 0 with proposal 2 I0610 20:04:48.619205 24593 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 5.228235ms I0610 20:04:48.619257 24593 replica.cpp:679] Persisted action at 0 I0610 20:04:48.621919 24593 replica.cpp:511] Replica received write request for position 0 I0610 20:04:48.621987 24593 leveldb.cpp:438] Reading position from leveldb took 49394ns I0610 20:04:48.622689 24593 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 668412ns I0610 20:04:48.622716 24593 replica.cpp:679] Persisted action at 0 I0610 20:04:48.623507 24584 replica.cpp:658] Replica received learned notice for position 0 I0610 20:04:48.624155 24584 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 612283ns I0610 20:04:48.624186 24584 replica.cpp:679] Persisted action at 0 I0610 20:04:48.624215 24584 replica.cpp:664] Replica learned NOP action at position 0 I0610 20:04:48.625144 24593 log.cpp:677] Writer started with ending position 0 I0610 20:04:48.626724 24589 leveldb.cpp:438] Reading position from leveldb took 72013ns I0610 20:04:48.629276 24591 registrar.cpp:346] Successfully fetched the registry (0B) in 21.520128ms I0610 20:04:48.629663 24591 registrar.cpp:445] Applied 1 operations in 129587ns; attempting to update the 'registry' I0610 20:04:48.632237 24579 log.cpp:685] Attempting 
to append 131 bytes to the log I0610 20:04:48.632624 24579 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1 I0610 20:04:48.633739 24579 replica.cpp:511] Replica received write request for position 1 I0610 20:04:48.634351 24579 leveldb.cpp:343] Persisting action (150 bytes) to leveldb took 583937ns I0610 20:04:48.634382 24579 replica.cpp:679] Persisted action at 1 I0610 20:04:48.635073 24583 replica.cpp:658] Replica received learned notice for position 1 I0610 20:04:48.635442 24583 leveldb.cpp:343] Persisting action (152 bytes) to leveldb took 357122ns I0610 20:04:48.635469 24583 replica.cpp:679] Persisted action at 1 I0610 20:04:48.635494 24583 replica.cpp:664] Replica learned APPEND action at position 1 I0610 20:04:48.636337 24583 registrar.cpp:490] Successfully updated the 'registry' in 6.534144ms I0610 20:04:48.636725 24594 log.cpp:704] Attempting to truncate the log to 1 I0610 20:04:48.636858 24583 registrar.cpp:376] Successfully recovered registrar I0610 20:04:48.637073 24594 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2 I0610 20:04:48.637789 24594 master.cpp:1286] Recovered 0 slaves from the Registry (95B) ; allowing 10mins for slaves to re-register I0610 20:04:48.638630 24583 replica.cpp:511] Replica received write request for position 2 I0610 20:04:48.639127 24583 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 396272ns I0610 20:04:48.639153 24583 replica.cpp:679] Persisted action at 2 I0610 20:04:48.639804 24583 replica.cpp:658] Replica received learned notice for position 2 I0610 20:04:48.640965 24583 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 1.147322ms I0610 20:04:48.641054 24583 leveldb.cpp:401] Deleting ~1 keys from leveldb took 72395ns I0610 20:04:48.641197 24583 replica.cpp:679] Persisted action at 2 I0610 20:04:48.641345 24583 replica.cpp:664] Replica learned TRUNCATE action at position 2 I0610 20:04:48.652274 24561 containerizer.cpp:111] Using 
isolation: posix/cpu,posix/mem I0610 20:04:48.658994 24590 slave.cpp:188] Slave started on 42)@172.17.0.231:32907 I0610 20:04:48.659049 24590 slave.cpp:189] Flags at startup: --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --credential=""/tmp/FetcherCacheTest_LocalCachedExtract_LCHuuM/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_kill_orphans=""true"" --docker_remove_delay=""6hrs"" --docker_sandbox_directory=""/mnt/mesos/sandbox"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/FetcherCacheTest_LocalCachedExtract_LCHuuM/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" --launcher_dir=""/mesos/mesos-0.23.0/_build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resource_monitoring_interval=""1secs"" --resources=""cpus(*):1000; mem(*):1000"" --revocable_cpu_low_priority=""true"" --strict=""true"" --switch_user=""true"" --version=""false"" --work_dir=""/tmp/FetcherCacheTest_LocalCachedExtract_LCHuuM"" I0610 20:04:48.659570 24590 credentials.hpp:85] Loading credential for authentication from '/tmp/FetcherCacheTest_LocalCachedExtract_LCHuuM/credential' I0610 20:04:48.659803 24590 slave.cpp:319] Slave using credential for: test-principal I0610 
20:04:48.660441 24590 slave.cpp:352] Slave resources: cpus(*):1000; mem(*):1000; disk(*):3.70122e+06; ports(*):[31000-32000] I0610 20:04:48.660555 24590 slave.cpp:382] Slave hostname: dbade881e927 I0610 20:04:48.660578 24590 slave.cpp:387] Slave checkpoint: true I0610 20:04:48.661550 24588 state.cpp:35] Recovering state from '/tmp/FetcherCacheTest_LocalCachedExtract_LCHuuM/meta' I0610 20:04:48.661913 24590 status_update_manager.cpp:201] Recovering status update manager I0610 20:04:48.662253 24590 containerizer.cpp:312] Recovering containerizer I0610 20:04:48.663207 24581 slave.cpp:3950] Finished recovery I0610 20:04:48.663761 24581 slave.cpp:4104] Querying resource estimator for oversubscribable resources I0610 20:04:48.664077 24581 slave.cpp:678] New master detected at master@172.17.0.231:32907 I0610 20:04:48.664088 24586 status_update_manager.cpp:175] Pausing sending status updates I0610 20:04:48.664245 24581 slave.cpp:741] Authenticating with master master@172.17.0.231:32907 I0610 20:04:48.664388 24581 slave.cpp:746] Using default CRAM-MD5 authenticatee I0610 20:04:48.664611 24581 slave.cpp:714] Detecting new master I0610 20:04:48.664647 24594 authenticatee.hpp:139] Creating new client SASL connection I0610 20:04:48.664813 24581 slave.cpp:4125] Received oversubscribable resources from the resource estimator I0610 20:04:48.665060 24581 slave.cpp:4129] No master detected. 
Re-querying resource estimator after 15secs I0610 20:04:48.665096 24594 master.cpp:4181] Authenticating slave(42)@172.17.0.231:32907 I0610 20:04:48.665247 24581 authenticator.cpp:406] Starting authentication session for crammd5_authenticatee(130)@172.17.0.231:32907 I0610 20:04:48.665657 24581 authenticator.cpp:92] Creating new server SASL connection I0610 20:04:48.666013 24581 authenticatee.hpp:230] Received SASL authentication mechanisms: CRAM-MD5 I0610 20:04:48.666159 24581 authenticatee.hpp:256] Attempting to authenticate with mechanism 'CRAM-MD5' I0610 20:04:48.666443 24592 authenticator.cpp:197] Received SASL authentication start I0610 20:04:48.666591 24592 authenticator.cpp:319] Authentication requires more steps I0610 20:04:48.666779 24592 authenticatee.hpp:276] Received SASL authentication step I0610 20:04:48.667007 24585 authenticator.cpp:225] Received SASL authentication step I0610 20:04:48.667043 24585 auxprop.cpp:101] Request to lookup properties for user: 'test-principal' realm: 'dbade881e927' server FQDN: 'dbade881e927' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0610 20:04:48.667058 24585 auxprop.cpp:173] Looking up auxiliary property '*userPassword' I0610 20:04:48.667110 24585 auxprop.cpp:173] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0610 20:04:48.667142 24585 auxprop.cpp:101] Request to lookup properties for user: 'test-principal' realm: 'dbade881e927' server FQDN: 'dbade881e927' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0610 20:04:48.667155 24585 auxprop.cpp:123] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0610 20:04:48.667163 24585 auxprop.cpp:123] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0610 20:04:48.667181 24585 authenticator.cpp:311] Authentication success I0610 20:04:48.667331 24585 authenticatee.hpp:316] Authentication success I0610 
20:04:48.667414 24585 master.cpp:4211] Successfully authenticated principal 'test-principal' at slave(42)@172.17.0.231:32907 I0610 20:04:48.667505 24585 authenticator.cpp:424] Authentication session cleanup for crammd5_authenticatee(130)@172.17.0.231:32907 I0610 20:04:48.667809 24585 slave.cpp:812] Successfully authenticated with master master@172.17.0.231:32907 I0610 20:04:48.667982 24585 slave.cpp:1146] Will retry registration in 7.257154ms if necessary I0610 20:04:48.668226 24585 master.cpp:3157] Registering slave at slave(42)@172.17.0.231:32907 (dbade881e927) with id 20150610-200448-3875541420-32907-24561-S0 I0610 20:04:48.668737 24585 registrar.cpp:445] Applied 1 operations in 90255ns; attempting to update the 'registry' I0610 20:04:48.672297 24585 log.cpp:685] Attempting to append 305 bytes to the log I0610 20:04:48.672541 24585 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 3 I0610 20:04:48.673528 24593 replica.cpp:511] Replica received write request for position 3 I0610 20:04:48.674321 24593 leveldb.cpp:343] Persisting action (324 bytes) to leveldb took 766804ns I0610 20:04:48.674355 24593 replica.cpp:679] Persisted action at 3 I0610 20:04:48.675138 24587 replica.cpp:658] Replica received learned notice for position 3 I0610 20:04:48.675866 24587 leveldb.cpp:343] Persisting action (326 bytes) to leveldb took 714643ns I0610 20:04:48.675897 24587 replica.cpp:679] Persisted action at 3 I0610 20:04:48.675922 24587 replica.cpp:664] Replica learned APPEND action at position 3 I0610 20:04:48.677471 24587 registrar.cpp:490] Successfully updated the 'registry' in 8.656128ms I0610 20:04:48.677759 24587 log.cpp:704] Attempting to truncate the log to 3 I0610 20:04:48.678423 24593 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 4 I0610 20:04:48.678621 24587 master.cpp:3214] Registered slave 20150610-200448-3875541420-32907-24561-S0 at slave(42)@172.17.0.231:32907 (dbade881e927) with cpus(*):1000; 
mem(*):1000; disk(*):3.70122e+06; ports(*):[31000-32000] I0610 20:04:48.678959 24593 hierarchical.hpp:496] Added slave 20150610-200448-3875541420-32907-24561-S0 (dbade881e927) with cpus(*):1000; mem(*):1000; disk(*):3.70122e+06; ports(*):[31000-32000] (and cpus(*):1000; mem(*):1000; disk(*):3.70122e+06; ports(*):[31000-32000] available) I0610 20:04:48.679157 24593 hierarchical.hpp:933] No resources available to allocate! I0610 20:04:48.679183 24593 hierarchical.hpp:852] Performed allocation for slave 20150610-200448-3875541420-32907-24561-S0 in 175519ns I0610 20:04:48.679805 24593 replica.cpp:511] Replica received write request for position 4 I0610 20:04:48.684160 24587 slave.cpp:846] Registered with master master@172.17.0.231:32907; given slave ID 20150610-200448-3875541420-32907-24561-S0 I0610 20:04:48.684229 24587 fetcher.cpp:77] Clearing fetcher cache I0610 20:04:48.684666 24587 slave.cpp:869] Checkpointing SlaveInfo to '/tmp/FetcherCacheTest_LocalCachedExtract_LCHuuM/meta/slaves/20150610-200448-3875541420-32907-24561-S0/slave.info' I0610 20:04:48.687366 24587 slave.cpp:2895] Received ping from slave-observer(42)@172.17.0.231:32907 I0610 20:04:48.687453 24584 status_update_manager.cpp:182] Resuming sending status updates I0610 20:04:48.690901 24593 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 3.385583ms I0610 20:04:48.690975 24593 replica.cpp:679] Persisted action at 4 I0610 20:04:48.692137 24593 replica.cpp:658] Replica received learned notice for position 4 I0610 20:04:48.692603 24593 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 449838ns I0610 20:04:48.692674 24593 leveldb.cpp:401] Deleting ~2 keys from leveldb took 52471ns I0610 20:04:48.692699 24593 replica.cpp:679] Persisted action at 4 I0610 20:04:48.692726 24593 replica.cpp:664] Replica learned TRUNCATE action at position 4 I0610 20:04:48.693544 24561 sched.cpp:157] Version: 0.23.0 I0610 20:04:48.695550 24590 sched.cpp:254] New master detected at 
master@172.17.0.231:32907 I0610 20:04:48.697090 24590 sched.cpp:310] Authenticating with master master@172.17.0.231:32907 I0610 20:04:48.697136 24590 sched.cpp:317] Using default CRAM-MD5 authenticatee I0610 20:04:48.697511 24586 authenticatee.hpp:139] Creating new client SASL connection I0610 20:04:48.697937 24586 master.cpp:4181] Authenticating scheduler-51f5c1b5-bb50-4118-bde8-4dcdfd69205d@172.17.0.231:32907 I0610 20:04:48.698185 24584 authenticator.cpp:406] Starting authentication session for crammd5_authenticatee(131)@172.17.0.231:32907 I0610 20:04:48.698575 24584 authenticator.cpp:92] Creating new server SASL connection I0610 20:04:48.698807 24584 authenticatee.hpp:230] Received SASL authentication mechanisms: CRAM-MD5 I0610 20:04:48.699898 24584 authenticatee.hpp:256] Attempting to authenticate with mechanism 'CRAM-MD5' I0610 20:04:48.700040 24584 authenticator.cpp:197] Received SASL authentication start I0610 20:04:48.700119 24584 authenticator.cpp:319] Authentication requires more steps I0610 20:04:48.700193 24584 authenticatee.hpp:276] Received SASL authentication step I0610 20:04:48.700287 24584 authenticator.cpp:225] Received SASL authentication step I0610 20:04:48.700320 24584 auxprop.cpp:101] Request to lookup properties for user: 'test-principal' realm: 'dbade881e927' server FQDN: 'dbade881e927' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0610 20:04:48.700333 24584 auxprop.cpp:173] Looking up auxiliary property '*userPassword' I0610 20:04:48.700392 24584 auxprop.cpp:173] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0610 20:04:48.700425 24584 auxprop.cpp:101] Request to lookup properties for user: 'test-principal' realm: 'dbade881e927' server FQDN: 'dbade881e927' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0610 20:04:48.700439 24584 auxprop.cpp:123] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0610 
20:04:48.700448 24584 auxprop.cpp:123] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0610 20:04:48.700467 24584 authenticator.cpp:311] Authentication success I0610 20:04:48.700640 24584 authenticatee.hpp:316] Authentication success I0610 20:04:48.700742 24584 authenticator.cpp:424] Authentication session cleanup for crammd5_authenticatee(131)@172.17.0.231:32907 I0610 20:04:48.701282 24590 sched.cpp:398] Successfully authenticated with master master@172.17.0.231:32907 I0610 20:04:48.701315 24590 sched.cpp:521] Sending registration request to master@172.17.0.231:32907 I0610 20:04:48.701386 24590 sched.cpp:554] Will retry registration in 1.128089605secs if necessary ...",1 MESOS-2860,"Create the basic infrastructure to handle /scheduler endpoint","This is the first basic step in ensuring the basic {{/call}} functionality: processing a {noformat} POST /call {noformat} and returning: - {{202}} if all goes well; - {{401}} if not authorized; and - {{403}} if the request is malformed. We'll get more sophisticated as the work progressed (eg, supporting {{415}} if the content-type is not of the right kind).",3 MESOS-2862,"mesos-fetcher won't fetch uris which begin with a "" ""","Discovered while running mesos with marathon on top. If I launch a marathon task with a URI which is "" http://apache.osuosl.org/mesos/0.22.1/mesos-0.22.1.tar.gz"" mesos will log to stderr: {code} I0611 22:39:22.815636 35673 logging.cpp:177] Logging to STDERR I0611 22:39:25.643889 35673 fetcher.cpp:214] Fetching URI ' http://apache.osuosl.org/mesos/0.22.1/mesos-0.22.1.tar.gz' I0611 22:39:25.648111 35673 fetcher.cpp:94] Hadoop Client not available, skipping fetch with Hadoop Client Failed to fetch: http://apache.osuosl.org/mesos/0.22.1/mesos-0.22.1.tar.gz Failed to synchronize with slave (it's probably exited) {code} It would be nice if mesos trimmed leading whitespace before doing protocol detection so that simple mistakes are just fixed. 
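A minimal sketch of the requested behavior, assuming hypothetical helper names (normalize_uri and detect_scheme are illustrative only and are not part of the Mesos fetcher): strip leading whitespace first, then look for the scheme.

```python
def normalize_uri(raw_uri):
    # Hypothetical helper, not the Mesos fetcher API: drop leading
    # whitespace so scheme detection sees the first real character.
    return raw_uri.lstrip()


def detect_scheme(raw_uri):
    # Return the lower-cased scheme (the text before '://'), or None
    # when the value has no scheme, e.g. a plain local path.
    uri = normalize_uri(raw_uri)
    scheme, separator, _ = uri.partition('://')
    return scheme.lower() if separator else None
```

With this ordering, the URI from the report (with its leading space) would be detected as http rather than falling through to the Hadoop client path.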
",2 MESOS-2866,"Slave should send oversubscribed resource information after master failover.","After a master failover, if the total amount of oversubscribed resources does not change, then the slave will not send the UpdateSlave message to the new master. The slave needs to send the information to the new master regardless of this.",3 MESOS-2869,"OversubscriptionTest.FixedResourceEstimator is flaky","Came up in https://reviews.apache.org/r/35395/ {code} [ RUN ] OversubscriptionTest.FixedResourceEstimator I0613 13:41:02.604904 19367 exec.cpp:132] Version: 0.23.0 I0613 13:41:02.610995 19398 exec.cpp:206] Executor registered on slave 20150613-134102-3142697795-48295-13678-S0 Registered executor on pomona.apache.org Starting task 7d78a3ef-2de9-46c9-811c-b2c0e2d50578 Forked command at 19410 sh -c 'sleep 1000' ../../src/tests/oversubscription_tests.cpp:579: Failure Mock function called more times than expected - returning directly. Function call: statusUpdate(0x7ffffbc0c4e0, @0x2ade2bffa910 96-byte object <50-3E D7-22 DE-2A 00-00 00-00 00-00 00-00 00-00 D0-C4 00-48 DE-2A 00-00 50-71 AC-01 00-00 00-00 01-00 00-00 02-00 00-00 50-71 AC-01 00-00 00-00 B0-66 00-48 DE-2A 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-2A 00-00 E7-17 A8-BB 0C-5F D5-41 10-31 01-48 DE-2A 00-00 00-00 00-00 4B-03 00-00>) Expected: to be called once Actual: called twice - over-saturated and active [ FAILED ] OversubscriptionTest.FixedResourceEstimator (714 ms) {code}",1 MESOS-2873,"style hook prevent's valid markdown files from getting committed","According to the original [markdown specification|http://daringfireball.net/projects/markdown/syntax#p] and to the most [recent standarization|http://spec.commonmark.org/0.20/#hard-line-breaks] effort, two spaces at the end of a line create a hard line break (it breaks the line without starting a new paragraph), similar to the html code {{
}}. However, there's a hook in Mesos that prevents files with trailing whitespace from being committed.",1 MESOS-2874,"Convert PortMappingStatistics to use automatic JSON encoding/decoding","Simplify PortMappingStatistics by using JSON::Protocol and protobuf::parse to convert ResourceStatistics to/from line format. This change will simplify the implementation of MESOS-2332.",2 MESOS-2879,"Random recursive_mutex errors when running make check","While running make check on OS X, from time to time {{recursive_mutex}} errors appear after running all the tests successfully. Only one of the observed messages actually stops {{make check}} with an error. The following error messages have been observed: {code} libc++abi.dylib: libc++abi.dylib: libc++abi.dylib: libc++abi.dylib: libc++abi.dylib: libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: recursive_mutex lock failed: Invalid argumentterminating with uncaught exception of type std::__1::system_error: recursive_mutex lock failed: Invalid argumentterminating with uncaught exception of type std::__1::system_error: recursive_mutex lock failed: Invalid argumentterminating with uncaught exception of type std::__1::system_error: recursive_mutex lock failed: Invalid argumentterminating with uncaught exception of type std::__1::system_error: recursive_mutex lock failed: Invalid argumentterminating with uncaught exception of type std::__1::system_error: recursive_mutex lock failed: Invalid argument *** Aborted at 1434553937 (unix time) try ""date -d @1434553937"" if you are using GNU date *** {code} {code} libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: recursive_mutex lock failed: Invalid argument *** Aborted at 1434557001 (unix time) try ""date -d @1434557001"" if you are using GNU date *** libc++abi.dylib: PC: @ 0x7fff93855286 __pthread_kill libc++abi.dylib: *** SIGABRT (@0x7fff93855286) received by PID 88060 (TID 0x10fc40000) stack trace: *** @ 
0x7fff8e1d6f1a _sigtramp libc++abi.dylib: @ 0x10fc3f1a8 (unknown) libc++abi.dylib: @ 0x7fff979deb53 abort libc++abi.dylib: libc++abi.dylib: libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: recursive_mutex lock failed: Invalid argumentterminating with uncaught exception of type std::__1::system_error: recursive_mutex lock failed: Invalid argumentterminating with uncaught exception of type std::__1::system_error: recursive_mutex lock failed: Invalid argumentterminating with uncaught exception of type std::__1::system_error: recursive_mutex lock failed: Invalid argumentterminating with uncaught exception of type std::__1::system_error: recursive_mutex lock failed: Invalid argumentterminating with uncaught exception of type std::__1::system_error: recursive_mutex lock failed: Invalid argumentMaking check in include {code} {code} Assertion failed: (e == 0), function ~recursive_mutex, file /SourceCache/libcxx/libcxx-120/src/mutex.cpp, line 82. *** Aborted at 1434555685 (unix time) try ""date -d @1434555685"" if you are using GNU date *** PC: @ 0x7fff93855286 __pthread_kill *** SIGABRT (@0x7fff93855286) received by PID 60235 (TID 0x7fff7ebdc300) stack trace: *** @ 0x7fff8e1d6f1a _sigtramp @ 0x10b512350 google::CheckNotNull<>() @ 0x7fff979deb53 abort @ 0x7fff979a6c39 __assert_rtn @ 0x7fff9bffdcc9 std::__1::recursive_mutex::~recursive_mutex() @ 0x10b881928 process::ProcessManager::~ProcessManager() @ 0x10b874445 process::ProcessManager::~ProcessManager() @ 0x10b874418 process::finalize() @ 0x10b2f7aec main @ 0x7fff98edc5c9 start make[5]: *** [check-local] Abort trap: 6 make[4]: *** [check-am] Error 2 make[3]: *** [check-recursive] Error 1 make[2]: *** [check-recursive] Error 1 make[1]: *** [check] Error 2 make: *** [check-recursive] Error 1 {code}",1 MESOS-2883,"Do not call hook manager if no hooks installed","Hooks modules allow us to provide decorators during various aspects of a task lifecycle such as label decorator, environment 
decorator, etc. Often the call into such a decorator hook results in a new copy of labels, environment, etc., being returned to the call site. This is unnecessary overhead if there are no hooks installed. The proper way would be to call decorators via the hook manager only if there are some hooks installed. This would prevent unnecessary copying overhead if no hooks are available.",2 MESOS-2884,"Allow isolators to specify required namespaces","Currently, the LinuxLauncher looks into SlaveFlags to compute the namespaces that should be enabled when launching the executor. This means that a custom Isolator module doesn't have any way to specify a dependency on a set of namespaces. The proposed solution is to extend the Isolator interface to also export the namespace dependency. This way the MesosContainerizer can directly query all loaded Isolators (built-in and custom modules) to compute the set of namespaces required by the executor. This set of namespaces is then passed on to the LinuxLauncher. ",5 MESOS-2886,"Capture some testing patterns we use in a doc","In Mesos tests we use some tricks and patterns to express certain expectations. These are not always obvious and are not documented. The intent of the ticket is to kick-start the document with the description of those tricks for posterity.",1 MESOS-2888,"Add SSL socket tests","commit beac384c77d4a9c235a813e9286716f4509bdd55 Author: Joris Van Remoortere Date: Fri Jun 26 18:30:12 2015 -0700 Add SSL tests. Review: https://reviews.apache.org/r/35889",5 MESOS-2889,"Add SSL switch to python configuration","The python egg requires explicit dependencies for SSL. Add these to the python configuration if SSL is enabled.",3 MESOS-2890,"Sandbox URL doesn't work in web-ui when using SSL","The links to the sandbox in the web UI don't work when SSL is enabled. This can happen if the certificates for the master and the slave do not match. This is a consequence of the redirection that happens when serving files. 
The resolution to this is currently to set up your certificates to serve the hostnames of the master and slaves.",3 MESOS-2891,"Performance regression in hierarchical allocator.","For large clusters, the 0.23.0 allocator cannot keep up with the volume of slaves. After the following slave was re-registered, it took the allocator a long time to work through the backlog of slaves to add: {noformat:title=45 minute delay} I0618 18:55:40.738399 10172 master.cpp:3419] Re-registered slave 20150422-211121-2148346890-5050-3253-S4695 I0618 19:40:14.960636 10164 hierarchical.hpp:496] Added slave 20150422-211121-2148346890-5050-3253-S4695 {noformat} Empirically, [addSlave|https://github.com/apache/mesos/blob/dda49e688c7ece603ac7a04a977fc7085c713dd1/src/master/allocator/mesos/hierarchical.hpp#L462] and [updateSlave|https://github.com/apache/mesos/blob/dda49e688c7ece603ac7a04a977fc7085c713dd1/src/master/allocator/mesos/hierarchical.hpp#L533] have become expensive. Some timings from a production cluster reveal that the allocator is spending in the low tens of milliseconds on each call to {{addSlave}} and {{updateSlave}}; when there are tens of thousands of slaves, this amounts to the large delay seen above. We also saw a slow, steady increase in memory consumption, hinting further at a queue backup in the allocator. A synthetic benchmark like the one we did for the registrar would be prudent here, along with visibility into the allocator's queue size.",3 MESOS-2892,"Add benchmark for hierarchical allocator.","In light of the performance regression in MESOS-2891, we'd like to have a synthetic benchmark of the allocator code, in order to analyze and direct improvements.",3 MESOS-2893,"Add queue size metrics for the allocator.","In light of the performance regression in MESOS-2891, we'd like to have visibility into the queue size of the allocator. This will enable alerting on performance problems. We currently have no metrics in the allocator. 
I will also look into MESOS-1286 now that we have gcc 4.8; current queue size gauges require a trip through the Process' queue.",1 MESOS-2898,"Write tests for new JSON (ZooKeeper) functionality","Follow-up from MESOS-2340: we need to ensure this does not break the ZooKeeper discovery functionality.",2 MESOS-2902,"Enable Mesos to use arbitrary script / module to figure out IP, HOSTNAME","Currently Mesos tries to guess the IP, HOSTNAME by doing a reverse DNS lookup. This doesn't work on a lot of clouds as we want things like public IPs (which aren't the default DNS), there aren't FQDNs (Azure), or the correct way to figure it out is to call some cloud-specific endpoint. If Mesos / Libprocess could load a mesos-module (or run a script) which is provided per-cloud, we could figure out the exact IP / hostname for the given environment. It also means we can ship one identical set of files to all hosts in a given provider which doesn't happen to have the DNS scheme + hostnames that libprocess/Mesos expects. Currently we have to generate host-specific config files which Mesos uses to guess. The host-specific files break / fall apart if machines change IP / hostname without being reinstalled.",5 MESOS-2903,"Network isolator should not fail when target state already exists","Network isolator has multiple instances of the following pattern: {noformat} Try something = ....::create(); if (something.isError()) { ++metrics.something_errors; return Failure(""Failed to create something ...""); } else if (!icmpVethToEth0.get()) { ++metrics.adding_veth_icmp_filters_already_exist; return Failure(""Something already exists""); } {noformat} These failures have occurred in operation due to the failure to recover or delete an orphan, causing the slave to remain online but unable to create new resources. 
We should convert the second failure message in this pattern to an informational message since the final state of the system is the state that we requested.",3 MESOS-2904,"Add slave metric to count container launch failures","We have seen circumstances where a machine has been consistently unable to launch containers due to an inconsistent state (for example, unexpected network configuration). Adding a metric to track container launch failures will allow us to detect and alert on slaves in such a state.",1 MESOS-2906,"Slave : Synchronous Validation for Calls","The /call endpoint on the slave will return a 202 Accepted code but has to do some basic validation first. If validation fails, it will return a {{BadRequest}} to the client. - We need to create the required infrastructure to validate the request and then process it, similar to {{src/master/validation.cpp}} in the {{namespace scheduler}}, i.e. check that the protobuf is properly initialized, has the required attributes set pertaining to the call message, etc.",3 MESOS-2907,"Agent : Create Basic Functionality to handle /call endpoint","This is the first basic step in ensuring the basic /call functionality: - Set up the route on the agent for the ""api/v1/executor"" endpoint. - The endpoint should perform basic header/protobuf validation and return {{501 NotImplemented}} for now. - Introduce initial tests in executor_api_tests.cpp that just verify the status code. ",5 MESOS-2909,"Add version field to RegisterFrameworkMessage and ReregisterFrameworkMessage","In the same way we added the 'version' field to RegisterSlaveMessage and ReregisterSlaveMessage, we should add it to the framework (re-)registration messages. This would help the master determine which version of the scheduler driver it is talking to. We want this so that the master can start sending Event messages to the scheduler driver (and scheduler library). 
In the long term, master will send a streaming response to the libraries, but in the meantime we can test the event protobufs by sending Event messages.",3 MESOS-2910,"Add an Event message handler to scheduler driver","Adding this handler lets master send Event messages to the driver. See MESOS-2909 for additional context.",8 MESOS-2911,"Add an Event message handler to scheduler library","Adding this handler lets master send Event messages to the library. See MESOS-2909 for additional context. This ticket only tracks the installation of the handler and maybe handling of a single event for testing. Additional events handling will be captured in a different ticket(s).",3 MESOS-2912,"Provide a Python library for master detection","When schedulers start interacting with Mesos master via HTTP endpoints, they need a way to detect masters. Mesos should provide a master detection Python library to make this easy for frameworks.",5 MESOS-2913,"Scheduler driver should send Call messages to the master","To vet the new Call protobufs, it is prudent to have the scheduler driver (sched.cpp) send Call messages to the master (similar to what we are doing with the scheduler library).",8 MESOS-2914,"Port mapping isolator should cleanup unknown orphan containers after all known orphan containers are recovered during recovery.","Otherwise, the icmp/arp filter on host eth0 might be removed as a result of _cleanup if 'infos' is empty, causing subsequent '_cleanup' to fail on both known/unknown orphan containers. 
{noformat} I0612 17:46:51.518501 16308 containerizer.cpp:314] Recovering containerizer I0612 17:46:51.520612 16308 port_mapping.cpp:1567] Discovered network namespace handle symlink ddcb8397-3552-44f9-bc99-b5b69aa72944 -> 31607 I0612 17:46:51.521183 16308 port_mapping.cpp:1567] Discovered network namespace handle symlink d8c48a4a-fdfb-47dd-b8d8-07188c21600d -> 41020 I0612 17:46:51.521883 16308 port_mapping.cpp:1567] Discovered network namespace handle symlink 8953fc7f-9fca-4931-b0cb-2f4959ddee74 -> 3302 I0612 17:46:51.522542 16308 port_mapping.cpp:1567] Discovered network namespace handle symlink 50f9986f-ebbc-440d-86a7-9fa1a7c55a75 -> 19805 I0612 17:46:51.523643 16308 port_mapping.cpp:2597] Removing IP packet filters with ports [33792,34815] for container with pid 52304 I0612 17:46:51.525063 16308 port_mapping.cpp:2616] Freed ephemeral ports [33792,34816) for container with pid 52304 I0612 17:46:51.547696 16308 port_mapping.cpp:2762] Successfully performed cleanup for pid 52304 I0612 17:46:51.550027 16308 port_mapping.cpp:1698] Network isolator recovery complete I0612 17:46:51.550946 16329 containerizer.cpp:449] Removing orphan container 111ea69c-6184-4da1-a0e9-c34e8c6deb30 I0612 17:46:51.552686 16329 containerizer.cpp:449] Removing orphan container ddcb8397-3552-44f9-bc99-b5b69aa72944 I0612 17:46:51.552734 16309 cgroups.cpp:2377] Freezing cgroup /sys/fs/cgroup/freezer/mesos/111ea69c-6184-4da1-a0e9-c34e8c6deb30 I0612 17:46:51.554932 16329 containerizer.cpp:449] Removing orphan container 8953fc7f-9fca-4931-b0cb-2f4959ddee74 I0612 17:46:51.555032 16309 cgroups.cpp:2377] Freezing cgroup /sys/fs/cgroup/freezer/mesos/ddcb8397-3552-44f9-bc99-b5b69aa72944 I0612 17:46:51.555629 16308 cgroups.cpp:1420] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/111ea69c-6184-4da1-a0e9-c34e8c6deb30 after 1.730304ms I0612 17:46:51.557507 16329 containerizer.cpp:449] Removing orphan container 50f9986f-ebbc-440d-86a7-9fa1a7c55a75 I0612 17:46:51.557611 16309 cgroups.cpp:2377] 
Freezing cgroup /sys/fs/cgroup/freezer/mesos/8953fc7f-9fca-4931-b0cb-2f4959ddee74 I0612 17:46:51.557896 16313 cgroups.cpp:1420] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/ddcb8397-3552-44f9-bc99-b5b69aa72944 after 1.685248ms I0612 17:46:51.559412 16310 cgroups.cpp:2394] Thawing cgroup /sys/fs/cgroup/freezer/mesos/111ea69c-6184-4da1-a0e9-c34e8c6deb30 I0612 17:46:51.561564 16329 containerizer.cpp:449] Removing orphan container d8c48a4a-fdfb-47dd-b8d8-07188c21600d I0612 17:46:51.562489 16315 cgroups.cpp:2377] Freezing cgroup /sys/fs/cgroup/freezer/mesos/50f9986f-ebbc-440d-86a7-9fa1a7c55a75 I0612 17:46:51.562988 16313 cgroups.cpp:2394] Thawing cgroup /sys/fs/cgroup/freezer/mesos/ddcb8397-3552-44f9-bc99-b5b69aa72944 I0612 17:46:51.563303 16310 cgroups.cpp:1449] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/111ea69c-6184-4da1-a0e9-c34e8c6deb30 after 2.076928ms I0612 17:46:51.566052 16308 cgroups.cpp:2377] Freezing cgroup /sys/fs/cgroup/freezer/mesos/d8c48a4a-fdfb-47dd-b8d8-07188c21600d I0612 17:46:51.566102 16313 slave.cpp:3911] Finished recovery W0612 17:46:51.566432 16323 disk.cpp:299] Ignoring cleanup for unknown container 111ea69c-6184-4da1-a0e9-c34e8c6deb30 I0612 17:46:51.566651 16317 cgroups.cpp:1449] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/ddcb8397-3552-44f9-bc99-b5b69aa72944 after 2.12096ms I0612 17:46:51.566987 16313 slave.cpp:3944] Garbage collecting old slave 20150319-213133-2080910346-5050-57551-S3314 I0612 17:46:51.567777 16318 cgroups.cpp:1420] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/d8c48a4a-fdfb-47dd-b8d8-07188c21600d after 1.323008ms W0612 17:46:51.568042 16323 port_mapping.cpp:2544] Ignoring cleanup for unknown container 111ea69c-6184-4da1-a0e9-c34e8c6deb30 I0612 17:46:51.569522 16311 gc.cpp:56] Scheduling '/var/lib/mesos/slaves/20150319-213133-2080910346-5050-57551-S3314' for gc 6.99999341503407days in the future W0612 17:46:51.569725 16329 disk.cpp:299] Ignoring cleanup for unknown container 
ddcb8397-3552-44f9-bc99-b5b69aa72944 I0612 17:46:51.570911 16325 cgroups.cpp:2394] Thawing cgroup /sys/fs/cgroup/freezer/mesos/d8c48a4a-fdfb-47dd-b8d8-07188c21600d I0612 17:46:51.573581 16316 port_mapping.cpp:2597] Removing IP packet filters with ports [35840,36863] for container with pid 31607 I0612 17:46:51.575127 16316 port_mapping.cpp:2616] Freed ephemeral ports [35840,36864) for container with pid 31607 I0612 17:46:51.588284 16330 cgroups.cpp:1449] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/d8c48a4a-fdfb-47dd-b8d8-07188c21600d after 14.503936ms E0612 17:46:51.622140 16310 containerizer.cpp:480] Failed to clean up an isolator when destroying orphan container ddcb8397-3552-44f9-bc99-b5b69aa72944: The ICMP packet filter on host eth0 does not exist, The ARP packet filter on host eth0 does not exist W0612 17:46:51.773123 16313 disk.cpp:299] Ignoring cleanup for unknown container d8c48a4a-fdfb-47dd-b8d8-07188c21600d I0612 17:46:51.774153 16325 port_mapping.cpp:2597] Removing IP packet filters with ports [32768,33791] for container with pid 41020 I0612 17:46:51.775167 16325 port_mapping.cpp:2616] Freed ephemeral ports [32768,33792) for container with pid 41020 E0612 17:46:51.817221 16323 containerizer.cpp:480] Failed to clean up an isolator when destroying orphan container d8c48a4a-fdfb-47dd-b8d8-07188c21600d: The ICMP packet filter on host eth0 does not exist, The ARP packet filter on host eth0 does not exist I0612 17:46:51.872231 16314 cgroups.cpp:1420] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/50f9986f-ebbc-440d-86a7-9fa1a7c55a75 after 308.33792ms I0612 17:46:51.874572 16314 cgroups.cpp:2394] Thawing cgroup /sys/fs/cgroup/freezer/mesos/50f9986f-ebbc-440d-86a7-9fa1a7c55a75 I0612 17:46:51.876566 16314 cgroups.cpp:1449] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/50f9986f-ebbc-440d-86a7-9fa1a7c55a75 after 1.593344ms 2015-06-12 17:46:54,833:16307(0x7f172eb07940):ZOO_INFO@auth_completion_func@1286: Authentication scheme digest 
succeeded I0612 17:46:54.835737 16321 group.cpp:385] Trying to create path '/home/mesos/prod/master' in ZooKeeper I0612 17:46:54.839110 16321 detector.cpp:138] Detected a new leader: (id='1') I0612 17:46:54.840276 16330 group.cpp:659] Trying to get '/home/mesos/prod/master/info_0000000001' in ZooKeeper I0612 17:46:54.842350 16330 detector.cpp:452] A new leading master (UPID=master@10.44.14.132:5050) is detected I0612 17:46:54.843297 16330 slave.cpp:653] New master detected at master@10.44.14.132:5050 I0612 17:46:54.843298 16312 status_update_manager.cpp:171] Pausing sending status updates I0612 17:46:54.844091 16330 slave.cpp:678] No credentials provided. Attempting to register without authentication I0612 17:46:54.845087 16330 slave.cpp:689] Detecting new master I0612 17:47:01.561920 16309 cgroups.cpp:2394] Thawing cgroup /sys/fs/cgroup/freezer/mesos/8953fc7f-9fca-4931-b0cb-2f4959ddee74 I0612 17:47:01.564687 16309 cgroups.cpp:1449] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/8953fc7f-9fca-4931-b0cb-2f4959ddee74 after 1.924096ms I0612 17:47:01.565467 16309 cgroups.cpp:2377] Freezing cgroup /sys/fs/cgroup/freezer/mesos/8953fc7f-9fca-4931-b0cb-2f4959ddee74 W0612 17:47:09.978946 16326 disk.cpp:299] Ignoring cleanup for unknown container 50f9986f-ebbc-440d-86a7-9fa1a7c55a75 I0612 17:47:09.979818 16327 port_mapping.cpp:2597] Removing IP packet filters with ports [34816,35839] for container with pid 19805 I0612 17:47:09.981474 16327 port_mapping.cpp:2616] Freed ephemeral ports [34816,35840) for container with pid 19805 E0612 17:47:10.278715 16325 containerizer.cpp:480] Failed to clean up an isolator when destroying orphan container 50f9986f-ebbc-440d-86a7-9fa1a7c55a75: The ICMP packet filter on host eth0 does not exist, The ARP packet filter on host eth0 does not exist I0612 17:47:11.568151 16326 cgroups.cpp:2394] Thawing cgroup /sys/fs/cgroup/freezer/mesos/8953fc7f-9fca-4931-b0cb-2f4959ddee74 I0612 17:47:11.570915 16326 cgroups.cpp:1449] Successfullly 
thawed cgroup /sys/fs/cgroup/freezer/mesos/8953fc7f-9fca-4931-b0cb-2f4959ddee74 after 1.987072ms I0612 17:47:11.571728 16326 cgroups.cpp:2377] Freezing cgroup /sys/fs/cgroup/freezer/mesos/8953fc7f-9fca-4931-b0cb-2f4959ddee74 I0612 17:47:14.549536 16316 slave.cpp:821] Registered with master master@10.44.14.132:5050; given slave ID 20150602-190100-2215521290-5050-39399-S23257 I0612 17:47:14.550220 16318 status_update_manager.cpp:178] Resuming sending status updates I0612 17:47:21.574513 16319 cgroups.cpp:2394] Thawing cgroup /sys/fs/cgroup/freezer/mesos/8953fc7f-9fca-4931-b0cb-2f4959ddee74 I0612 17:47:21.576817 16319 cgroups.cpp:1449] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/8953fc7f-9fca-4931-b0cb-2f4959ddee74 after 1.587968ms I0612 17:47:21.577466 16319 cgroups.cpp:2377] Freezing cgroup /sys/fs/cgroup/freezer/mesos/8953fc7f-9fca-4931-b0cb-2f4959ddee74 I0612 17:47:31.580281 16310 cgroups.cpp:2394] Thawing cgroup /sys/fs/cgroup/freezer/mesos/8953fc7f-9fca-4931-b0cb-2f4959ddee74 I0612 17:47:31.582365 16310 cgroups.cpp:1449] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/8953fc7f-9fca-4931-b0cb-2f4959ddee74 after 1.410048ms I0612 17:47:31.582895 16310 cgroups.cpp:2377] Freezing cgroup /sys/fs/cgroup/freezer/mesos/8953fc7f-9fca-4931-b0cb-2f4959ddee74 I0612 17:47:41.585619 16322 cgroups.cpp:2394] Thawing cgroup /sys/fs/cgroup/freezer/mesos/8953fc7f-9fca-4931-b0cb-2f4959ddee74 I0612 17:47:41.587703 16322 cgroups.cpp:1449] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/8953fc7f-9fca-4931-b0cb-2f4959ddee74 after 1.418752ms I0612 17:47:41.588436 16322 cgroups.cpp:2377] Freezing cgroup /sys/fs/cgroup/freezer/mesos/8953fc7f-9fca-4931-b0cb-2f4959ddee74 I0612 17:47:51.515689 16330 slave.cpp:3733] Current disk usage 11.43%. 
Max allowed age: 5.499938588861215days E0612 17:47:51.557831 16330 containerizer.cpp:468] Failed to destroy orphan container 8953fc7f-9fca-4931-b0cb-2f4959ddee74: Timed out after 1mins {noformat}",3 MESOS-2917,"Specify correct libnl version for configure check","Currently configure.ac lists 3.2.24 as the required libnl version. However, https://reviews.apache.org/r/31503 caused the minimum required version to be bumped to 3.2.26. The configure check thus fails to error out during execution and the dependency is captured only during the build step.",1 MESOS-2919,"Framework can overcommit oversubscribable resources during master failover.","This is due to a bug in the hierarchical allocator. Here is the sequence of events: 1) slave uses a fixed resource estimator which advertise 4 revocable cpus 2) a framework A launches a task that uses all the 4 revocable cpus 3) master fails over 4) slave re-registers with the new master, and sends UpdateSlaveMessage with 4 revocable cpus as oversubscribed resources 5) framework A hasn't registered yet, therefore, the slave's available resources will be 4 revocable cpus 6) framework A registered and will receive an additional 4 revocable cpus. So it can launch another task with 4 revocable cpus (that means 8 total!) The problem is due to the way we calculate 'allocated' resource in allocator when 'updateSlave'. If the framework is not registered, the 'allocation' below is not accurate (check that if block in 'addSlave'). {code} template void HierarchicalAllocatorProcess::updateSlave( const SlaveID& slaveId, const Resources& oversubscribed) { CHECK(initialized); CHECK(slaves.contains(slaveId)); // Check that all the oversubscribed resources are revocable. CHECK_EQ(oversubscribed, oversubscribed.revocable()); // Update the total resources. // First remove the old oversubscribed resources from the total. slaves[slaveId].total -= slaves[slaveId].total.revocable(); // Now add the new estimate of oversubscribed resources. 
slaves[slaveId].total += oversubscribed; // Now, update the total resources in the role sorter. roleSorter->update( slaveId, slaves[slaveId].total.unreserved()); // Calculate the current allocation of oversubscribed resources. Resources allocation; foreachkey (const std::string& role, roles) { allocation += roleSorter->allocation(role, slaveId).revocable(); } // Update the available resources. // First remove the old oversubscribed resources from available. slaves[slaveId].available -= slaves[slaveId].available.revocable(); // Now add the new estimate of available oversubscribed resources. slaves[slaveId].available += oversubscribed - allocation; LOG(INFO) << ""Slave "" << slaveId << "" ("" << slaves[slaveId].hostname << "") updated with oversubscribed resources "" << oversubscribed << "" (total: "" << slaves[slaveId].total << "", available: "" << slaves[slaveId].available << "")""; allocate(slaveId); } template void HierarchicalAllocatorProcess::addSlave( const SlaveID& slaveId, const SlaveInfo& slaveInfo, const Resources& total, const hashmap& used) { CHECK(initialized); CHECK(!slaves.contains(slaveId)); roleSorter->add(slaveId, total.unreserved()); foreachpair (const FrameworkID& frameworkId, const Resources& allocated, used) { if (frameworks.contains(frameworkId)) { const std::string& role = frameworks[frameworkId].role; // TODO(bmahler): Validate that the reserved resources have the // framework's role. roleSorter->allocated(role, slaveId, allocated.unreserved()); frameworkSorters[role]->add(slaveId, allocated); frameworkSorters[role]->allocated( frameworkId.value(), slaveId, allocated); } } ... 
} {code}",3 MESOS-2920,"Add move constructors / assignment to Try.","Now that we have C++11, let's add move constructors and move assignment operators for Try, similarly to what was done for Option.",3 MESOS-2921,"Add move constructors / assignment to Result.","Now that we have C++11, let's add move constructors and move assignment operators for Result, similarly to what was done for Option.",3 MESOS-2922,"Add move constructors / assignment to Future.","Now that we have C++11, let's add move constructors and move assignment operators for Future, similarly to what was done for Option. There is currently one move constructor for Future, but not for T, U, and no assignment operator.",3 MESOS-2923,"fetcher.cpp - problem with certificates..?","Mesos 0.22.0/0.22.1 built and installed from sources according to the instructions given [here|http://mesos.apache.org/gettingstarted/] has some problem with certificates. Every time I try to deploy something that requires downloading any resource via HTTPS (with URI specified via Marathon), such deployment fails and I get this message in the failed app's sandbox: {code} E0617 09:58:44.339409 12380 fetcher.cpp:138] Error downloading resource: Problem with the SSL CA cert (path? access rights?) {code} Trying to download the same resource on the same slave with {{curl}} or {{wget}} works without problems. Moreover, when I install exactly the same version of Mesos from Mesosphere's debs on identical machines (i.e., set up by the same Ansible scripts), everything works fine as well. I guess it must be something related to the way Mesos is built - maybe some missing switch for {{configure}} or {{make}}..? Any ideas..?",2 MESOS-2925,"Invalid usage of ATOMIC_FLAG_INIT in member initialization","The C++ specification states: The macro ATOMIC_FLAG_INIT shall be defined in such a way that it can be used to initialize an object of type atomic_flag to the clear state. 
The macro can be used in the form: ""atomic_flag guard = ATOMIC_FLAG_INIT; ""It is unspecified whether the macro can be used in other initialization contexts."" Clang catches this (although reports it erroneously as a braced scaled init issue) and refuses to compile libprocess.",1 MESOS-2926,"Extend mesos-style.py/cpplint.py to check #include files","cpplint.py provides the capability to enforce the style guide requirements for #including everything you use and ordering files based on type but it does not work for mesos because we do use #include <...> for project files where it expects #include ""..."". We should update the style checker to support our include usage and then turn it on by default in the commit hook.",1 MESOS-2928,"Update stout #include headers","Update stout to #include headers for symbols we rely on and reorder to comply with the style guide.",2 MESOS-2936,"Create a design document for Quota support in Master","Create a design document for the Quota feature support in Mesos Master (excluding allocator) to be shared with the Mesos community. 
Design Doc: https://docs.google.com/document/d/16iRNmziasEjVOblYp5bbkeBZ7pnjNlaIzPQqMTHQ-9I/edit?usp=sharing",8 MESOS-2937,"Initial design document for Quota support in Allocator.","Create a design document for the Quota feature support in the built-in Hierarchical DRF allocator to be shared with the Mesos community.",5 MESOS-2938,"Linux docker inspect crashes","On linux, when a simple task is being executed on docker container executor, the sandbox stderr shows a backtrace: *** Aborted at 1435254156 (unix time) try ""date -d @1435254156"" if you are using GNU date *** PC: @ 0x7ffff2b1364d (unknown) *** SIGSEGV (@0xfffffffffffffff8) received by PID 88424 (TID 0x7fffe88fb700) from PID 18446744073709551608; stack trace: *** @ 0x7ffff25a4340 (unknown) @ 0x7ffff2b1364d (unknown) @ 0x7ffff2b724df (unknown) @ 0x4a6466 Docker::Container::~Container() @ 0x7ffff5bfa49a Option<>::~Option() @ 0x7ffff5c15989 Option<>::operator=() @ 0x7ffff5c09e9f Try<>::operator=() @ 0x7ffff5c09ee3 Result<>::operator=() @ 0x7ffff5c0a938 process::Future<>::set() @ 0x7ffff5bff412 process::Promise<>::set() @ 0x7ffff5be53e3 Docker::___inspect() @ 0x7ffff5be3cf8 _ZZN6Docker9__inspectERKSsRKN7process5OwnedINS2_7PromiseINS_9ContainerEEEEERK6OptionI8DurationENS2_6FutureISsEERKNS2_10SubprocessEENKUlRKSG_E1_clESL_ @ 0x7ffff5be91e9 _ZZNK7process6FutureISsE5onAnyIZN6Docker9__inspectERKSsRKNS_5OwnedINS_7PromiseINS3_9ContainerEEEEERK6OptionI8DurationES1_RKNS_10SubprocessEEUlRKS1_E1_vEESM_OT_NS1_6PreferEENUlSM_E_clESM_ @ 0x7ffff5be9d9d _ZNSt17_Function_handlerIFvRKN7process6FutureISsEEEZNKS2_5onAnyIZN6Docker9__inspectERKSsRKNS0_5OwnedINS0_7PromiseINS7_9ContainerEEEEERK6OptionI8DurationES2_RKNS0_10SubprocessEEUlS4_E1_vEES4_OT_NS2_6PreferEEUlS4_E_E9_M_invokeERKSt9_Any_dataS4_ @ 0x7ffff5c1eadd std::function<>::operator()() @ 0x7ffff5c15e07 process::Future<>::onAny() @ 0x7ffff5be93a1 
_ZNK7process6FutureISsE5onAnyIZN6Docker9__inspectERKSsRKNS_5OwnedINS_7PromiseINS3_9ContainerEEEEERK6OptionI8DurationES1_RKNS_10SubprocessEEUlRKS1_E1_vEESM_OT_NS1_6PreferE @ 0x7ffff5be87f6 _ZNK7process6FutureISsE5onAnyIZN6Docker9__inspectERKSsRKNS_5OwnedINS_7PromiseINS3_9ContainerEEEEERK6OptionI8DurationES1_RKNS_10SubprocessEEUlRKS1_E1_EESM_OT_ @ 0x7ffff5be459c Docker::__inspect() @ 0x7ffff5be337c _ZZN6Docker8_inspectERKSsRKN7process5OwnedINS2_7PromiseINS_9ContainerEEEEERK6OptionI8DurationEENKUlvE_clEv @ 0x7ffff5be8c5a _ZZNK7process6FutureI6OptionIiEE5onAnyIZN6Docker8_inspectERKSsRKNS_5OwnedINS_7PromiseINS5_9ContainerEEEEERKS1_I8DurationEEUlvE_vEERKS3_OT_NS3_10LessPreferEENUlSL_E_clESL_ @ 0x7ffff5be9b36 _ZNSt17_Function_handlerIFvRKN7process6FutureI6OptionIiEEEEZNKS4_5onAnyIZN6Docker8_inspectERKSsRKNS0_5OwnedINS0_7PromiseINS9_9ContainerEEEEERKS2_I8DurationEEUlvE_vEES6_OT_NS4_10LessPreferEEUlS6_E_E9_M_invokeERKSt9_Any_dataS6_ @ 0x7ffff5c1e9b3 std::function<>::operator()() @ 0x7ffff6184a1a _ZN7process8internal3runISt8functionIFvRKNS_6FutureI6OptionIiEEEEEJRS6_EEEvRKSt6vectorIT_SaISD_EEDpOT0_ @ 0x7ffff617e64d process::Future<>::set() @ 0x7ffff6752e46 process::Promise<>::set() @ 0x7ffff675faec process::internal::cleanup() @ 0x7ffff6765293 _ZNSt5_BindIFPFvRKN7process6FutureI6OptionIiEEEPNS0_7PromiseIS3_EERKNS0_10SubprocessEESt12_PlaceholderILi1EES9_SA_EE6__callIvIS6_EILm0ELm1ELm2EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE @ 0x7ffff6764bcd _ZNSt5_BindIFPFvRKN7process6FutureI6OptionIiEEEPNS0_7PromiseIS3_EERKNS0_10SubprocessEESt12_PlaceholderILi1EES9_SA_EEclIJS6_EvEET0_DpOT_ @ 0x7ffff67642a5 _ZZNK7process6FutureI6OptionIiEE5onAnyISt5_BindIFPFvRKS3_PNS_7PromiseIS2_EERKNS_10SubprocessEESt12_PlaceholderILi1EESA_SB_EEvEES7_OT_NS3_6PreferEENUlS7_E_clES7_ @ 0x7ffff676531d 
_ZNSt17_Function_handlerIFvRKN7process6FutureI6OptionIiEEEEZNKS4_5onAnyISt5_BindIFPFvS6_PNS0_7PromiseIS3_EERKNS0_10SubprocessEESt12_PlaceholderILi1EESC_SD_EEvEES6_OT_NS4_6PreferEEUlS6_E_E9_M_invokeERKSt9_Any_dataS6_ @ 0x7ffff5c1e9b3 std::function<>::operator()() (END) ",1 MESOS-2939,"Testing the new workflow","This is a simple test story to try out the new workflow. Unfortunately, testing and getting it to work seems to be something that actually does take up time, so I'm tracking this here.",3 MESOS-2940,"Reconciliation is expensive for large numbers of tasks.","We've observed that both implicit and explicit reconciliation are expensive for large numbers of tasks: {noformat: title=Explicit O(100,000) tasks: 70secs} I0625 20:55:23.716320 21937 master.cpp:3863] Performing explicit task state reconciliation for N tasks of framework F (NAME) at S@IP:PORT I0625 20:56:34.812464 21937 master.cpp:5041] Removing task T with resources R of framework F on slave S at slave(1)@IP:PORT (HOST) {noformat} {noformat: title=Implicit with O(100,000) tasks: 60secs} I0625 20:25:22.310601 21936 master.cpp:3802] Performing implicit task state reconciliation for framework F (NAME) at S@IP:PORT I0625 20:26:23.874528 21921 master.cpp:218] Scheduling shutdown of slave S due to health check timeout {noformat} Let's add a benchmark to see if there are any bottlenecks here, and to guide improvements.",3 MESOS-2941,"Add a benchmark for task reconciliation.","Per MESOS-2940, it would be great to have a benchmark for task reconciliation, given large numbers of tasks. 
This can guide attempts at improving performance.",1 MESOS-2942,"Create documentation for using SSL",NULL,5 MESOS-2943,"mesos fails to compile under mac when libssl and libevent are enabled","../configure --enable-debug --enable-libevent --enable-ssl && make produces the following error: poll.cpp' || echo '../../../3rdparty/libprocess/'`src/libevent_poll.cpp libtool: compile: g++ -DPACKAGE_NAME=\""libprocess\"" -DPACKAGE_TARNAME=\""libprocess\"" -DPACKAGE_VERSION=\""0.0.1\"" ""-DPACKAGE_STRING=\""libprocess 0.0.1\"""" -DPACKAGE_BUGREPORT=\""\"" -DPACKAGE_URL=\""\"" -DPACKAGE=\""libprocess\"" -DVERSION=\""0.0.1\"" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\"".libs/\"" -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBCURL=1 -DHAVE_EVENT2_EVENT_H=1 -DHAVE_LIBEVENT=1 -DHAVE_EVENT2_THREAD_H=1 -DHAVE_LIBEVENT_PTHREADS=1 -DHAVE_OPENSSL_SSL_H=1 -DHAVE_LIBSSL=1 -DHAVE_LIBCRYPTO=1 -DHAVE_EVENT2_BUFFEREVENT_SSL_H=1 -DHAVE_LIBEVENT_OPENSSL=1 -DUSE_SSL_SOCKET=1 -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBDL=1 -I. 
-I../../../3rdparty/libprocess -I../../../3rdparty/libprocess/include -I../../../3rdparty/libprocess/3rdparty/stout/include -I3rdparty/boost-1.53.0 -I3rdparty/libev-4.15 -I3rdparty/picojson-4f93734 -I3rdparty/glog-0.3.3/src -I3rdparty/ry-http-parser-1c3624a -I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include -I/usr/local/opt/subversion/include/subversion-1 -I/usr/include/apr-1 -I/usr/include/apr-1.0 -g1 -O0 -std=c++11 -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 -MT libprocess_la-libevent_poll.lo -MD -MP -MF .deps/libprocess_la-libevent_poll.Tpo -c ../../../3rdparty/libprocess/src/libevent_poll.cpp -fno-common -DPIC -o libprocess_la-libevent_poll.o mv -f .deps/libprocess_la-socket.Tpo .deps/libprocess_la-socket.Plo mv -f .deps/libprocess_la-subprocess.Tpo .deps/libprocess_la-subprocess.Plo mv -f .deps/libprocess_la-libevent.Tpo .deps/libprocess_la-libevent.Plo mv -f .deps/libprocess_la-metrics.Tpo .deps/libprocess_la-metrics.Plo In file included from ../../../3rdparty/libprocess/src/libevent_ssl_socket.cpp:11: In file included from ../../../3rdparty/libprocess/include/process/queue.hpp:9: ../../../3rdparty/libprocess/include/process/future.hpp:849:7: error: no viable conversion from 'const process::Future >' to 'const process::network::Socket' set(u); ^ ../../../3rdparty/libprocess/src/libevent_ssl_socket.cpp:769:10: note: in instantiation of function template specialization 'process::Future::Future > >' requested here return accept_queue.get() ^ ../../../3rdparty/libprocess/include/process/socket.hpp:21:7: note: candidate constructor (the implicit move constructor) not viable: no known conversion from 'const process::Future >' to 'process::network::Socket &&' for 1st argument class Socket ^ ../../../3rdparty/libprocess/include/process/socket.hpp:21:7: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'const process::Future >' to 'const process::network::Socket &' for 1st argument class Socket ^ 
../../../3rdparty/libprocess/include/process/future.hpp:411:21: note: passing argument to parameter '_t' here bool set(const T& _t); ^ 1 error generated. make[4]: *** [libprocess_la-libevent_ssl_socket.lo] Error 1 make[4]: *** Waiting for unfinished jobs.... mv -f .deps/libprocess_la-libevent_poll.Tpo .deps/libprocess_la-libevent_poll.Plo mv -f .deps/libprocess_la-openssl.Tpo .deps/libprocess_la-openssl.Plo mv -f .deps/libprocess_la-process.Tpo .deps/libprocess_la-process.Plo make[3]: *** [all-recursive] Error 1 make[2]: *** [all-recursive] Error 1 make[1]: *** [all] Error 2 make: *** [all-recursive] Error 1",2 MESOS-2944,"Use of EXPECT in test and relying on the checked condition afterwards.","In docker_containerizer_test we have the following pattern. {code} EXPECT_NE(0u, offers.get().size()); const Offer& offer = offers.get()[0]; {code} As we rely on the value afterwards we should use ASSERT_NE instead. In that case the test will fail immediately. ",1 MESOS-2946,"Authorizer Module: Interface design","h4.Motivation Design an interface covering authorizer modules while staying minimally invasive in regards to changes to the existing {{LocalAuthorizer}} implementation. ",2 MESOS-2947,"Authorizer Module: Implementation, Integration & Tests","h4.Motivation Provide an example authorizer module based on the {{LocalAuthorizer}} implementation. Make sure that such authorizer module can be fully unit- and integration- tested within the mesos test suite. ",8 MESOS-2949,"Draft design for generalized Authorizer interface","As mentioned in MESOS-2948 the current {{mesos::Authorizer}} interface is rather inflexible if new _Actions_ or _Objects_ need to be added. 
A new API needs to be designed in a way that allows for arbitrary _Actions_ and _Objects_ to be added to the authorization mechanism without having to recompile mesos.",3 MESOS-2950,"Implement current mesos Authorizer in terms of generalized Authorizer interface","In order to maintain compatibility with existent versions of Mesos, as well as to prove the flexibility of the generalized {{mesos::Authorizer}} design, the current authorization mechanism through ACL definitions needs to run under the updated interface without any changes being noticeable by the current authorization users.",8 MESOS-2951,"Inefficient container usage collection","docker containerizer currently collects usage statistics by calling os's process statistics (eg ps ). There is scope for making this efficient, say by querying cgroups file system. ",3 MESOS-2956,"Stack trace in isolator tests on Linux VM","PerfEventIsolatorTest fails with stack trace when run in Linux VM [----------] 1 test from PerfEventIsolatorTest [ RUN ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample F0629 11:38:17.088412 14114 isolator_tests.cpp:837] CHECK_SOME(isolator): Failed to create PerfEvent isolator, invalid events: { cycles, task-clock } *** Check failure stack trace: *** @ 0x2ab5e5aeeb1a google::LogMessage::Fail() @ 0x2ab5e5aeea66 google::LogMessage::SendToLog() @ 0x2ab5e5aee468 google::LogMessage::Flush() @ 0x2ab5e5af137c google::LogMessageFatal::~LogMessageFatal() @ 0x864b0c _CheckFatal::~_CheckFatal() @ 0xc458ed mesos::internal::tests::PerfEventIsolatorTest_ROOT_CGROUPS_Sample_Test::TestBody() @ 0x119fb17 testing::internal::HandleSehExceptionsInMethodIfSupported<>() @ 0x119ac9e testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x118305f testing::Test::Run() @ 0x1183782 testing::TestInfo::Run() @ 0x1183d0a testing::TestCase::Run() @ 0x11889d4 testing::internal::UnitTestImpl::RunAllTests() @ 0x11a09ae testing::internal::HandleSehExceptionsInMethodIfSupported<>() @ 0x119b9c3 
testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x11878e0 testing::UnitTest::Run() @ 0xcdc8c7 main @ 0x2ab5e7fdbec5 (unknown) @ 0x861a89 (unknown) make[3]: *** [check-local] Aborted (core dumped) [ RUN ] UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup F0629 11:49:38.763434 18836 isolator_tests.cpp:1200] CHECK_SOME(isolator): Failed to create PerfEvent isolator, invalid events: { cpu-cycles } *** Check failure stack trace: *** @ 0x2ba40eb2db1a google::LogMessage::Fail() @ 0x2ba40eb2da66 google::LogMessage::SendToLog() @ 0x2ba40eb2d468 google::LogMessage::Flush() @ 0x2ba40eb3037c google::LogMessageFatal::~LogMessageFatal() @ 0x864b0c _CheckFatal::~_CheckFatal() @ 0xc5ddb1 mesos::internal::tests::UserCgroupIsolatorTest_ROOT_CGROUPS_UserCgroup_Test<>::TestBody() @ 0x119fc43 testing::internal::HandleSehExceptionsInMethodIfSupported<>() @ 0x119adca testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x118318b testing::Test::Run() @ 0x11838ae testing::TestInfo::Run() @ 0x1183e36 testing::TestCase::Run() @ 0x1188b00 testing::internal::UnitTestImpl::RunAllTests() @ 0x11a0ada testing::internal::HandleSehExceptionsInMethodIfSupported<>() @ 0x119baef testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x1187a0c testing::UnitTest::Run() @ 0xcdc9f3 main @ 0x2ba41101aec5 (unknown) @ 0x861a89 (unknown) make[3]: *** [check-local] Aborted (core dumped) ",1 MESOS-2957,"Add version to MasterInfo","This will help schedulers figure out the version of the master that they are interacting with. See MESOS-2736 for additional context.",1 MESOS-2958,"Update Call protobuf to move top level FrameworkInfo inside Subscribe","It is better for FrameworkInfo to be only included in 'Subscribe' message (that needs to be added) instead of for every call. 
Instead the top level Call should contain a FrameworkID to identify the framework making the call.",3 MESOS-2961,"Add cpuacct subsystem utils to cgroups","Current cgroups implementation does not have a cpuacct subsystem implementation. This subsystem reports important metrics like user and system CPU ticks spent by a process. ""cgroups"" namespace has subsystem specific utilities for ""cpu"", ""memory"" etc. It could use other subsystems specific utils (eg. cpuacct). In the future, we could also view cgroups as a mesos-subsystem with features like event notifications. Although refactoring cgroups would be a different epic, listing the possible tasks: - Have hierarchies, subsystems abstracted to represent the domain - Create ""cgroups service"" - ""cgroups service"" listen to update events from the OS on files like stats. This would be an interrupt based system(maybe use linux fsnotify) - ""cgroups service"" services events to mesos (containers for example). ",2 MESOS-2962,"Slave fails with Abort stacktrace when DNS cannot resolve hostname","If the DNS cannot resolve the hostname-to-IP for a slave node, we correctly return an {{Error}} object, but we then fail with a segfault. This code adds a more user-friendly message and exits normally (with an {{EXIT_FAILURE}} code). For example, forcing {{net::getIp()}} to always return an {{Error}}, now causes the slave to exit like this: {noformat} $ ./bin/mesos-slave.sh --master=10.10.1.121:5405 WARNING: Logging before InitGoogleLogging() is written to STDERR E0630 11:31:45.777465 1944417024 process.cpp:899] Could not obtain the IP address for stratos.local; the DNS service may not be able to resolve it: >>> Marco was here!!! $ echo $? 1 {noformat}",1 MESOS-2963,"Configure Jenkins to build ssl",NULL,5 MESOS-2964,"libprocess io does not support peek()","Finally, I so wish we could just do: {code} io::peek(request->socket, 6) .then([request](const string& data) { // Comment about the rules ... 
if (data.length() < 2) { // Rule 1 } else if (...) { // Rule 2. } else if (...) { // Rule 3. } if (ssl) { accept_SSL_callback(request); } else { ...; } }); {code} from: https://reviews.apache.org/r/31207/",3 MESOS-2965,"Add implicit cast to string operator to Path.","For example: {code}inline Try rm(const std::string& path){code} does not have an overload for {code}inline Try rm(const Path& path){code} The implementation should be something like: {code} inline Try rm(const Path& path) { return rm(path.value); } {code}",2 MESOS-2966,"socket::peer() and socket::address() might fail with SSL enabled","libevent SSL currently uses a secondary FD so we need to virtualize the get() function on socket interface. ",5 MESOS-2967,"Missing doxygen documentation for libprocess socket interface ","Convert existing comments to doxygen format. ",5 MESOS-2968,"Implement shared copy based provisioner backend","Currently Appc and Docker each implement their own copy backend, but most of the logic is the same: the input is just an image name with its dependencies. We can refactor both so that we have just one implementation that is shared between both provisioners, so Appc and Docker can reuse the shared copy backend.",3 MESOS-2971,"Implement OverlayFS based provisioner backend","Part of the image provisioning process is to call a backend to create a root filesystem based on the image's on-disk layout. The problem with the copy backend is that it wastes both IO and space, and the bind backend can only deal with one layer. The OverlayFS backend allows us to use the filesystem to merge multiple filesystems into one efficiently.",5 MESOS-2972,"Serialize Docker image spec as protobuf","The Docker image specification defines a schema for the metadata json that it puts into each image. 
Currently the docker image provisioner needs to be able to parse and understand this metadata json, and we should create a protobuf equivalent schema so we can utilize the json to protobuf conversion to read and validate the metadata.",3 MESOS-2973,"SSL tests don't work with --gtest_repeat","commit bfa89f22e9d6a3f365113b32ee1cac5208a0456f Author: Joris Van Remoortere Date: Wed Jul 1 16:16:52 2015 -0700 MESOS-2973: Allow SSL tests to run using gtest_repeat. The SSL ctx object carried some settings between reinitialize() calls. Re-construct the object to avoid this state transition. Review: https://reviews.apache.org/r/36074",3 MESOS-2974,"stout flags can't have their defaults reset","Stout flags don't remember their default values, and so can't have their defaults reset. This makes it hard to reset flags to their defaults between tests.",5 MESOS-2975,"SSL tests don't work with --gtest_shuffle",NULL,3 MESOS-2980,"Allow runtime configuration to be returned from provisioner","Image specs also include execution configuration (e.g: Env, user, ports, etc). We should support passing that information from the image provisioner back to the containerizer.",5 MESOS-2983,"Deprecating '.json' extension in slave endpoints url","Remove the '.json' extension on endpoints such as `/slave/state.json` so it becomes `/slave/state`",1 MESOS-2984,"Deprecating '.json' extension in files endpoints url","Remove the '.json' extension on endpoints such as `/files/browse.json` so it becomes `/files/browse`",1 MESOS-2986,"Docker version output is not compatible with Mesos","We currently use docker version to get the Docker version; in the Docker master branch, and soon in Docker 1.8 [1], the output of this command changes. The solution for now will be to use the unchanged docker --version output; in the long term we should consider dropping the CLI and using the API instead. 
[1] https://github.com/docker/docker/pull/14047",1 MESOS-2991,"Compilation Error on Mac OS 10.10.4 with clang 3.5.0","Compiling 0.23.0 (rc1) produces compilation errors on Mac OS 10.10.4 with {{g++}} based on LLVM 3.5. It looks like the issue was introduced in {{a5640ad813e6256b548fca068f04fd9fa3a03eda}}, https://reviews.apache.org/r/32838. In contrast to the commit message, compiling the rc with gcc4.4 on CentOS worked fine for me. According to 0.23 release notes and MESOS-2604, we should support clang 3.5. {code} ../../../../../3rdparty/libprocess/3rdparty/stout/tests/os_tests.cpp:543:25: error: conversion from 'void ()' to 'const Option' is ambiguous Fork(dosetsid, // Great-great-granchild. ^~~~~~~~ ../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:40:3: note: candidate constructor Option(const T& _t) : state(SOME), t(_t) {} ^ ../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:42:3: note: candidate constructor Option(T&& _t) : state(SOME), t(std::move(_t)) {} ^ ../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:45:3: note: candidate constructor [with U = void ()] Option(const U& u) : state(SOME), t(u) {} ^ {code} Compiler version: {code} $ g++ --version Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1 Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn) Target: x86_64-apple-darwin14.4.0 Thread model: posix {code} ",1 MESOS-2993,"Document per container unique egress flow and network queueing statistics","Document new network isolation capabilities in 0.23",3 MESOS-2994,"Design doc for creating user namespaces inside containers",NULL,5 MESOS-2995,"Standardize use of Path ","As per the discussion in MESOS-2965, the use of the Path object should be standardized: * Functions which effectively use Paths (as strings) should instead take Paths. 
* Functions which modify and return Paths (as strings) should instead return Paths. * Extraneous uses of Path.value should be removed.",3 MESOS-2997,"SSL connection failure causes failed CHECK.","{code} [ RUN ] SSLTest.BasicSameProcess F0706 18:32:28.465451 238583808 libevent_ssl_socket.cpp:507] Check failed: 'self->bev' Must be non NULL {code}",3 MESOS-3001,"Create a ""demo"" HTTP API client","We want to create a simple ""demo"" HTTP API Client (in Java, Python or Go) that can serve as an ""example framework"" for people who will want to use the new API for their Frameworks. The scope should be fairly limited (eg, launching a simple Container task?) but sufficient to exercise most of the new API endpoint messages/capabilities. Scope: TBD Non-Goals: - create a ""best-of-breed"" Framework to deliver any specific functionality; - create an Integration Test for the HTTP API.",8 MESOS-3002,"Rename Option::get(const T& _t) to getOrElse() broke network isolator","Change to Option from get() to getOrElse() breaks network isolator. 
Building with '../configure --with-network-isolator' generates the following error: ../../src/slave/containerizer/isolators/network/port_mapping.cpp: In static member function 'static Try mesos::internal::slave::PortMappingIsolatorProcess::create(const mesos::internal::slave::Flags&)': ../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: error: no matching function for call to 'Option >::get(const char [1]) const' flags.resources.get(""""), ^ ../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: note: candidates are: In file included from ../../3rdparty/libprocess/3rdparty/stout/include/stout/check.hpp:26:0, from ../../3rdparty/libprocess/include/process/check.hpp:19, from ../../3rdparty/libprocess/include/process/collect.hpp:7, from ../../src/slave/containerizer/isolators/network/port_mapping.cpp:30: ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: const T& Option::get() const [with T = std::basic_string] const T& get() const { assert(isSome()); return t; } ^ ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: candidate expects 0 arguments, 1 provided ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: T& Option::get() [with T = std::basic_string] T& get() { assert(isSome()); return t; } ^ ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: candidate expects 0 arguments, 1 provided make[2]: *** [slave/containerizer/isolators/network/libmesos_no_3rdparty_la-port_mapping.lo] Error 1 make[2]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src' make[1]: *** [check] Error 2 make[1]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src' make: *** [check-recursive] Error 1 ",1 MESOS-3004,"Design support running the command executor with provisioned image for running a task in a container","Mesos Containerizer uses the command executor to actually launch the user defined command, and the command 
executor then can communicate with the slave about the process lifecycle. When we provision a new container with the user specified image, we also need to be able to run the command executor in the container to support the same semantics. One approach is to dynamically mount in a static binary of the command executor with all its dependencies in a special directory so it doesn't interfere with the provisioned root filesystem and configure the mesos containerizer to run the command executor in that directory.",5 MESOS-3005,"SSL tests can fail depending on hostname configuration","Depending on how /etc/hosts is configured, the SSL tests can fail with a bad hostname match for the certificate. We can avoid this by explicitly matching the hostname for the certificate to the IP that will be used during the test.",3 MESOS-3006,"Add cgroups memory stats API","The cgroups API currently does not expose ""stats"" from the memory namespace. Having this API would enable isolators to use its various fields (e.g. rss, rss_huge, writeback etc.) in use cases like usage metrics.",2 MESOS-3008,"Libevent SSL doesn't use EPOLL","We currently disable epoll in libevent to allow SSL to work. It would be more scalable if we didn't have to do that.",8 MESOS-3009,"Reproduce systemd cgroup behavior ","It has been noticed before that systemd reorganizes the cgroup hierarchy created by the mesos slave. Because of this, mesos is no longer able to find the cgroup, and there is also a chance of undoing the isolation that the mesos slave puts in place. 
",5 MESOS-3012,"Support existing message passing optimization with Event/Call.","See the thread here: http://markmail.org/thread/wvapc7vkbv7z6gbx The scheduler driver currently sends framework messages directly to the slave, when possible: {noformat} (through master) Scheduler —————> Master —————> Slave ————> Executor Driver ————————————————————> Driver (skip master) {noformat} The slave always sends messages directly to the scheduler driver: {noformat} Scheduler Master Slave <———— Executor Driver <———————————————————— Driver (skip master) {noformat} In order for the scheduler driver to receive Events from the master, it needs enough information to continue directly sending messages to slaves. This was previously accomplished by sending the slave's pid inside the [offer message|https://github.com/apache/mesos/blob/0.23.0-rc1/src/messages/messages.proto#L168]: {code} message ResourceOffersMessage { repeated Offer offers = 1; repeated string pids = 2; } {code} We could add an 'Address' to the Offer protobuf to provide the scheduler driver with the same information: {code} message Address { required string ip; required string hostname; required uint32_t port; // All HTTP requests to this address must begin with this prefix. required string path_prefix; } message Offer { required OfferID id = 1; required FrameworkID framework_id = 2; required SlaveID slave_id = 3; required string hostname = 4; // Deprecated in favor of 'address'. optional Address address = 8; // Obviates 'hostname'. ... } {code} The path prefix is required for testing purposes, where we can have multiple slaves within a process (e.g. {{localhost:5051/slave(1)/state.json}} vs. {{localhost:5051/slave(2)/state.json}}). 
This provides enough information to allow the scheduler driver to continue to directly send messages to the slaves, which unblocks MESOS-2910.",1 MESOS-3013,"Extend ContainerInfo to include ""NetworkInfo"" message","As per the [design doc|https://docs.google.com/document/d/17mXtAmdAXcNBwp_JfrxmZcQrs7EO6ancSbejrqjLQ0g], we need to enable frameworks to specify network requirements. The proposed message could be along the lines of: {code} /** * Collection of network request. * TODO(kapil): Add a high-level explanation/motivation. */ message NetworkInfo { // Specify IPAddress requirement. enum Protocol { IPv4 = 0, IPv6 = 1 } // TODO: Document how to use this field to request an // 1) IPv4 address // 2) IPv6 address // 3) Any of the above optional Protocol protocol = 1; // Statically assigned IPs provided by the Framework. optional string ip_address = 2; // A group is the name given to a set of logically-related IPs that are // allowed to communicate within themselves. For example, one might want // to create separate groups for dev, testing, qa and prod deployment // environments. repeated string groups = 3; // To tag certain metadata to be used by Isolator/IPAM. E.g., rack, pop, etc. optional Labels labels = 4; }; message ContainerInfo { … repeated NetworkInfo network_infos; … }; message ContainerStatus { repeated NetworkInfo network_infos; } message TaskStatus { … // TODO: Comment on the fact that this is resolved during container setup. optional ContainerStatus container; … }; {code}",2 MESOS-3015,"Add hooks for Slave exits","The hook will be triggered on slave exits. A master hook module can use this to do Slave-specific cleanups. 
In our particular use case, the hook would trigger cleanup of IPs assigned to the given Slave (see the [design doc | https://docs.google.com/document/d/17mXtAmdAXcNBwp_JfrxmZcQrs7EO6ancSbejrqjLQ0g/edit#]).",2 MESOS-3016,"Add task status update hooks for Master/Slave","The task termination hooks are needed for doing task-specific cleanup in Master/Slave.",3 MESOS-3017,"Make container-IP available via Master endpoint",NULL,5 MESOS-3018,"A mechanism for messages between Master modules and Slave modules","A slave module should be able to send a message to a master module and vice-versa to allow out-of-band communication between master/slave modules.",8 MESOS-3020,"Expose major, minor and patch components from stout Version ","Stout version class does not expose version components, preventing computation on or manipulation of version information. Solution is to make major, minor and patch public.",1 MESOS-3021,"Implement Docker Image Provisioner Reference Store","Create a comprehensive store to look up an image and tag's associated image layer ID. Implement add, remove, save, and update images and their associated tags.",3 MESOS-3023,"Factoring out the pattern for URL generation ","fetcher_test.cpp uses the following code for generating URLs: string url = ""http://"" + net::getHostname(process.self().address.ip).get() + "":"" + stringify(process.self().address.port) + ""/"" + process.self().id it would be good to isolate that code in a function, and replace the code above with something like: string url = ""http://"" + endpoint_url(process, ""uri_test""); ",1 MESOS-3024,"HTTP endpoint authN is enabled merely by specifying --credentials","If I set `--credentials` on the master, framework and slave authentication are allowed, but not required. On the other hand, http authentication is now required for authenticated endpoints (currently only `/shutdown`). That means that I cannot enable framework or slave authentication without also enabling http endpoint authentication. 
This is undesirable. Framework and slave authentication have separate flags (`\--authenticate` and `\--authenticate_slaves`) to require authentication for each. It would be great if there was also such a flag for http authentication. Or maybe we get rid of these flags altogether and rely on ACLs to determine which unauthenticated principals are even allowed to authenticate for each endpoint/action.",8 MESOS-3025,"0.22.x scheduler driver drops 0.23.x reconciliation status updates due to missing StatusUpdate.uuid.","In the process of fixing MESOS-2940, we accidentally introduced a non-backwards compatible change: --> StatusUpdate.uuid was required in 0.22.x and was always set. --> StatusUpdate.uuid is optional in 0.23.x and the master is not setting it for master-generated updates. In 0.22.x, the scheduler driver ignores the 'uuid' for master/driver generated updates already. I'd suggest the following fix: # In 0.23.x, rather than not setting StatusUpdate.uuid, set it to an empty string. # In 0.23.x, ensure the scheduler driver also ignores empty StatusUpdate.uuids. # In 0.24.x, stop setting StatusUpdate.uuid.",3 MESOS-3026,"ProcessTest.Cache fails and hangs","{code} [ RUN ] ProcessTest.Cache ../../../3rdparty/libprocess/src/tests/process_tests.cpp:1726: Failure Value of: response.get().status Actual: ""200 OK"" Expected: ""304 Not Modified"" [ FAILED ] ProcessTest.Cache (1 ms) {code} The tests then finish running, but the gtest framework fails to terminate and uses 100% CPU.",5 MESOS-3032,"Document containerizer launch ","We currently don't have enough documentation for the containerizer component. This task adds documentation for the containerizer launch sequence. The main goals are: - Have diagrams (state, sequence, class etc) depicting the containerizer launch process. - Make the documentation newbie friendly. 
- Usable for future design discussions.",3 MESOS-3035,"As a Developer I would like a standard way to run a Subprocess in libprocess","As part of MESOS-2830 and MESOS-2902 I have been researching the ability to run a {{Subprocess}} and capture the {{stdout / stderr}} along with the exit status code. {{process::subprocess()}} offers much of the functionality, but in a way that still requires a lot of handiwork on the developer's part; we would like to further abstract away the ability to just pass a string, an optional set of command-line arguments and then collect the output of the command (bonus: without blocking).",3 MESOS-3037,"Add a SUPPRESS call to the scheduler","SUPPRESS call is the complement to the current REVIVE call i.e., it will inform Mesos to stop sending offers to the framework. For the scheduler driver to send only Call messages (MESOS-2913), DeactivateFrameworkMessage needs to be converted to Call(s). We can implement this by having the driver send a SUPPRESS call followed by a DECLINE call for outstanding offers.",3 MESOS-3038,"Resource offers do not contain Unavailability, given a maintenance schedule","Given a schedule, defined elsewhere, any resource offers to affected slaves must include an Unavailability field. The maintenance schedule for a single slave should be held in [persistent storage|MESOS-2075] and locally by the master. i.e. In src/master/master.hpp: {code} struct Slave { ... // Existing fields. // New field that the master/allocator can access Maintenances pendingDowntime; } {code} The new field should be populated via an API call (see [MESOS-2067]). The Unavailability field can be added to Master::offer (src/master/master.cpp). {code} offer->mutable_unavailability()->MergeFrom(slave->pendingDowntime); {code} Possible test(s): * PendingUnavailibilityTest ** Start master, slave. ** Check unavailability of offer == none. ** Set unavailability to the future. ** Check offer has unavailability. 
",8 MESOS-3039,"Allow executors binding IP to be different than Slave binding IP","Currently, the Slave will bind either to the loopback IP (127.0.0.1) or to the IP passed via the '--ip' flag. When it launches a containerized executor (e.g, via Mesos Containerizer), the executor inherits the binding IP of the Slave. This is due to the fact that the '--ip' flags sets the environment variable `LIBPROCESS_IP` to the passed IP. The executor then inherits this environment variable and is forced to bind to the Slave IP. If an executor is running in its own containerized environment, with a separate IP than that of the Slave, currently there is no way of forcing it to bind to its own IP. A potential solution is to use the executor environment decorator hooks to update LIBPROCESS_IP environment variable for the executor.",2 MESOS-3041,"Decline call does not include an optional ""reason"", in the Event/Call API","In the Event/Call API, the Decline call is currently used by frameworks to reject resource offers. In the case of InverseOffers, the framework could give additional information to the operators and/or allocator, as to why the InverseOffer is declined. i.e. Suppose a cluster running some consensus algorithm is given an InverseOffer on one of its nodes. It may decline saying ""Too few nodes"" (or, more verbosely, ""Specified InverseOffer would lower the number of active nodes below quorum""). This change requires the following changes: * include/mesos/scheduler/scheduler.proto: {code} message Call { ... message Decline { repeated OfferID offer_ids = 1; optional Filters filters = 2; // Add this extra string for each OfferID // i.e. reasons[i] is for offer_ids[i] repeated string reasons = 3; } ... } {code} * src/master/master.cpp Change Master::decline to either store the reason, or log it. * Add a declineOffer overload in the (Mesos)SchedulerDriver with an optional ""reason"". 
** Extend the interface in include/mesos/scheduler.hpp ** Add/change the declineOffer method in src/sched/sched.cpp",3 MESOS-3042,"Master/Allocator does not send InverseOffers to resources to be maintained","Offers are currently sent from master/allocator to framework via ResourceOffersMessages. InverseOffers, which are roughly equivalent to negative Offers, can be sent in the same package. In src/messages/messages.proto {code} message ResourceOffersMessage { repeated Offer offers = 1; repeated string pids = 2; // New field with InverseOffers repeated InverseOffer inverseOffers = 3; } {code} Sent InverseOffers can be tracked in the master's local state: i.e. In src/master/master.hpp: {code} struct Slave { ... // Existing fields. // Active InverseOffers on this slave. // Similar pattern to the ""offers"" field hashset inverseOffers; } {code} One actor (master or allocator) should populate the new InverseOffers field. * In master (src/master/master.cpp) ** Master::offer is where the ResourceOffersMessage and Offer objects are constructed. ** The same method could also check for maintenance and send InverseOffers. * In the allocator (src/master/allocator/mesos/hierarchical.hpp) ** HierarchicalAllocatorProcess::allocate is where slave resources are aggregated and sent off to the frameworks. ** InverseOffers (i.e. negative resources) allocation could be calculated in this method. ** A change to Master::offer (i.e. the ""offerCallback"") may be necessary to account for the negative resources. Possible test(s): * InverseOfferTest ** Start master, slave, framework. ** Accept resource offer, start task. ** Set maintenance schedule to the future. ** Check that InverseOffer(s) are sent to the framework. ** Decline InverseOffer. ** Check that more InverseOffer(s) are sent. ** Accept InverseOffer. 
** Check that more InverseOffer(s) are sent.",8 MESOS-3043,"Master does not handle InverseOffers in the Accept call (Event/Call API)","InverseOffers are similar to Offers in that they are Accepted or Declined based on their OfferID. Some additional logic may be necessary in Master::accept (src/master/master.cpp) to gracefully handle the acceptance of InverseOffers. * The InverseOffer needs to be removed from the set of pending InverseOffers. * The InverseOffer should not result in any errors/warnings. Note: accepted InverseOffers do not preclude further InverseOffers from being sent to the framework. Instead, an accepted InverseOffer merely signifies that the framework is _currently_ fine with the expected downtime.",3 MESOS-3044,"Slaves are not deactivated upon reaching a maintenance window","After a maintenance window is reached, the slave should be deactivated to prevent further tasks from utilizing it. * For slaves that have completely drained, simply deactivate the slave. See Master::deactivate(Slave*). * For tasks which have not explicitly declined the InverseOffers (i.e. they've accepted them or do not understand InverseOffers), send kill signals. See Master::killTask. * If a slave has tasks that have declined the InverseOffers, do not deactivate the slave. Possible test(s): * SlaveDrainedTest ** Start master, slave. ** Set maintenance to now. ** Check that slave gets deactivated. * InverseOfferAgnosticTest ** Start master, slave, framework. ** Have a task run on the slave (ignores InverseOffers). ** Set maintenance to now. ** Check that task gets killed. ** Check that slave gets deactivated. * InverseOfferAcceptanceTest ** Start master, slave, framework. ** Run a task on the slave. ** Set maintenance to future. ** Have task accept InverseOffer. ** Check task gets killed, slave gets deactivated. * InverseOfferDeclinedTest ** Start master, slave, framework. ** Run task on slave. ** Set maintenance to future. ** Have task decline maintenance with reason. 
** Check task lives, slave still active.",8 MESOS-3045,"Maintenance information is not populated in case of failover","When a master starts up, or after a master has failed, it must re-populate maintenance information (i.e. from the registry to the local state). In particular, {{Master::recover}} in {{src/master/master.cpp}} should be changed to process maintenance information.",3 MESOS-3046,"Stout's UUID re-seeds a new random generator during each call to UUID::random.","Per [~StephanErb] and [~kevints]'s observations on MESOS-2940, stout's UUID abstraction is re-seeding the random generator during each call to {{UUID::random()}}, which is really expensive. This is confirmed in the perf graph from MESOS-2940.",3 MESOS-3050,"Failing ROOT_ tests on CentOS 7.1","Running `sudo make check` on CentOS 7.1 for Mesos 0.23.0-rc3 causes several failures/errors: {code} [ RUN ] DockerTest.ROOT_DOCKER_CheckPortResource ../../src/tests/docker_tests.cpp:303: Failure (run).failure(): Container exited on error: exited with status 1 [ FAILED ] DockerTest.ROOT_DOCKER_CheckPortResource (709 ms) {code} ... 
{code} [ RUN ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample ../../src/tests/isolator_tests.cpp:837: Failure isolator: Failed to create PerfEvent isolator, invalid events: { cycles, task-clock } [ FAILED ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample (9 ms) [----------] 1 test from PerfEventIsolatorTest (9 ms total) [----------] 2 tests from SharedFilesystemIsolatorTest [ RUN ] SharedFilesystemIsolatorTest.ROOT_RelativeVolume + mount -n --bind /tmp/SharedFilesystemIsolatorTest_ROOT_RelativeVolume_4yTEAC/var/tmp /var/tmp + touch /var/tmp/492407e1-5dec-4b34-8f2f-130430f41aac ../../src/tests/isolator_tests.cpp:1001: Failure Value of: os::exists(file) Actual: true Expected: false [ FAILED ] SharedFilesystemIsolatorTest.ROOT_RelativeVolume (92 ms) [ RUN ] SharedFilesystemIsolatorTest.ROOT_AbsoluteVolume + mount -n --bind /tmp/SharedFilesystemIsolatorTest_ROOT_AbsoluteVolume_OwYrXK /var/tmp + touch /var/tmp/7de712aa-52eb-4976-b0f9-32b6a006418d ../../src/tests/isolator_tests.cpp:1086: Failure Value of: os::exists(path::join(containerPath, filename)) Actual: true Expected: false [ FAILED ] SharedFilesystemIsolatorTest.ROOT_AbsoluteVolume (100 ms) {code} ... 
{code} [----------] 1 test from UserCgroupIsolatorTest/0, where TypeParam = mesos::internal::slave::CgroupsMemIsolatorProcess userdel: user 'mesos.test.unprivileged.user' does not exist [ RUN ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup -bash: /sys/fs/cgroup/blkio/user.slice/cgroup.procs: Permission denied mkdir: cannot create directory ‘/sys/fs/cgroup/blkio/user.slice/user’: Permission denied ../../src/tests/isolator_tests.cpp:1274: Failure Value of: os::system( ""su - "" + UNPRIVILEGED_USERNAME + "" -c 'mkdir "" + path::join(flags.cgroups_hierarchy, userCgroup) + ""'"") Actual: 256 Expected: 0 -bash: /sys/fs/cgroup/blkio/user.slice/user/cgroup.procs: No such file or directory ../../src/tests/isolator_tests.cpp:1283: Failure Value of: os::system( ""su - "" + UNPRIVILEGED_USERNAME + "" -c 'echo $$ >"" + path::join(flags.cgroups_hierarchy, userCgroup, ""cgroup.procs"") + ""'"") Actual: 256 Expected: 0 -bash: /sys/fs/cgroup/memory/mesos/bbf8c8f0-3d67-40df-a269-b3dc6a9597aa/cgroup.procs: Permission denied -bash: /sys/fs/cgroup/cpuacct,cpu/user.slice/cgroup.procs: No such file or directory mkdir: cannot create directory ‘/sys/fs/cgroup/cpuacct,cpu/user.slice/user’: No such file or directory ../../src/tests/isolator_tests.cpp:1274: Failure Value of: os::system( ""su - "" + UNPRIVILEGED_USERNAME + "" -c 'mkdir "" + path::join(flags.cgroups_hierarchy, userCgroup) + ""'"") Actual: 256 Expected: 0 -bash: /sys/fs/cgroup/cpuacct,cpu/user.slice/user/cgroup.procs: No such file or directory ../../src/tests/isolator_tests.cpp:1283: Failure Value of: os::system( ""su - "" + UNPRIVILEGED_USERNAME + "" -c 'echo $$ >"" + path::join(flags.cgroups_hierarchy, userCgroup, ""cgroup.procs"") + ""'"") Actual: 256 Expected: 0 -bash: /sys/fs/cgroup/name=systemd/user.slice/user-2004.slice/session-3865.scope/cgroup.procs: No such file or directory mkdir: cannot create directory ‘/sys/fs/cgroup/name=systemd/user.slice/user-2004.slice/session-3865.scope/user’: No such file or directory 
../../src/tests/isolator_tests.cpp:1274: Failure Value of: os::system( ""su - "" + UNPRIVILEGED_USERNAME + "" -c 'mkdir "" + path::join(flags.cgroups_hierarchy, userCgroup) + ""'"") Actual: 256 Expected: 0 -bash: /sys/fs/cgroup/name=systemd/user.slice/user-2004.slice/session-3865.scope/user/cgroup.procs: No such file or directory ../../src/tests/isolator_tests.cpp:1283: Failure Value of: os::system( ""su - "" + UNPRIVILEGED_USERNAME + "" -c 'echo $$ >"" + path::join(flags.cgroups_hierarchy, userCgroup, ""cgroup.procs"") + ""'"") Actual: 256 Expected: 0 [ FAILED ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsMemIsolatorProcess (1034 ms) [----------] 1 test from UserCgroupIsolatorTest/0 (1034 ms total) [----------] 1 test from UserCgroupIsolatorTest/1, where TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess userdel: user 'mesos.test.unprivileged.user' does not exist [ RUN ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup -bash: /sys/fs/cgroup/blkio/user.slice/cgroup.procs: Permission denied mkdir: cannot create directory ‘/sys/fs/cgroup/blkio/user.slice/user’: Permission denied ../../src/tests/isolator_tests.cpp:1274: Failure Value of: os::system( ""su - "" + UNPRIVILEGED_USERNAME + "" -c 'mkdir "" + path::join(flags.cgroups_hierarchy, userCgroup) + ""'"") Actual: 256 Expected: 0 -bash: /sys/fs/cgroup/blkio/user.slice/user/cgroup.procs: No such file or directory ../../src/tests/isolator_tests.cpp:1283: Failure Value of: os::system( ""su - "" + UNPRIVILEGED_USERNAME + "" -c 'echo $$ >"" + path::join(flags.cgroups_hierarchy, userCgroup, ""cgroup.procs"") + ""'"") Actual: 256 Expected: 0 -bash: /sys/fs/cgroup/cpuacct,cpu/mesos/eeeb99f0-7c5c-4185-869d-635d51dcc6e1/cgroup.procs: No such file or directory mkdir: cannot create directory ‘/sys/fs/cgroup/cpuacct,cpu/mesos/eeeb99f0-7c5c-4185-869d-635d51dcc6e1/user’: No such file or directory ../../src/tests/isolator_tests.cpp:1274: Failure Value of: 
os::system( ""su - "" + UNPRIVILEGED_USERNAME + "" -c 'mkdir "" + path::join(flags.cgroups_hierarchy, userCgroup) + ""'"") Actual: 256 Expected: 0 -bash: /sys/fs/cgroup/cpuacct,cpu/mesos/eeeb99f0-7c5c-4185-869d-635d51dcc6e1/user/cgroup.procs: No such file or directory ../../src/tests/isolator_tests.cpp:1283: Failure Value of: os::system( ""su - "" + UNPRIVILEGED_USERNAME + "" -c 'echo $$ >"" + path::join(flags.cgroups_hierarchy, userCgroup, ""cgroup.procs"") + ""'"") Actual: 256 Expected: 0 -bash: /sys/fs/cgroup/name=systemd/user.slice/user-2004.slice/session-3865.scope/cgroup.procs: No such file or directory mkdir: cannot create directory ‘/sys/fs/cgroup/name=systemd/user.slice/user-2004.slice/session-3865.scope/user’: No such file or directory ../../src/tests/isolator_tests.cpp:1274: Failure Value of: os::system( ""su - "" + UNPRIVILEGED_USERNAME + "" -c 'mkdir "" + path::join(flags.cgroups_hierarchy, userCgroup) + ""'"") Actual: 256 Expected: 0 -bash: /sys/fs/cgroup/name=systemd/user.slice/user-2004.slice/session-3865.scope/user/cgroup.procs: No such file or directory ../../src/tests/isolator_tests.cpp:1283: Failure Value of: os::system( ""su - "" + UNPRIVILEGED_USERNAME + "" -c 'echo $$ >"" + path::join(flags.cgroups_hierarchy, userCgroup, ""cgroup.procs"") + ""'"") Actual: 256 Expected: 0 [ FAILED ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess (763 ms) [----------] 1 test from UserCgroupIsolatorTest/1 (763 ms total) [----------] 1 test from UserCgroupIsolatorTest/2, where TypeParam = mesos::internal::slave::CgroupsPerfEventIsolatorProcess userdel: user 'mesos.test.unprivileged.user' does not exist [ RUN ] UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup ../../src/tests/isolator_tests.cpp:1200: Failure isolator: Failed to create PerfEvent isolator, invalid events: { cpu-cycles } [ FAILED ] UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup, where TypeParam = 
mesos::internal::slave::CgroupsPerfEventIsolatorProcess (6 ms) [----------] 1 test from UserCgroupIsolatorTest/2 (6 ms total) {code}",5 MESOS-3051,"performance issues with port ranges comparison","Testing in an environment with lots of frameworks (>200), where the frameworks permanently decline resources they don't need. The allocator ends up spending a lot of time figuring out whether offers are refused (the code path through {{HierarchicalAllocatorProcess::isFiltered()}}. In profiling a synthetic benchmark, it turns out that comparing port ranges is very expensive, involving many temporary allocations. 61% of Resources::contains() run time is in operator -= (Resource). 35% of Resources::contains() run time is in Resources::_contains(). The heaviest call chain through {{Resources::_contains}} is: {code} Running Time Self (ms) Symbol Name 7237.0ms 35.5% 4.0 mesos::Resources::_contains(mesos::Resource const&) const 7200.0ms 35.3% 1.0 mesos::contains(mesos::Resource const&, mesos::Resource const&) 7133.0ms 35.0% 121.0 mesos::operator<=(mesos::Value_Ranges const&, mesos::Value_Ranges const&) 6319.0ms 31.0% 7.0 mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Ranges const&) 6240.0ms 30.6% 161.0 mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Range const&) 1867.0ms 9.1% 25.0 mesos::Value_Ranges::add_range() 1694.0ms 8.3% 4.0 mesos::Value_Ranges::~Value_Ranges() 1495.0ms 7.3% 16.0 mesos::Value_Ranges::operator=(mesos::Value_Ranges const&) 445.0ms 2.1% 94.0 mesos::Value_Range::MergeFrom(mesos::Value_Range const&) 154.0ms 0.7% 24.0 mesos::Value_Ranges::range(int) const 103.0ms 0.5% 24.0 mesos::Value_Ranges::range_size() const 95.0ms 0.4% 2.0 mesos::Value_Range::Value_Range(mesos::Value_Range const&) 59.0ms 0.2% 4.0 mesos::Value_Ranges::Value_Ranges() 50.0ms 0.2% 50.0 mesos::Value_Range::begin() const 28.0ms 0.1% 28.0 mesos::Value_Range::end() const 26.0ms 0.1% 0.0 mesos::Value_Range::~Value_Range() {code} mesos::coalesce(Value_Ranges) gets done a lot and ends up being 
really expensive. The heaviest parts of the inverted call chain are: {code} Running Time Self (ms) Symbol Name 3209.0ms 15.7% 3209.0 mesos::Value_Range::~Value_Range() 3209.0ms 15.7% 0.0 google::protobuf::internal::GenericTypeHandler::Delete(mesos::Value_Range*) 3209.0ms 15.7% 0.0 void google::protobuf::internal::RepeatedPtrFieldBase::Destroy::TypeHandler>() 3209.0ms 15.7% 0.0 google::protobuf::RepeatedPtrField::~RepeatedPtrField() 3209.0ms 15.7% 0.0 google::protobuf::RepeatedPtrField::~RepeatedPtrField() 3209.0ms 15.7% 0.0 mesos::Value_Ranges::~Value_Ranges() 3209.0ms 15.7% 0.0 mesos::Value_Ranges::~Value_Ranges() 2441.0ms 11.9% 0.0 mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Range const&) 452.0ms 2.2% 0.0 mesos::remove(mesos::Value_Ranges*, mesos::Value_Range const&) 169.0ms 0.8% 0.0 mesos::operator<=(mesos::Value_Ranges const&, mesos::Value_Ranges const&) 82.0ms 0.4% 0.0 mesos::operator-=(mesos::Value_Ranges&, mesos::Value_Ranges const&) 65.0ms 0.3% 0.0 mesos::Value_Ranges::~Value_Ranges() 2541.0ms 12.4% 2541.0 google::protobuf::internal::GenericTypeHandler::New() 2541.0ms 12.4% 0.0 google::protobuf::RepeatedPtrField::TypeHandler::Type* google::protobuf::internal::RepeatedPtrFieldBase::Add::TypeHandler>() 2305.0ms 11.3% 0.0 google::protobuf::RepeatedPtrField::Add() 2305.0ms 11.3% 0.0 mesos::Value_Ranges::add_range() 1962.0ms 9.6% 0.0 mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Range const&) 343.0ms 1.6% 0.0 mesos::ranges::add(mesos::Value_Ranges*, long long, long long) 236.0ms 1.1% 0.0 void google::protobuf::internal::RepeatedPtrFieldBase::MergeFrom::TypeHandler>(google::protobuf::internal::RepeatedPtrFieldBase const&) 1471.0ms 7.2% 1471.0 google::protobuf::internal::RepeatedPtrFieldBase::Reserve(int) 1333.0ms 6.5% 0.0 google::protobuf::RepeatedPtrField::TypeHandler::Type* google::protobuf::internal::RepeatedPtrFieldBase::Add::TypeHandler>() 1333.0ms 6.5% 0.0 google::protobuf::RepeatedPtrField::Add() 1333.0ms 6.5% 0.0 
mesos::Value_Ranges::add_range() 1086.0ms 5.3% 0.0 mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Range const&) 247.0ms 1.2% 0.0 mesos::ranges::add(mesos::Value_Ranges*, long long, long long) 107.0ms 0.5% 0.0 void google::protobuf::internal::RepeatedPtrFieldBase::MergeFrom::TypeHandler>(google::protobuf::internal::RepeatedPtrFieldBase const&) 107.0ms 0.5% 0.0 google::protobuf::RepeatedPtrField::MergeFrom(google::protobuf::RepeatedPtrField const&) 107.0ms 0.5% 0.0 mesos::Value_Ranges::MergeFrom(mesos::Value_Ranges const&) 105.0ms 0.5% 0.0 mesos::Value_Ranges::CopyFrom(mesos::Value_Ranges const&) 105.0ms 0.5% 0.0 mesos::Value_Ranges::operator=(mesos::Value_Ranges const&) 104.0ms 0.5% 0.0 mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Range const&) 1.0ms 0.0% 0.0 mesos::remove(mesos::Value_Ranges*, mesos::Value_Range const&) 2.0ms 0.0% 0.0 mesos::Resource::MergeFrom(mesos::Resource const&) 2.0ms 0.0% 0.0 google::protobuf::internal::GenericTypeHandler::Merge(mesos::Resource const&, mesos::Resource*) 2.0ms 0.0% 0.0 void google::protobuf::internal::RepeatedPtrFieldBase::MergeFrom::TypeHandler>(google::protobuf::internal::RepeatedPtrFieldBase const&) 29.0ms 0.1% 0.0 void google::protobuf::internal::RepeatedPtrFieldBase::MergeFrom::TypeHandler>(google::protobuf::internal::RepeatedPtrFieldBase const&) 898.0ms 4.4% 898.0 google::protobuf::RepeatedPtrField::TypeHandler::Type* google::protobuf::internal::RepeatedPtrFieldBase::Add::TypeHandler>() 517.0ms 2.5% 0.0 google::protobuf::RepeatedPtrField::Add() 517.0ms 2.5% 0.0 mesos::Value_Ranges::add_range() 429.0ms 2.1% 0.0 mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Range const&) 88.0ms 0.4% 0.0 mesos::ranges::add(mesos::Value_Ranges*, long long, long long) 379.0ms 1.8% 0.0 void google::protobuf::internal::RepeatedPtrFieldBase::MergeFrom::TypeHandler>(google::protobuf::internal::RepeatedPtrFieldBase const&) {code} ",8 MESOS-3055,"Master doesn't properly handle SUBSCRIBE call","Master::subscribe() incorrectly handles 
re-registration. It handles it as a registration request (not ""re-registration"") because of a bug in the if condition (should have been !frameworkInfo.has_id()). {code} void Master::subscribe( const UPID& from, const scheduler::Call::Subscribe& subscribe) { const FrameworkInfo& frameworkInfo = subscribe.framework_info(); // TODO(vinod): Instead of calling '(re-)registerFramework()' from // here refactor those methods to call 'subscribe()'. if (frameworkInfo.has_id() || frameworkInfo.id() == """") { registerFramework(from, frameworkInfo); } else { reregisterFramework(from, frameworkInfo, subscribe.force()); } } {code}",2 MESOS-3060,"FTP response code for success not recognized by fetcher.","The response code for successful HTTP requests is 200; the response code for successful FTP file transfers is 226. The fetcher currently only checks for a response code of 200 even for FTP URIs. This results in failed fetching even though the resource gets downloaded successfully. This has been found by a dedicated external test using an FTP server. ",1 MESOS-3061,"Expose docker container IP in Master's state.json","We want to expose docker container IP to Mesos-DNS. One potential solution is to make it available via Master's state.json. We can set a label ""Docker.NetworkSettings.IPAddress"" in TaskStatus message (when it is sent the first time with TASK_RUNNING status).",2 MESOS-3062,"Add authorization for dynamic reservation","Dynamic reservations should be authorized with the {{principal}} of the reserving entity (framework or master). The idea is to introduce {{Reserve}} and {{Unreserve}} into the ACL. {code} message Reserve { // Subjects. required Entity principals = 1; // Objects. MVP: Only possible values = ANY, NONE required Entity resources = 2; } message Unreserve { // Subjects. required Entity principals = 1; // Objects. 
required Entity reserver_principals = 2; } {code} When a framework/operator reserves resources, ""reserve"" ACLs are checked to see if the framework ({{FrameworkInfo.principal}}) or the operator ({{Credential.user}}) is authorized to reserve the specified resources. If not authorized, the reserve operation is rejected. When a framework/operator unreserves resources, ""unreserve"" ACLs are checked to see if the framework ({{FrameworkInfo.principal}}) or the operator ({{Credential.user}}) is authorized to unreserve the resources reserved by a framework or operator ({{Resource.ReservationInfo.principal}}). If not authorized, the unreserve operation is rejected.",2 MESOS-3064,"Add 'principal' field to 'Resource.DiskInfo.Persistence'","In order to support authorization for persistent volumes, we should add the {{principal}} to {{Resource.DiskInfo}}, analogous to {{Resource.ReservationInfo.principal}}.",1 MESOS-3065,"Add framework authorization for persistent volume","This is the third in a series of tickets that adds authorization support to persistent volumes. When a framework creates a persistent volume, ""create"" ACLs are checked to see if the framework (FrameworkInfo.principal) or the operator (Credential.user) is authorized to create persistent volumes. If not authorized, the create operation is rejected. When a framework destroys a persistent volume, ""destroy"" ACLs are checked to see if the framework (FrameworkInfo.principal) or the operator (Credential.user) is authorized to destroy the persistent volume created by a framework or operator (Resource.DiskInfo.principal). If not authorized, the destroy operation is rejected. 
A separate ticket will use the structures created here to enable authorization of the ""/create"" and ""/destroy"" HTTP endpoints: https://issues.apache.org/jira/browse/MESOS-3903",5 MESOS-3066,"Replicated registry needs a representation of maintenance schedules","In order to persist maintenance schedules across failovers of the master, the schedule information must be kept in the replicated registry. This means adding an additional message in the Registry protobuf in src/master/registry.proto. The status of each individual slave's maintenance will also be persisted in this way. {code} message Maintenance { message HostStatus { required string hostname = 1; // True if the slave is deactivated for maintenance. // False if the slave is draining in preparation for maintenance. required bool is_down = 2; // Or an enum } message Schedule { // The set of affected slave(s). repeated HostStatus hosts = 1; // Interval in which this set of slaves is expected to be down for. optional Unavailability interval = 2; } message Schedules { repeated Schedule schedules; } optional Schedules schedules = 1; } {code} Note: There can be multiple SlaveID's attached to a single hostname.",3 MESOS-3067,"Implement a streaming response decoder for events stream","We need a streaming response decoder to de-serialize chunks sent from the master on the events stream. From the HTTP API design doc: Master encodes each Event in RecordIO format, i.e. a string representation of length of the event in bytes followed by JSON or binary Protobuf (possibly compressed) encoded event. As of now for getting the basic features right , this is being done in the test-cases: {code} auto reader = response.get().reader; ASSERT_SOME(reader); Future eventFuture = reader.get().read(); AWAIT_READY(eventFuture); Event event; event.ParseFromString(eventFuture.get()); {code} Two things need to happen: - We need master to emit events in RecordIO format i.e. 
event size followed by the serialized event instead of just the serialized events as is the case now. - The decoder class should then abstract away the logic of reading the response and de-serializing events from the stream. Ideally, the decoder should work with both ""json"" and ""protobuf"" responses. ",3 MESOS-3068,"Registry operations are hardcoded for a single key (Registry object)","This is primarily a refactoring. The prototype for modifying the registry is currently: {code} Try operator () ( Registry* registry, hashset* slaveIDs, bool strict); {code} In order to support Maintenance schedules (possibly Quotas as well), there should be an alternate prototype for Maintenance. Something like: {code} Try operation () ( Maintenance* maintenance, bool strict); {code} The existing RegistrarProcess::update (src/master/registrar.cpp) should be refactored to allow for more than one key. If necessary, refactor existing operations defined in src/master/master.hpp (AdmitSlave, ReadmitSlave, RemoveSlave).",5 MESOS-3069,"Registry operations do not exist for manipulating maintenance schedules","In order to modify the maintenance schedule in the replicated registry, we will need Operations (src/master/registrar.hpp). The operations will likely correspond to the HTTP API: * UpdateMaintenanceSchedule: Given a blob representing a maintenance schedule, perform some verification on the blob. Write the blob to the registry. * StartMaintenance: Given a set of machines, verify then transition machines from Draining to Deactivated. * StopMaintenance: Given a set of machines, verify then transition machines from Deactivated to Normal. Remove affected machines from the schedule(s).",8 MESOS-3072,"Unify initialization of modularized components","h1.Introduction As it stands right now, default implementations of modularized components are required to have a non-parametrized {{create()}} static method. 
This allows to write tests which can cover default implementations and modules based on these default implementations on a uniform way. For example, with the interface {{Foo}}: {code} class Foo { public: virtual ~Foo() {} virtual Future hello() = 0; protected: Foo() {} }; {code} With a default implementation: {code} class LocalFoo { public: Try create() { return new Foo; } virtual Future hello() { return 1; } }; {code} This allows to create typed tests which look as following: {code} typedef ::testing::Types> FooTestTypes; TYPED_TEST_CASE(FooTest, FooTestTypes); TYPED_TEST(FooTest, ATest) { Try foo = TypeParam::create(); ASSERT_SOME(foo); AWAIT_CHECK_EQUAL(foo.get()->hello(), 1); } {code} The test will be applied to each of types in the template parameters of {{FooTestTypes}}. This allows to test different implementation of an interface. In our code, it tests default implementations and a module which uses the same default implementation. The class {{tests::Module}} needs a little explanation, it is a wrapper around {{ModuleManager}} which allows the tests to encode information about the requested module in the type itself instead of passing a string to the factory method. The wrapper around create, the real important method looks as follows: {code} template static Try test::Module::create() { Try moduleName = getModuleName(N); if (moduleName.isError()) { return Error(moduleName.error()); } return mesos::modules::ModuleManager::create(moduleName.get()); } {code} h1.The Problem Consider the following implementation of {{Foo}}: {code} class ParameterFoo { public: Try create(int i) { return new ParameterFoo(i); } ParameterFoo(int i) : i_(i) {} virtual Future hello() { return i; } private: int i_; }; {code} As it can be seen, this implementation cannot be used as a default implementation since its create API does not match the one of {{test::Module<>}}: {{create()}} has a different signature for both types. 
It is still a common situation to require initialization parameters for objects, however this constraint (keeping both interfaces alike) forces default implementations of modularized components to have default constructors, therefore the tests are forcing the design of the interfaces. Implementations which are supposed to be used as modules only, i.e. non default implementations are allowed to have constructor parameters, since the actual signature of their factory method is, this factory method's function is to decode the parameters and call the appropriate constructor: {code} template T* Module::create(const Parameters& params); {code} where parameters is just an array of key-value string pairs whose interpretation is left to the specific module. Sadly, this call is wrapped by {{ModuleManager}} which only allows module parameters to be passed from the command line and does not offer a programmatic way to feed construction parameters to modules. h1.The Ugly Workaround With the requirement of a default constructor and parameters devoid {{create()}} factory function, a common pattern (see [Authenticator|https://github.com/apache/mesos/blob/9d4ac11ed757aa5869da440dfe5343a61b07199a/include/mesos/authentication/authenticator.hpp]) has been introduced to feed construction parameters into default implementation, this leads to adding an {{initialize()}} call to the public interface, which will have {{Foo}} become: {code} class Foo { public: virtual ~Foo() {} virtual Try initialize(Option i) = 0; virtual Future hello() = 0; protected: Foo() {} }; {code} {{ParameterFoo}} will thus look as follows: {code} class ParameterFoo { public: Try create() { return new ParameterFoo; } ParameterFoo() : i_(None()) {} virtual Try initialize(Option i) { if (i.isNone()) { return Error(""Need value to initialize""); } i_ = i; return Nothing; } virtual Future hello() { if (i_.isNone()) { return Future::failure(""Not initialized""); } return i_.get(); } private: Option i_; }; {code} Look that 
this {{initialize()}} method now has to be implemented by all descendants of {{Foo}}, even if there's a {{DatabaseFoo}} which takes its return value for {{hello()}} from a DB, it will need to support {{int}} as an initialization parameter. The problem is more severe the more specific the parameter to {{initialize()}} is. For example, if there is a very complex structure implementing ACLs, all implementations of an authorizer will need to import this structure even if they can completely ignore it. In the {{Foo}} example if {{ParameterFoo}} were to become the default implementation of {{Foo}}, the tests would look as follows: {code} typedef ::testing::Types> FooTestTypes; TYPED_TEST_CASE(FooTest, FooTestTypes); TYPED_TEST(FooTest, ATest) { Try foo = TypeParam::create(); ASSERT_SOME(foo); int fooValue = 1; foo.get()->initialize(fooValue); AWAIT_CHECK_EQUAL(foo.get()->hello(), fooValue); } {code}",3 MESOS-3073,"Introduce HTTP endpoints for Quota","We need to implement the HTTP endpoints for Quota as outlined in the Design Doc: (https://docs.google.com/document/d/16iRNmziasEjVOblYp5bbkeBZ7pnjNlaIzPQqMTHQ-9I). ",3 MESOS-3074,"Add capacity heuristic for quota requests in Master","We need to validate quota requests in the Mesos Master as outlined in the Design Doc: https://docs.google.com/document/d/16iRNmziasEjVOblYp5bbkeBZ7pnjNlaIzPQqMTHQ-9I This ticket aims to validate satisfiability (in terms of available resources) of a quota request using a heuristic algorithm in the Mesos Master, rather than validating the syntax of the request.",3 MESOS-3076,"Add Labels to TaskStatus and expose them via state.json","This would allow the executors and Slave modules to expose some meta-data to frameworks and Mesos-DNS via state.json. 
A typical use case is to allow the containers to expose their IP to framework/Mesos-DNS.",2 MESOS-3077,"Registry recovery does not recover the maintenance object.","Persisted info is fetched from the registry when a master is elected or after failover. Currently, this process involves 3 steps: * Fetch the ""registry"". * Start an operation to add the new master to the fetched registry. * Check the success of the operation and finish recovering. These methods can be found in src/master/registrar.cpp {code}RegistrarProcess::recover, ::_recover, ::__recover{code} Since the maintenance schedule is stored in a separate key, the recover process must also fetch a new ""maintenance"" object. This object needs to be passed along to the master along with the existing ""registry"" object. Possible test(s): * src/tests/registrar_tests.cpp ** Change the ""Recovery"" test to include checks for the new object.",5 MESOS-3078,"Recovered resources are not re-allocated until the next allocation delay.","Currently, when resources are recovered, we do not perform an allocation for that slave. Rather, we wait until the next allocation interval. For small task, high throughput frameworks, this can have a significant impact on overall throughput, see the following thread: http://markmail.org/thread/y6mzfwzlurv6nik3 We should consider immediately performing a re-allocation for the slave upon resource recovery.",5 MESOS-3079,"`sudo make distcheck` fails on Ubuntu 14.04 (and possibly other OSes too)","Running tests as root causes a large number of failures. 
{noformat} $ lsb_release -a LSB Version: core-2.0-amd64:core-2.0-noarch:core-3.0-amd64:core-3.0-noarch:core-3.1-amd64:core-3.1-noarch:core-3.2-amd64:core-3.2-noarch:core-4.0-amd64:core-4.0-noarch:core-4.1-amd64:core-4.1-noarch:cxx-3.0-amd64:cxx-3.0-noarch:cxx-3.1-amd64:cxx-3.1-noarch:cxx-3.2-amd64:cxx-3.2-noarch:cxx-4.0-amd64:cxx-4.0-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-3.1-amd64:desktop-3.1-noarch:desktop-3.2-amd64:desktop-3.2-noarch:desktop-4.0-amd64:desktop-4.0-noarch:desktop-4.1-amd64:desktop-4.1-noarch:graphics-2.0-amd64:graphics-2.0-noarch:graphics-3.0-amd64:graphics-3.0-noarch:graphics-3.1-amd64:graphics-3.1-noarch:graphics-3.2-amd64:graphics-3.2-noarch:graphics-4.0-amd64:graphics-4.0-noarch:graphics-4.1-amd64:graphics-4.1-noarch:languages-3.2-amd64:languages-3.2-noarch:languages-4.0-amd64:languages-4.0-noarch:languages-4.1-amd64:languages-4.1-noarch:multimedia-3.2-amd64:multimedia-3.2-noarch:multimedia-4.0-amd64:multimedia-4.0-noarch:multimedia-4.1-amd64:multimedia-4.1-noarch:printing-3.2-amd64:printing-3.2-noarch:printing-4.0-amd64:printing-4.0-noarch:printing-4.1-amd64:printing-4.1-noarch:qt4-3.1-amd64:qt4-3.1-noarch:security-4.0-amd64:security-4.0-noarch:security-4.1-amd64:security-4.1-noarch Distributor ID: Ubuntu Description: Ubuntu 14.04.2 LTS Release: 14.04 Codename: trusty $ sudo make -j12 V=0 check [==========] 712 tests from 116 test cases ran. (318672 ms total) [ PASSED ] 676 tests. 
[ FAILED ] 36 tests, listed below: [ FAILED ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample [ FAILED ] UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsPerfEventIsolatorProcess [ FAILED ] SlaveRecoveryTest/0.RecoverSlaveState, where TypeParam = mesos::internal::slave::MesosContainerizer [ FAILED ] SlaveRecoveryTest/0.RecoverStatusUpdateManager, where TypeParam = mesos::internal::slave::MesosContainerizer [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer [ FAILED ] SlaveRecoveryTest/0.RecoverUnregisteredExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer [ FAILED ] SlaveRecoveryTest/0.RecoverTerminatedExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer [ FAILED ] SlaveRecoveryTest/0.RecoverCompletedExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer [ FAILED ] SlaveRecoveryTest/0.CleanupExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer [ FAILED ] SlaveRecoveryTest/0.RemoveNonCheckpointingFramework, where TypeParam = mesos::internal::slave::MesosContainerizer [ FAILED ] SlaveRecoveryTest/0.NonCheckpointingFramework, where TypeParam = mesos::internal::slave::MesosContainerizer [ FAILED ] SlaveRecoveryTest/0.KillTask, where TypeParam = mesos::internal::slave::MesosContainerizer [ FAILED ] SlaveRecoveryTest/0.Reboot, where TypeParam = mesos::internal::slave::MesosContainerizer [ FAILED ] SlaveRecoveryTest/0.GCExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer [ FAILED ] SlaveRecoveryTest/0.ShutdownSlave, where TypeParam = mesos::internal::slave::MesosContainerizer [ FAILED ] SlaveRecoveryTest/0.ShutdownSlaveSIGUSR1, where TypeParam = mesos::internal::slave::MesosContainerizer [ FAILED ] SlaveRecoveryTest/0.RegisterDisconnectedSlave, where TypeParam = mesos::internal::slave::MesosContainerizer [ FAILED ] SlaveRecoveryTest/0.ReconcileKillTask, where TypeParam = 
mesos::internal::slave::MesosContainerizer [ FAILED ] SlaveRecoveryTest/0.ReconcileShutdownFramework, where TypeParam = mesos::internal::slave::MesosContainerizer [ FAILED ] SlaveRecoveryTest/0.ReconcileTasksMissingFromSlave, where TypeParam = mesos::internal::slave::MesosContainerizer [ FAILED ] SlaveRecoveryTest/0.SchedulerFailover, where TypeParam = mesos::internal::slave::MesosContainerizer [ FAILED ] SlaveRecoveryTest/0.PartitionedSlave, where TypeParam = mesos::internal::slave::MesosContainerizer [ FAILED ] SlaveRecoveryTest/0.MasterFailover, where TypeParam = mesos::internal::slave::MesosContainerizer [ FAILED ] SlaveRecoveryTest/0.MultipleFrameworks, where TypeParam = mesos::internal::slave::MesosContainerizer [ FAILED ] SlaveRecoveryTest/0.MultipleSlaves, where TypeParam = mesos::internal::slave::MesosContainerizer [ FAILED ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch, where TypeParam = mesos::internal::slave::MesosContainerizer [ FAILED ] MesosContainerizerSlaveRecoveryTest.ResourceStatistics [ FAILED ] MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PerfRollForward [ FAILED ] MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward [ FAILED ] MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward [ FAILED ] CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_Perf [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery [ FAILED ] NsTest.ROOT_setns [ FAILED ] PerfTest.ROOT_Events [ FAILED ] PerfTest.ROOT_SamplePid 36 FAILED TESTS {noformat} Full log attached.",2 MESOS-3082,"Perf related tests rely on 'cycles' which might not always be present.","When running the tests on Ubuntu 14.04 the 'cycles' value collected by perf is always 0, meaning certain tests always fail. These lines in the test have been commented out for now and a TODO has been attached which links to this JIRA issue, since the solution is unclear. 
In particular, 'cycles' might not properly be counted because it is a hardware counter and this particular machine was a virtual machine. Either way, we should determine the best events to collect from perf in either VM or physical settings.",5 MESOS-3083,"Doing 'clone' on Linux with the CLONE_NEWUSER namespace type can drop root privileges.","The namespace tests attempt to clone a process with all namespaces that are available from the kernel which includes the 'user' namespace in Ubuntu 14.04 which causes the child process to be user 'nobody' instead of user 'root' after invoking 'clone' which is bad because the test requires that the child process is 'root' and so things fail (because of insufficient permissions). For now, we explicitly ignore the 'user' namespace in the tests, but this issue is to track exactly how we might want to manage this going forward.",5 MESOS-3086,"Create cgroups TasksKiller for non freeze subsystems.","We have a number of test issues when we cannot remove cgroups (in case there are still related tasks running) in cases where the freezer subsystem is not available. In the current code (https://github.com/apache/mesos/blob/0.22.1/src/linux/cgroups.cpp#L1728) we will fall back to a very simple mechanism of recursively trying to remove the cgroups which fails if there are still tasks running. Therefore we need an additional (NonFreeze)TasksKiller which doesn't rely on the freezer subsystem. This problem caused issues when running 'sudo make check' during 0.23 release testing, where BenH provided already a better error message with b1a23d6a52c31b8c5c840ab01902dbe00cb1feef / https://reviews.apache.org/r/36604. ",4 MESOS-3087,"Typos in oversubscription doc","* In docs/oversubscription.md: there are three cases where ""revocable"" is written as ""recovable"", including the name of a JSON field. {noformat} $ grep -niR recovable . ./docs/oversubscription.md:51:with revocable resources. 
Further more, recovable resources cannot be ./docs/oversubscription.md:95:Launching tasks using recovable resources is done through the existing ./docs/oversubscription.md:96:`launchTasks` API. Revocable resources will have the `recovable` field set. See {noformat} * Also in `docs/oversubscription.md`: the last sentence doesn't make sense {noformat} To select custom a resource estimator and QoS controller, please refer to the [modules documentation](modules.md). {noformat} Maybe should say ""To select a custom..."" or ""To install a custom...""",1 MESOS-3088,"Update scheduler driver to send SUBSCRIBE call","See MESOS-2913 for context.",2 MESOS-3089,"Update scheduler library to send REQUEST call","See MESOS-2913 for context. From the dev list it looks like users depend on this call for their custom allocator, so we need to support it going forward.",2 MESOS-3092,"Configure Jenkins to run Docker tests","Add a Jenkins job to run the Docker tests.",2 MESOS-3093,"Support HTTPS requests in libprocess","In order to pull images from Docker registries, https calls are needed to securely communicate with the registry hosts. Currently, only http requests are supported through libprocess. Now that SSL sockets are available through libprocess, support for https can be added.",3 MESOS-3095,"PoC running command executor with image provisioner","This is to implement a PoC of the alternative design choices with MESOS-3004",3 MESOS-3096,"Authentication for Communicating with Docker Registry","In order to pull Docker images from Docker Hub and private Docker registries, the provisioner must support two primary authentication frameworks to authenticate with the registries, basic authentication and the OAuth2.0 authorization framework, as per the docker registry spec. 
A Docker registry can also operate in standalone mode and may not require authentication.",5 MESOS-3097,"OS-specific code touched by the containerizer tests is not Windows compatible","In the process of adding the Cmake build system, [~hausdorff] noted and stubbed out all OS-specific code. That sweep (mostly of libprocess and stout) is here: https://github.com/hausdorff/mesos/commit/b862f66c6ff58c115a009513621e5128cb734d52 Instead of having inline {{#if defined(...)}}, the OS-specific code will be separated into directories. The Windows code will be stubbed out.",13 MESOS-3098,"Implement WindowsContainerizer and WindowsDockerContainerizer","The MVP for Windows support is a containerizer that (1) runs on Windows, and (2) runs and passes all the tests that are relevant to the Windows platform (_e.g._, not the tests that involve cgroups). To do this we require at least a `WindowsContainerizer` (to be implemented alongside the `MesosContainerizer`), which provides no meaningful (_e.g._) process namespacing (much like the default unix containerizer). In the long term (hopefully before MesosCon) we want to support also the Windows container API. This will require implementing a separate containerizer, maybe called `WindowsDockerContainerizer`. Since the Windows container API is actually officially supported through the Docker interface (_i.e._, MSFT actually ported the Docker engine to Windows, and that is the official API), the interfaces (like the fetcher) shouldn't change much. The tests probably will have to change, as we don't have access to any isolation primitives like cgroups for those tests. 
Outstanding TODO([~hausdorff]): Flesh out this description when more details are available, regarding: * The container API for Windows (when we know them) * The nuances of Windows vs Linux (when we know them) * etc.",13 MESOS-3099,"Validation of Docker Image Manifests from Docker Registry","Docker image manifests pulled from remote Docker registries should be verified against their signature digest before they are used. ",3 MESOS-3100,"Validation of Docker Layers Pulled From Docker Registry","Docker layers should be verified against their checksum digests before they are stored to ensure the integrity of the docker layer content. This includes supporting sha256, sha384, sha512 hash algorithms.",3 MESOS-3101,"Standardize separation of Windows/Linux-specific OS code","There are 50+ files that must be touched to separate OS-specific code. First, we will standardize the changes by using stout/abort.hpp as an example. The review/discussion can be found here: https://reviews.apache.org/r/36625/",3 MESOS-3102,"Separate OS-specific code in the stout library","This issue tracks changes for all files under {{3rdparty/libprocess/3rdparty/stout/}} The changes will be based on this commit: https://github.com/hausdorff/mesos/commit/b862f66c6ff58c115a009513621e5128cb734d52#diff-a6d038bad64b154996452bec020cfa7c",5 MESOS-3103,"Separate OS-specific code in the libprocess library","This issue tracks changes for all files under {{3rdparty/libprocess/include/}} and {{3rdparty/libprocess/src}}. The changes will be based on this commit: https://github.com/hausdorff/mesos/commit/b862f66c6ff58c115a009513621e5128cb734d52#diff-a6d038bad64b154996452bec020cfa7c",5 MESOS-3105,"Port flag generation logic from the autotools solution to CMake","One major barrier to widespread adoption of the CMake-based build system (other than the fact that we haven't implemented it yet!) 
is that most of our institutional knowledge of the quirks of how to build Mesos across many platforms is tied up in files like `configure.ac`. Therefore, a ""good"" CMake-based build system will require us to go through these files systematically and manually port this logic to CMake (as well as testing it).",3 MESOS-3106,"Extend CMake build system to support building against third-party libraries from either the system or the local Mesos rebundling","Currently Mesos has third-party dependencies of two types: (1) those that are expected to be on the system (such as APR, libsvn, _etc_.), and (2) those that have been historically bundled as tarballs inside the Mesos repository, and are not expected to be on the system when Mesos is installed (these are located in the `3rdparty/` directory, and includes things like boost and glog). For type (2), the MVP of the CMake-based build system will always pull down a fresh tarball from an external source, instead of using the bundled tarballs in the `3rdparty/` folder. However, many CI systems do not have Internet access, so in the long term, we need to provide many options for getting these dependencies.",5 MESOS-3107,"Define CMake style guide","The short story is that it is important to be principled about how the CMake build system is maintained, because there CMake language makes it difficult to statically verify that a configuration is correct. It is not unique in this regard, but (make is arguably even worse) but it is something that's important to make sure we get right. The longer story is, CMake's language is dynamically scoped and often has somewhat odd defaults for variable values (_e.g._, IIRC, target names passed to ExternalProject_Add default to ""PREFIX"" instead of erroring out). 
This means that it is rare to get a configuration-time error (_i.e._, CMake usually doesn't say something like ""hey this variable isn't defined""), and in large projects, this can make it very difficult to know where definitions come from, or whether it's important that one config routine runs before another. Dynamic scoping also makes it particularly easy to write spaghetti code, which is clearly undesirable for something as important as a build system. Thus, it is particularly important that we lay down our expectations for how the CMake system is to be structured. This might include: * Function naming (_e.g._, making it easy to tell whether a function was defined by us, and where it was defined; so we might say that we want our functions to have an underscore to start, and start with the package the come from, like libprocess, so that we know where to look for the definition.) * What assertions we want to check variable values against, so that we can replace subtle errors (_e.g._, a library is accidentally named something silly like ""PREFIX.0.0.1"") with an obvious ones (_e.g._, ""You have failed to define your target name, so CMake has defaulted to 'PREFIX'; please check your configuration routines"") * Decisions of what goes where. (_e.g._, the most complex parts of the CMake MVPs is in the configuration routines, like `MesosConfigure.cmake`; to curb this, we should have strict rules about what goes in that file vs other files, and how we know what is to be run before what. Part of this should probably be prominent comments explaining the structure of the project, so that people aren't confused!) * And so on.",3 MESOS-3108,"Add autotools-style Mesos distributions to the CMake build system","In the autoconf-based build system, we there is a notion of building a ""distribution"" of Mesos. 
Essentially, it is a version of Mesos that is configured for a specific platform (Ubuntu, say); so, if a consumer knows their platform, and there is a Mesos distribution, they need only run `make all` and Mesos builds. This allows the consumer to skip the configure step. In CMake, it should be possible to do this (should be!), and we should explore making it work after we complete the MVP.",3 MESOS-3109,"Expand CMake build system to support building the containerizer and associated components","In other tasks in epic MESOS-898, we implement a CMake-based build system that allows us to build process library, the process tests, and the stout tests. For the CMake build system MVP, it's important that we expand this to build the containerizer, associated modules, and all related tests.",3 MESOS-3110,"Harden the CMake system-dependency-locating routines","Currently the Mesos project has two flavors of dependency: (1) the dependencies we expect are already on the system (_e.g._, apr, libsvn), and (2) the dependencies that are historically bundled with Mesos (_e.g._, glog). Dependency type (1) requires solid modules that will locate them on any system: Linux, BSD, or Windows. This would come for free if we were using CMake 3.0, but we're using CMake 2.8 so that Ubuntu users can install it out of the box, instead of upgrading CMake first. This is additionally useful for dependency type (2), where we will expect to have to use these routines when we support both the rebundled dependencies in the `3rdparty/` folder, and system installations of those dependencies.",3 MESOS-3112,"Fetcher should perform cache eviction based on cache file usage patterns.","Currently, the fetcher uses a trivial strategy to select eviction victims: it picks the first cache file it finds in linear iteration. This means that potentially a file that has just been used gets evicted the next moment. This performance loss can be avoided by even the simplest enhancement of the selection procedure. 
Proposed approach: determine an effective yet relatively uncomplex and quick algorithm and implement it in `FetcherProcess::Cache::selectVictims(const Bytes& requiredSpace)`. Suggestion: approximate MRU-retention somehow. Unit-test what actually happens!",8 MESOS-3113,"Add resource usage section to containerizer documentation","Currently, the containerizer documentation doesn't touch upon the usage() API and how to interpret the collected statistics.",3 MESOS-3114,"Simplify JSON::* by providing ""jsonify"" along the lines of ""stringify""","We want to be able to do things like: {code} JSON::Value number1 = 25; JSON::Number number2 = 26; EXPECT_NE(number1, number2); EXPECT_EQ(jsonify(12), number1); EXPECT_EQ(jsonify(12), number2); {code}",3 MESOS-3115,"Convert mesos::slave::{Limitation,ExecutorRunState} into protobufs.","Published RR: https://reviews.apache.org/r/36718/",1 MESOS-3116,"Pass ExecutorInfo argument into Isolator::isolate().","Some isolators need to lookup the executor environment variables to customize their isolation needs. Currently, one has to use the ""prepare()"" call to cache the executor-info to use it later during isolate() call.",2 MESOS-3117,"Pass ContainerId into `slaveExecutorEnvironmentDecorator` hook",NULL,1 MESOS-3118,"Remove pthread specific code from Stout",NULL,3 MESOS-3119,"Remove pthread specific code from Libprocess",NULL,3 MESOS-3120,"Remove pthread specific code from Mesos",NULL,3 MESOS-3121,"Always disable SSLV2","The SSL protocol mismatch tests are failing on Centos7 when matching SSLV2 with SSLV2. Since this version of the protocol is highly discouraged anyway, let's disable it completely unless requested otherwise.",2 MESOS-3122,"Add configurable UNIMPLEMENTED macro to stout","During the transition to support for windows, it would be great if we had the ability to use a macro that marks functions as un-implemented. 
To support being able to find all the unimplemented functions easily at compile time, while also being able to run the tests at the same time, we can add a configuration flag that controls whether this macro aborts or expands to a static assertion.",2 MESOS-3124,"Updating persistent volumes after slave restart is problematic.","Just realized this while reviewing https://reviews.apache.org/r/34135 Since we don't checkpoint 'resources' in the Mesos containerizer, when the slave restarts and recovers, the 'resources' in the Container struct will be empty, but symlinks still exist in the sandbox. We'll end up trying to create symlinks that already exist (and fail). I think we should skip the creation if the symlink already exists.",3 MESOS-3125,"DOCKER_HOST env variable stopped working for executors","With https://reviews.apache.org/r/36282/ no environment variables are available anymore in the docker executors. Hence, setting DOCKER_HOST outside of Mesos stopped working. Setups which use a remote Docker daemon or tools like Powerstrip stopped working.",2 MESOS-3127,"Improve task reconciliation documentation.","Include additional information about task reconciliation that explains why the master may not return the states of all tasks immediately and why an explicit task reconciliation algorithm is necessary.",1 MESOS-3129,"Move all MesosContainerizer related files under src/slave/containerizer/mesos","Currently, some MesosContainerizer specific files are not in the correct location. For example: {noformat} src/slave/containerizer/isolators/* src/slave/containerizer/provisioner.hpp|cpp {noformat} They should be put under src/slave/containerizer/mesos/",2 MESOS-3130,"Custom isolators should implement Isolator instead of IsolatorProcess.","Similar to MESOS-2213, we should not restrict custom isolators to use libprocess Process. 
We should do a similar refactor as we did for MESOS-2213.",3 MESOS-3131,"Master should send heartbeats on the subscription connection","In order to deal with network partitions and to ensure that network intermediaries do not close the persistent subscription connection, the master must periodically send heartbeats. The expectation with schedulers is that they resubscribe when they do not receive heartbeats for some time.",3 MESOS-3132,"Allow slave to forward messages through the master for HTTP schedulers.","The master currently has no install handler for {{ExecutorToFramework}} messages and the slave directly sends these messages to the scheduler driver, bypassing the master entirely. We need to preserve this behavior for the driver, but HTTP schedulers will not have a libprocess 'pid'. We'll have to ensure that the {{RunTaskMessage}} and {{UpdateFrameworkMessage}} have an optional pid. For now the master will continue to set the pid, but 0.24.0 slaves will know to send messages through the master when the 'pid' is not available.",5 MESOS-3133,"Isolator::prepare() should return Executor environment vars as well","Sometimes the Isolators need to pass on some environment variables for the Executor that is being launched. For example, to successfully launch an executor inside a network namespace, one needs to set LIBPROCESS_IP to point to the container IP, otherwise the executor tries to bind to the Slave IP which may be invalid inside the namespace. Another example is where the file system isolator should be able to specify the WORK_DIR depending on if a new rootfs is used.",2 MESOS-3134,"Port bootstrap to CMake","Bootstrap does a lot of significant things, like setting up the git commit hooks. 
We will want something like bootstrap to run also on systems that don't have bash -- ideally this should just run in CMake itself.",5 MESOS-3135,"Publish MasterInfo to ZK using JSON","Following from MESOS-2340, which now allows Master to correctly decode JSON information ({{MasterInfo}}) published to Zookeeper, we can now enable the Master Leader Contender to serialize it too in JSON.",2 MESOS-3138,"PersistentVolumeTest.SlaveRecovery test fails on OSX","With a clean build ({{make clean}}) running this tests fails: {code} GTEST_FILTER=""PersistentVolumeTest.*"" make check {code} This is the log: {noformat} [==========] Running 7 tests from 1 test case. [----------] Global test environment set-up. [----------] 7 tests from PersistentVolumeTest [ RUN ] PersistentVolumeTest.SendingCheckpointResourcesMessage [ OK ] PersistentVolumeTest.SendingCheckpointResourcesMessage (189 ms) [ RUN ] PersistentVolumeTest.ResourcesCheckpointing [ OK ] PersistentVolumeTest.ResourcesCheckpointing (86 ms) [ RUN ] PersistentVolumeTest.PreparePersistentVolume [ OK ] PersistentVolumeTest.PreparePersistentVolume (82 ms) [ RUN ] PersistentVolumeTest.MasterFailover [ OK ] PersistentVolumeTest.MasterFailover (130 ms) [ RUN ] PersistentVolumeTest.IncompatibleCheckpointedResources [ OK ] PersistentVolumeTest.IncompatibleCheckpointedResources (74 ms) [ RUN ] PersistentVolumeTest.AccessPersistentVolume I0723 11:21:40.265787 1955922688 exec.cpp:132] Version: 0.24.0 I0723 11:21:40.268676 174858240 exec.cpp:206] Executor registered on slave 20150723-112140-16777343-61858-2866-S0 E0723 11:21:40.268697 178077696 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57] E0723 11:21:40.273510 178077696 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57] Registered executor on localhost Starting task 39e32f2b-475e-4754-9e3d-39fd56fb787b Forked command at 2911 sh -c 'echo abc > path1/file' E0723 11:21:40.281900 178077696 socket.hpp:173] Shutdown failed on fd=18: Socket is not 
connected [57] Command exited with status 0 (pid: 2911) E0723 11:21:40.389068 178077696 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57] [ OK ] PersistentVolumeTest.AccessPersistentVolume (421 ms) [ RUN ] PersistentVolumeTest.SlaveRecovery I0723 11:21:40.639749 1955922688 exec.cpp:132] Version: 0.24.0 I0723 11:21:40.641904 187400192 exec.cpp:206] Executor registered on slave 20150723-112140-16777343-61858-2866-S0 E0723 11:21:40.641943 191156224 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57] E0723 11:21:40.646507 191156224 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57] Registered executor on localhost Starting task 809fa50f-bee0-4c9b-a770-434183a9650b sh -c 'while true; do test -d path1; done' Forked command at 2941 E0723 11:21:40.655097 191156224 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57] I0723 11:21:40.671840 186863616 exec.cpp:252] Received reconnect request from slave 20150723-112140-16777343-61858-2866-S0 E0723 11:21:40.671953 191156224 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57] I0723 11:21:40.672744 187400192 exec.cpp:229] Executor re-registered on slave 20150723-112140-16777343-61858-2866-S0 E0723 11:21:40.672839 191156224 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57] Re-registered executor on localhost ../../src/tests/persistent_volume_tests.cpp:709: Failure Value of: status2.get().state() Actual: TASK_FAILED Expected: TASK_KILLED [ FAILED ] PersistentVolumeTest.SlaveRecovery (286 ms) [----------] 7 tests from PersistentVolumeTest (1268 ms total) [----------] Global test environment tear-down [==========] 7 tests from 1 test case ran. (1289 ms total) [ PASSED ] 6 tests. 
[ FAILED ] 1 test, listed below: [ FAILED ] PersistentVolumeTest.SlaveRecovery 1 FAILED TEST YOU HAVE 8 DISABLED TESTS {noformat}",2 MESOS-3139,"Incorporate CMake into standard documentation","Right now it's anyone's guess how to build with CMake. If we want people to use it, we should put up documentation. The central challenge is that the CMake instructions will be slightly different for different platforms. For example, on Linux, the gist of the build is basically the same as autotools; you pull down the system dependencies (like APR, _etc_.), and then: ``` ./bootstrap mkdir build-cmake && cd build-cmake cmake .. make ``` But, on Windows, it will be somewhat more complicated. There is no bootstrap step, for example, because Windows doesn't have bash natively. And even when we put that in, you'll still have to build the glog stuff out-of-band because CMake has no way of booting up Visual Studio and calling ""build."" So practically, we need to figure out: * What our build story is for different platforms * Write specific instructions for our ""core"" target platforms.",13 MESOS-3140,"Implement Docker remote puller","Given a Docker image name and registry host URL, fetches the image. If necessary, it will download the manifest and layers from the registry host. It will place the layers and image manifest into persistent store. Done when a Docker image can be successfully stored and retrieved using 'put' and 'get' methods.",5 MESOS-3141,"Compiler warning when mocking function type has an enum return type.","The purpose of this ticket is to document a very cryptic error message (actually a warning that gets propagated by {{-Werror}}) that gets generated by {{clang-3.5}} from {{gmock}} source code when trying to mock a perfectly innocent-looking function. h3. 
Problem The following code is attempting to mock a {{MesosExecutorDriver}}: {code} class MockMesosExecutorDriver : public MesosExecutorDriver { public: MockMesosExecutorDriver(mesos::Executor* executor) : MesosExecutorDriver(executor) {} MOCK_METHOD1(sendStatusUpdate, Status(const TaskStatus&)); }; {code} The above code generates the following error message: {noformat} In file included from ../3rdparty/libprocess/3rdparty/gmock-1.6.0/include/gmock/gmock.h:58: In file included from ../3rdparty/libprocess/3rdparty/gmock-1.6.0/include/gmock/gmock-actions.h:46: ../3rdparty/libprocess/3rdparty/gmock-1.6.0/include/gmock/internal/gmock-internal-utils.h:355:10: error: indirection of non-volatile null pointer will be deleted, not trap [-Werror,-Wnull-dereference] return *static_cast::type*>(__null); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../3rdparty/libprocess/3rdparty/gmock-1.6.0/include/gmock/gmock-actions.h:78:22: note: in instantiation of function template specialization 'testing::internal::Invalid' requested here return internal::Invalid(); ^ ../3rdparty/libprocess/3rdparty/gmock-1.6.0/include/gmock/gmock-actions.h:190:43: note: in instantiation of member function 'testing::internal::BuiltInDefaultValue::Get' requested here internal::BuiltInDefaultValue::Get() : *value_; ^ ../3rdparty/libprocess/3rdparty/gmock-1.6.0/include/gmock/gmock-spec-builders.h:1435:34: note: in instantiation of member function 'testing::DefaultValue::Get' requested here return DefaultValue::Get(); ^ ../3rdparty/libprocess/3rdparty/gmock-1.6.0/include/gmock/gmock-spec-builders.h:1334:22: note: in instantiation of member function 'testing::internal::FunctionMockerBase::PerformDefaultAction' requested here func_mocker->PerformDefaultAction(args, call_description)); ^ ../3rdparty/libprocess/3rdparty/gmock-1.6.0/include/gmock/gmock-spec-builders.h:1448:26: note: in instantiation of function template specialization 'testing::internal::ActionResultHolder::PerformDefaultAction' 
requested here return ResultHolder::PerformDefaultAction(this, args, call_description); ^ ../3rdparty/libprocess/3rdparty/gmock-1.6.0/include/gmock/gmock-generated-function-mockers.h:81:7: note: in instantiation of member function 'testing::internal::FunctionMockerBase::UntypedPerformDefaultAction' requested here class FunctionMocker : public ^ ../3rdparty/libprocess/3rdparty/gmock-1.6.0/include/gmock/internal/gmock-internal-utils.h:355:10: note: consider using __builtin_trap() or qualifying pointer with 'volatile' return *static_cast::type*>(__null); ^ {noformat} The source of the issue here is that {{Status}} is an {{enum}}. In {{gmock-1.6.0/include/gmock/internal/gmock-internal-utils.h}} you can find the following function: {code} template T Invalid() { return *static_cast::type*>(NULL); } {code} This function gets called with the return type of a mocked function. In our case, the return type of the mocked function is {{Status}}. Attempting to compile the following minimal example with {{clang-3.5}} reproduces the error message: {code} #include template T invalid() { return *static_cast::type *>(nullptr); } enum E { A, B }; int main() { invalid(); } {code} * See it online on [GCC Explorer|https://goo.gl/t1FepZ] Note that if the type is not an {{enum}}, the warning is not generated. This is why existing mocked functions that return non-{{enum}} types such as {{Future}} does not encounter this issue. h3. Solutions The simplest solution is to add {{-Wno-null-deference}} to {{mesos_tests_CPPFLAGS}} in {{src/Makefile.am}}. {code} mesos_tests_CPPFLAGS = $(MESOS_CPPFLAGS) -Wno-null-dereference {code} Another solution is to upgrade {{gmock}} from *1.6* to *1.7* because this problem is solved in the newer versions. In gmock 1.7 {code} template inline T Invalid() { return const_cast::type&>( *static_cast::type*>(NULL)); } {code} Add volatile could avoid this warning. 
https://goo.gl/opCiLC ",3 MESOS-3142,"As a Developer I want a better way to run shell commands","When reviewing the code in [r/36425|https://reviews.apache.org/r/36425/] [~benjaminhindman] noticed that there is a better abstraction that is possible to introduce for {{os::shell()}} that will simplify the caller's life. Instead of having to handle all possible outcomes, we propose to refactor {{os::shell()}} as follows: {code} /** * Returns the output from running the specified command with the shell. */ Try shell(const string& command) { // Actually handle the WIFEXITED, WIFSIGNALED here! } {code} where the returned string is {{stdout}} and, should the program be signaled, or exit with a non-zero exit code, we will simply return a {{Failure}} with an error message that will encapsulate both the returned/signaled state, and, possibly {{stderr}}. And some test driven development: {code} EXPECT_ERROR(os::shell(""false"")); EXPECT_SOME(os::shell(""true"")); EXPECT_SOME_EQ(""hello world"", os::shell(""echo hello world"")); {code} Alternatively, the caller can ask to have {{stderr}} conflated with {{stdout}}: {code} Try outAndErr = os::shell(""myCmd --foo 2>&1""); {code} However, {{stderr}} will be ignored by default: {code} // We don't read standard error by default. EXPECT_SOME_EQ("""", os::shell(""echo hello world 1>&2"")); // We don't even read stderr if something fails (to return in Try::error). Try output = os::shell(""echo hello world 1>&2 && false""); EXPECT_ERROR(output); EXPECT_FALSE(strings::contains(output.error(), ""hello world"")); {code} An analysis of existing usage shows that in almost all cases, the caller only cares {{if not error}}; in fact, the actual exit code is read only once, and even then, in a test case. 
We believe this will simplify the API to the caller, and will significantly reduce the length and complexity at the calling sites (<6 LOC against the current 20+).",2 MESOS-3143,"Disable endpoints rule fails to recognize HTTP path delegates","In Mesos, one can use the flag {{--firewall_rules}} to disable endpoints. Disabled endpoints will return a _403 Forbidden_ response whenever someone tries to access them. Libprocess supports adding one default delegate for endpoints, which is the process id that handles endpoints when no process id is given. For example, the default id of the master libprocess process is {{master}} which is also set as the delegate for the master system process, so a request to the endpoint {{http://master-address:5050/state.json}} will effectively be resolved by {{http://master-address:5050/master/state.json}}. But if one disables {{/state.json}}, then because of how delegates work, one can still access {{/master/state.json}}. The only workaround is to disable both endpoints.",2 MESOS-3144,"Update Homebrew formula for Mesos (Mac OSX)","We have pushed a [pull request|https://github.com/Homebrew/homebrew/pull/42099] to Homebrew for the new 0.23 formula. Once accepted, we must verify that this works on a Mac OSX device. This would also be a great time to ensure our documentation is up-to-date. 
Currently, the Homebrew check fails, as they have deprecated SHA-1 checksums: {noformat} Error Message failed: brew audit mesos Stacktrace Error: 7 problems in 1 formula mesos: * Stable resource ""protobuf"": SHA1 checksums are deprecated, please use SHA256 * Stable resource ""python-gflags"": SHA1 checksums are deprecated, please use SHA256 * Stable resource ""six"": SHA1 checksums are deprecated, please use SHA256 * Stable resource ""google-apputils"": SHA1 checksums are deprecated, please use SHA256 * Stable resource ""python-dateutil"": SHA1 checksums are deprecated, please use SHA256 * Stable resource ""boto"": SHA1 checksums are deprecated, please use SHA256 * Stable resource ""pytz"": SHA1 checksums are deprecated, please use SHA256 {noformat} Don't know enough about Homebrew to really figure out what is going on here; nor how to fix this. The Mesos SHA-256 has been correctly entered and computed via the [Online SHA/MD5 calculator|https://md5file.com/calculator]. I guess, we should go download the packages and compute their SHA-256 and/or research from the respective download sites whether they publish the SHA.",1 MESOS-3145,"Using a unresolvable hostname crashes the framework on registration","The following commands trigger the crash: {noformat} $ sudo hostname foo # an unresolvable hostname $ sudo ./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/var/lib/mesos $ LIBPROCESS_IP=127.0.0.1 ./src/mesos-execute --master=127.0.0.1:5050 --name=bar --command=""while true; do sleep 100; done"" {noformat} The crash output: {noformat} WARNING: Logging before InitGoogleLogging() is written to STDERR W0724 14:20:39.960733 1925993216 sched.cpp:1487] ************************************************** Scheduler driver bound to loopback interface! Cannot communicate with remote master(s). You might want to set 'LIBPROCESS_IP' environment variable to use a routable IP address. 
************************************************** ABORT: (../../3rdparty/libprocess/3rdparty/stout/include/stout/try.hpp:85): Try::get() but state == ERROR: nodename nor servname provided, or not known[1] 24560 abort LIBPROCESS_IP=127.0.0.1 ./src/mesos-execute --master=127.0.0.1:5050 {noformat}",1 MESOS-3146,"Add a new API call to the allocator to update available resources","This ticket is to track the {{updateAvailable}} API call being added to the allocator which updates the available resources in the allocator. It's used for master endpoints for dynamic reservation and persistent volumes. {{updateAvailable}} is similar to {{updateSlave}} except that {{updateAvailable}} never leaves the allocator in an over-allocated state.",8 MESOS-3148,"Resolve issue with hanging tests with Zookeeper","See MESOS-2736 for the original issue; the submitted [Review|https://reviews.apache.org/r/36663] currently has no tests, the one posted in the subsequent [r/3687|https://reviews.apache.org/r/36807] currently hangs when run after the other {{TEST_F(MasterZooKeeperTest, LostZooKeeperCluster)}}. The issue is around the {{await()}} in {{StartMaster()}} ({{cluster.hpp #430}}) that waits indefinitely for the master recovery. ",1 MESOS-3152,"Need for HTTP delete requests","We decided to create a more RESTful API for managing quota requests. Therefore we also want to use HTTP DELETE requests, and hence need to enable libprocess/http to send DELETE requests in addition to GET and POST requests.",1 MESOS-3153,"Add tests for HTTPS SSL socket communication","Unit tests are lacking for the following cases: 1. HTTPS Post with ""None"" payload. 2. Verification of HTTPS payload on the SSL socket(maybe decode to a Request object) 3. http -> ssl socket 4. https -> raw socket.",3 MESOS-3154,"Enable Mesos Agent Node to use arbitrary script / module to figure out IP, HOSTNAME","Following from MESOS-2902 we want to enable the same functionality in the Mesos Agents too. 
This is probably best done once we implement the new {{os::shell}} semantics, as described in MESOS-3142.",1 MESOS-3158,"Libprocess Process: Join runqueue workers during finalization","The lack of synchronization between ProcessManager destruction and the thread pool threads running the queued processes means that the shared state that is part of the ProcessManager gets destroyed prematurely. Synchronizing the ProcessManager destructor with draining the work queues and stopping the workers will allow us to not require leaking the shared state to avoid use beyond destruction.",3 MESOS-3161,"Document using the gold linker for faster development on linux.","The [gold linker|https://en.wikipedia.org/wiki/Gold_(linker)] seems to provide a decent speedup (about 20%) on a parallel build. From a quick test: {noformat: title=timings for make check -j24 GTEST_FILTER="""" w/ 24 hyperthreaded cores} gold: real 7m18.526s user 81m21.213s sys 5m17.224s default ld: real 9m7.908s user 85m13.466s sys 5m52.199s {noformat} On CentOS 5 w/ devtoolset-2: {noformat} sudo /usr/sbin/alternatives --altdir /opt/rh/devtoolset-2/root/etc/alternatives --admindir /opt/rh/devtoolset-2/root/var/lib/alternatives --set ld /opt/rh/devtoolset-2/root/usr/bin/ld.gold {noformat} On Ubuntu: {noformat} sudo update-alternatives --install /usr/bin/ld ld /usr/bin/gold 1 {noformat} Ideally we could call this out on the website, with instructions for each OS.",3 MESOS-3162,"Provide a means to check http connection equality for streaming connections.","If one uses an http::Pipe::Writer to stream a response, one cannot compare the writer with another to see if the connection has changed. This is useful, for example, in the master's HTTP API when there is asynchronous disconnection logic. 
When we handle the disconnection, it's possible for the scheduler to have re-subscribed, and so the master needs to tell if the disconnection event is relevant for the current connection before taking action.",3 MESOS-3163,"Proper handling of 'query' and/or 'fragment' out of 'path' in http handler.","The libprocess http.cpp post/get handlers currently do not consider the query and fragment parts of the path correctly. E.g. {code} if (path.isSome()) { // TODO(benh): Get 'query' and/or 'fragment' out of 'path'. url.path = strings::join(""/"", url.path, path.get()); } {code}",1 MESOS-3164,"Introduce QuotaInfo message","A {{QuotaInfo}} protobuf message is an internal representation for quota related information (e.g. for persisting quota). The protobuf message should be extendable for future needs and allow for easy aggregation across roles and operator principals. It may also be used to pass quota information to allocators.",3 MESOS-3165,"Persist and recover quota to/from Registry","To persist quotas across failovers, the Master should save them in the registry. To support this, we shall: * Introduce a Quota state variable in registry.proto; * Extend the Operation interface so that it supports a ‘Quota’ accumulator (see src/master/registrar.hpp); * Introduce AddQuota / RemoveQuota operations; * Recover quotas from the registry on failover to the Master’s internal::master::Role struct; * Extend RegistrarTest with quota-specific tests. NOTE: Registry variable can be rather big for production clusters (see MESOS-2075). While it should be fine for MVP to add quota information to registry, we should consider storing Quota separately, as this does not need to be in sync with slaves update. However, currently adding more variables is not supported by the registrar. While the Agents are reregistering (note they may fail to do so), the information about what part of the quota is allocated is only partially available to the Master. 
In other words, the state of the quota allocation is reconstructed as Agents reregister. During this period, some roles may be under quota from the perspective of the newly elected Master. The same problem exists on the allocator side: it may think the cluster is under quota and may eagerly try to satisfy quotas before enough Agents reregister, which may result in resources being allocated to frameworks beyond their quota. To address this issue and also to avoid panicking and generating under quota alerts, the Master should give a certain amount of time for the majority (e.g. 80%) of the Agents to reregister before reporting any quota status and notifying the allocator about granted quotas.",5 MESOS-3166,"Design doc for docker image registry client","Create design document for the docker registry Authenticator component so that we have a baseline for the implementation. ",3 MESOS-3167,"Design doc for versioning the HTTP API","In concert with the release of the HTTP API, we would also like to come up with a versioning strategy. This enables us to do a meaningful 1.0 release.",3 MESOS-3168,"MesosZooKeeperTest fixture can have side effects across tests","MesosZooKeeperTest fixture doesn't restart the ZooKeeper server for each test. This means if a test shuts down the ZooKeeper server, the next test (using the same fixture) might fail. For an example see https://reviews.apache.org/r/36807/",2 MESOS-3169,"FrameworkInfo should only be updated if the re-registration is valid","See Ben Mahler's comment in https://reviews.apache.org/r/32961/ FrameworkInfo should not be updated if the re-registration is invalid. This can happen in a few cases under the branching logic, so this requires some refactoring. 
Notice that a {code}FrameworkErrorMessage{code} can be generated both inside {code}else if (from != framework->pid){code} as well as from inside {code}failoverFramework(framework, from);{code}",2 MESOS-3171,"Fetcher Tests use EXPECT while subsequent logic relies on the outcome.","The fetcher tests use EXPECT validation for critical measures (e.g. non-empty results) and the subsequent logic relies on this (i.e. by accessing the first element). In such cases we should use ASSERT/CHECK.",1 MESOS-3173,"Mark Path::basename, Path::dirname as const functions.","The functions Path::basename and Path::dirname in stout/path.hpp are not marked const, although they could be. Marking them const would remove some ambiguities in the usage of these functions.",1 MESOS-3174,"Fetcher logs erroneous message when successfully extracting an archive","When fetching an asset while not using the cache, the fetcher may erroneously report this: ""Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: "". This message appears in the stderr log in the sandbox no matter whether extraction succeeded or not. It should be absent after successful extraction. ",1 MESOS-3178,"Perform a self bind mount of rootfs itself in fs::chroot::enter.","Syscall 'pivot_root' requires that the old and the new root are not in the same filesystem. Otherwise, the user will receive a ""Device or resource busy"" error. Currently, we rely on the provisioner to prepare the rootfs and do proper bind mount if needed so that pivot_root can succeed. The drawback of this approach is that it potentially pollutes the host mount table which requires cleanup logic. For instance, in the test, we create a test rootfs by copying the host files. We need to do a self bind mount so that we can pivot_root on it. 
That pollutes the host mount table and it might leak mounts if the test crashes before we do the lazy umount: https://github.com/apache/mesos/blob/master/src/tests/containerizer/launch_tests.cpp#L96-L102 What I propose is that we always perform a recursive self bind mount of rootfs itself in fs::chroot::enter (after entering the new mount namespace). Seems that this is also done in libcontainer: https://github.com/opencontainers/runc/blob/master/libcontainer/rootfs_linux.go#L402",2 MESOS-3179,"Create a test abstraction for preparing test rootfs.","Several tests need this abstraction, so it's better to unify them. For example, src/tests/containerizer/launch_tests.cpp needs to create a test rootfs. We also need that to test filesystem isolators. The test rootfs can be created by copying files/directories from host file system.",3 MESOS-3182,"Make Master::registerFramework() and Master::reregisterFramework() call into Master::subscribe()","Currently Master::subscribe() calls into Master::registerFramework() and Master::reregisterFramework(). We should do it the other way around to be consistent with how we did all the other calls.",3 MESOS-3183,"Documentation images do not load","Any images which are referenced from the generated docs ({{docs/*.md}}) do not show up on the website. For example: * [Architecture|http://mesos.apache.org/documentation/latest/architecture/] * [External Containerizer|http://mesos.apache.org/documentation/latest/external-containerizer/] * [Fetcher Cache Internals|http://mesos.apache.org/documentation/latest/fetcher-cache-internals/] * [Maintenance|http://mesos.apache.org/documentation/latest/maintenance/] * [Oversubscription|http://mesos.apache.org/documentation/latest/oversubscription/] ",3 MESOS-3185,"Refactor Subprocess logic in linux/perf.cpp to use common subroutine","MESOS-2834 will enhance the perf isolator to support the different output formats provided by different kernel versions. 
In order to achieve this, we need to execute the ""perf --version"" command. We should decompose the existing Subcommand processing in perf so that we can share the implementation between the multiple uses of perf.",3 MESOS-3189,"TimeTest.Now fails with --enable-libevent","[ RUN ] TimeTest.Now ../../../3rdparty/libprocess/src/tests/time_tests.cpp:50: Failure Expected: (Microseconds(10)) < (Clock::now() - t1), actual: 8-byte object <10-27 00-00 00-00 00-00> vs 0ns [ FAILED ] TimeTest.Now (0 ms)",2 MESOS-3191,"Implement a utility for computing hash","It is useful for both appc and docker to compute and verify image hash.",2 MESOS-3192,"ContainerInfo::Image::AppC::id should be optional","As I commented here: https://reviews.apache.org/r/34136/ Currently ContainerInfo::Image::Appc is defined as the following {noformat:title=} message AppC { required string name = 1; required string id = 2; optional Labels labels = 3; } {noformat} In which the {{id}} is a required field. When users specify the image in tasks they likely will not use an image id (much like when you use docker or rkt to launch containers, you often use {{ubuntu}} or {{ubuntu:latest}} and seldom a SHA512 ID) and we should change it to be optional. The motivating scenario is that: if the frameworks in the Mesos use something like {{image=ubuntu:14.04""}} to run a task and {{image=ubuntu}} defaults to {{image=ubuntu:latest}}, the operator can swap the latest version for all new tasks requesting {{image=ubuntu}}. If they allow users to specify {{image=ubuntu:live}}, they can swap the live version under the covers as well. This allows the operator to release important image updates (e.g., security patches) and have them picked up by new tasks in the cluster without asking the users to update their job/task configs.",1 MESOS-3193,"Implement AppC image discovery.","Appc spec specifies two image discovery mechanisms: simple and meta discovery. We need to have an abstraction for image discovery in AppcStore. 
For MVP, we can implement the simple discovery first. https://reviews.apache.org/r/34139/",2 MESOS-3194,"Implement a 'read-only' AppC Image Store","It's going to be derived from this: https://reviews.apache.org/r/34140/ (and other related patches) but in the initial 'read-only' version the store's content is prepared by out-of-band mechanisms so the store component in Mesos only needs to provide access to images already in it and recover images upon slave restart. This greatly simplifies the initial version's responsibility and test cases. Features that fetch the images into the store will be added later and they will take into consideration their impact on task start latency and slave restart responsiveness, etc.",5 MESOS-3195,"Fix master metrics for scheduler calls","Currently the master increments metrics for old style messages from the driver but not when it receives Calls. Since the driver is now sending Calls, master should update metrics correctly.",3 MESOS-3196,"Always set TaskStatus.executor_id when sending a status update message from Executor","Currently, the Executor doesn't always set TaskStatus.executor_id. This prevents the Slave TaskStatus label decorator hook from knowing the executor id. An appropriate place to automatically fill in the executor_id is ExecutorProcess::sendStatusUpdate() since we are already filling in some other information here.",1 MESOS-3197,"MemIsolatorTest/{0,1}.MemUsage fails on OS X","Looks like this is due to {{mlockall}} being unimplemented on OS X. 
{noformat} [----------] 1 test from MemIsolatorTest/0, where TypeParam = N5mesos8internal5slave23PosixMemIsolatorProcessE [ RUN ] MemIsolatorTest/0.MemUsage Failed to allocate RSS memory: Failed to make pages to be mapped unevictable: Function not implemented../../src/tests/containerizer/isolator_tests.cpp:812: Failure helper.increaseRSS(allocation): Failed to sync with the subprocess ../../src/tests/containerizer/isolator_tests.cpp:815: Failure (usage).failure(): Failed to get usage: No process found at 40558 [ FAILED ] MemIsolatorTest/0.MemUsage, where TypeParam = N5mesos8internal5slave23PosixMemIsolatorProcessE (56 ms) [----------] 1 test from MemIsolatorTest/0 (57 ms total) [----------] 1 test from MemIsolatorTest/1, where TypeParam = N5mesos8internal5tests6ModuleINS_5slave8IsolatorELNS1_8ModuleIDE0EEE [ RUN ] MemIsolatorTest/1.MemUsage Failed to allocate RSS memory: Failed to make pages to be mapped unevictable: Function not implemented../../src/tests/containerizer/isolator_tests.cpp:812: Failure helper.increaseRSS(allocation): Failed to sync with the subprocess ../../src/tests/containerizer/isolator_tests.cpp:815: Failure (usage).failure(): Failed to get usage: No process found at 40572 [ FAILED ] MemIsolatorTest/1.MemUsage, where TypeParam = N5mesos8internal5tests6ModuleINS_5slave8IsolatorELNS1_8ModuleIDE0EEE (50 ms) [----------] 1 test from MemIsolatorTest/1 (50 ms total) {noformat}",2 MESOS-3199,"Validate Quota Requests.","We need to validate quota requests in terms of syntactical and semantical correctness.",3 MESOS-3200,"Remove unused 'fatal' and 'fatalerror' macros","There exist {{fatal}} and {{fatalerror}} macros in both {{libprocess}} and {{stout}}. 
None of them are currently used as we favor {{glog}}'s {{LOG(FATAL)}}, and therefore should be removed.",1 MESOS-3201,"Libev handle_async can deadlock with run_in_event_loop","Due to the arbitrary nature of the functions that are executed in handle_async, invoking them under the (A) {{watchers_mutex}} can lead to deadlocks if (B) is acquired before calling {{run_in_event_loop}} and (B) is also acquired within the arbitrary function. {code} ==82679== Thread #10: lock order ""0x60774F8 before 0x60768C0"" violated ==82679== ==82679== Observed (incorrect) order is: acquisition of lock at 0x60768C0 ==82679== at 0x4C32145: pthread_mutex_lock (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so) ==82679== by 0x692C9B: __gthread_mutex_lock(pthread_mutex_t*) (gthr-default.h:748) ==82679== by 0x6950BF: std::mutex::lock() (mutex:134) ==82679== by 0x696219: Synchronized synchronize(std::mutex*)::{lambda(std::mutex*)#1}::operator()(std::mutex*) const (synchronized.hpp:58) ==82679== by 0x696238: Synchronized synchronize(std::mutex*)::{lambda(std::mutex*)#1}::_FUN(std::mutex*) (synchronized.hpp:58) ==82679== by 0x6984CF: Synchronized::Synchronized(std::mutex*, void (*)(std::mutex*), void (*)(std::mutex*)) (synchronized.hpp:35) ==82679== by 0x6962DE: Synchronized synchronize(std::mutex*) (synchronized.hpp:60) ==82679== by 0x728FE1: process::handle_async(ev_loop*, ev_async*, int) (libev.cpp:48) ==82679== by 0x761384: ev_invoke_pending (ev.c:2994) ==82679== by 0x7643C4: ev_run (ev.c:3394) ==82679== by 0x728E37: ev_loop (ev.h:826) ==82679== by 0x729469: process::EventLoop::run() (libev.cpp:135) ==82679== ==82679== followed by a later acquisition of lock at 0x60774F8 ==82679== at 0x4C32145: pthread_mutex_lock (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so) ==82679== by 0x4C6F9D: __gthread_mutex_lock(pthread_mutex_t*) (gthr-default.h:748) ==82679== by 0x4C6FED: __gthread_recursive_mutex_lock(pthread_mutex_t*) (gthr-default.h:810) ==82679== by 0x4F5D3D: 
std::recursive_mutex::lock() (mutex:175) ==82679== by 0x516513: Synchronized synchronize(std::recursive_mutex*)::{lambda(std::recursive_mutex*)#1}::operator()(std::recursive_mutex*) const (synchronized.hpp:58) ==82679== by 0x516532: Synchronized synchronize(std::recursive_mutex*)::{lambda(std::recursive_mutex*)#1}::_FUN(std::recursive_mutex*) (synchronized.hpp:58) ==82679== by 0x52E619: Synchronized::Synchronized(std::recursive_mutex*, void (*)(std::recursive_mutex*), void (*)(std::recursive_mutex*)) (synchronized.hpp:35) ==82679== by 0x5165D4: Synchronized synchronize(std::recursive_mutex*) (synchronized.hpp:60) ==82679== by 0x6BF4E1: process::ProcessManager::use(process::UPID const&) (process.cpp:2127) ==82679== by 0x6C2B8C: process::ProcessManager::terminate(process::UPID const&, bool, process::ProcessBase*) (process.cpp:2604) ==82679== by 0x6C6C3C: process::terminate(process::UPID const&, bool) (process.cpp:3107) ==82679== by 0x692B65: process::Latch::trigger() (latch.cpp:53) {code} This was introduced in https://github.com/apache/mesos/commit/849fc4d361e40062073324153ba97e98e294fdf2",3 MESOS-3203,"MasterAuthorizationTest.DuplicateRegistration test is flaky","[ RUN ] MasterAuthorizationTest.DuplicateRegistration Using temporary directory '/tmp/MasterAuthorizationTest_DuplicateRegistration_NKT3f7' I0804 22:16:01.578500 26185 leveldb.cpp:176] Opened db in 2.188338ms I0804 22:16:01.579172 26185 leveldb.cpp:183] Compacted db in 645075ns I0804 22:16:01.579211 26185 leveldb.cpp:198] Created db iterator in 15766ns I0804 22:16:01.579227 26185 leveldb.cpp:204] Seeked to beginning of db in 1658ns I0804 22:16:01.579238 26185 leveldb.cpp:273] Iterated through 0 keys in the db in 313ns I0804 22:16:01.579282 26185 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0804 22:16:01.579787 26212 recover.cpp:449] Starting replica recovery I0804 22:16:01.580075 26212 recover.cpp:475] Replica is in EMPTY status I0804 22:16:01.581014 26205 
replica.cpp:641] Replica in EMPTY status received a broadcasted recover request I0804 22:16:01.581357 26211 recover.cpp:195] Received a recover response from a replica in EMPTY status I0804 22:16:01.581761 26207 recover.cpp:566] Updating replica status to STARTING I0804 22:16:01.582334 26218 master.cpp:377] Master 20150804-221601-2550141356-59302-26185 (d6d349cd895b) started on 172.17.0.152:59302 I0804 22:16:01.582355 26218 master.cpp:379] Flags at startup: --acls="""" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate=""true"" --authenticate_slaves=""true"" --authenticators=""crammd5"" --credentials=""/tmp/MasterAuthorizationTest_DuplicateRegistration_NKT3f7/credentials"" --framework_sorter=""drf"" --help=""false"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_slave_ping_timeouts=""5"" --quiet=""false"" --recovery_slave_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""25secs"" --registry_strict=""true"" --root_submissions=""true"" --slave_ping_timeout=""15secs"" --slave_reregister_timeout=""10mins"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/mesos/mesos-0.24.0/_inst/share/mesos/webui"" --work_dir=""/tmp/MasterAuthorizationTest_DuplicateRegistration_NKT3f7/master"" --zk_session_timeout=""10secs"" I0804 22:16:01.582711 26218 master.cpp:424] Master only allowing authenticated frameworks to register I0804 22:16:01.582722 26218 master.cpp:429] Master only allowing authenticated slaves to register I0804 22:16:01.582728 26218 credentials.hpp:37] Loading credentials for authentication from '/tmp/MasterAuthorizationTest_DuplicateRegistration_NKT3f7/credentials' I0804 22:16:01.582929 26204 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 421543ns I0804 22:16:01.582950 26204 replica.cpp:323] Persisted replica status to STARTING I0804 22:16:01.583032 26218 master.cpp:468] Using default 
'crammd5' authenticator I0804 22:16:01.583132 26211 recover.cpp:475] Replica is in STARTING status I0804 22:16:01.583154 26218 master.cpp:505] Authorization enabled I0804 22:16:01.583356 26214 whitelist_watcher.cpp:79] No whitelist given I0804 22:16:01.583411 26217 hierarchical.hpp:346] Initialized hierarchical allocator process I0804 22:16:01.583976 26213 replica.cpp:641] Replica in STARTING status received a broadcasted recover request I0804 22:16:01.584187 26209 recover.cpp:195] Received a recover response from a replica in STARTING status I0804 22:16:01.584581 26213 master.cpp:1495] The newly elected leader is master@172.17.0.152:59302 with id 20150804-221601-2550141356-59302-26185 I0804 22:16:01.584609 26213 master.cpp:1508] Elected as the leading master! I0804 22:16:01.584627 26213 master.cpp:1278] Recovering from registrar I0804 22:16:01.584656 26204 recover.cpp:566] Updating replica status to VOTING I0804 22:16:01.584770 26212 registrar.cpp:313] Recovering registrar I0804 22:16:01.585261 26218 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 370526ns I0804 22:16:01.585285 26218 replica.cpp:323] Persisted replica status to VOTING I0804 22:16:01.585412 26216 recover.cpp:580] Successfully joined the Paxos group I0804 22:16:01.585667 26216 recover.cpp:464] Recover process terminated I0804 22:16:01.586047 26213 log.cpp:661] Attempting to start the writer I0804 22:16:01.587164 26211 replica.cpp:477] Replica received implicit promise request with proposal 1 I0804 22:16:01.587549 26211 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 358261ns I0804 22:16:01.587568 26211 replica.cpp:345] Persisted promised to 1 I0804 22:16:01.588173 26209 coordinator.cpp:230] Coordinator attemping to fill missing position I0804 22:16:01.589316 26208 replica.cpp:378] Replica received explicit promise request for position 0 with proposal 2 I0804 22:16:01.589700 26208 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 351778ns I0804 22:16:01.589721 
26208 replica.cpp:679] Persisted action at 0 I0804 22:16:01.590698 26213 replica.cpp:511] Replica received write request for position 0 I0804 22:16:01.590754 26213 leveldb.cpp:438] Reading position from leveldb took 31557ns I0804 22:16:01.591147 26213 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 321842ns I0804 22:16:01.591167 26213 replica.cpp:679] Persisted action at 0 I0804 22:16:01.591790 26217 replica.cpp:658] Replica received learned notice for position 0 I0804 22:16:01.592133 26217 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 315281ns I0804 22:16:01.592155 26217 replica.cpp:679] Persisted action at 0 I0804 22:16:01.592180 26217 replica.cpp:664] Replica learned NOP action at position 0 I0804 22:16:01.592686 26211 log.cpp:677] Writer started with ending position 0 I0804 22:16:01.593729 26205 leveldb.cpp:438] Reading position from leveldb took 26394ns I0804 22:16:01.596165 26209 registrar.cpp:346] Successfully fetched the registry (0B) in 11.343104ms I0804 22:16:01.596281 26209 registrar.cpp:445] Applied 1 operations in 26242ns; attempting to update the 'registry' I0804 22:16:01.598415 26212 log.cpp:685] Attempting to append 178 bytes to the log I0804 22:16:01.598563 26215 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1 I0804 22:16:01.599324 26215 replica.cpp:511] Replica received write request for position 1 I0804 22:16:01.599778 26215 leveldb.cpp:343] Persisting action (197 bytes) to leveldb took 420523ns I0804 22:16:01.599800 26215 replica.cpp:679] Persisted action at 1 I0804 22:16:01.600349 26204 replica.cpp:658] Replica received learned notice for position 1 I0804 22:16:01.600684 26204 leveldb.cpp:343] Persisting action (199 bytes) to leveldb took 310315ns I0804 22:16:01.600706 26204 replica.cpp:679] Persisted action at 1 I0804 22:16:01.600723 26204 replica.cpp:664] Replica learned APPEND action at position 1 I0804 22:16:01.601632 26213 registrar.cpp:490] Successfully updated the 'registry' 
in 5.287936ms I0804 22:16:01.601747 26213 registrar.cpp:376] Successfully recovered registrar I0804 22:16:01.601826 26215 log.cpp:704] Attempting to truncate the log to 1 I0804 22:16:01.601948 26210 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2 I0804 22:16:01.602145 26208 master.cpp:1305] Recovered 0 slaves from the Registry (139B) ; allowing 10mins for slaves to re-register I0804 22:16:01.602859 26219 replica.cpp:511] Replica received write request for position 2 I0804 22:16:01.603181 26219 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 284713ns I0804 22:16:01.603209 26219 replica.cpp:679] Persisted action at 2 I0804 22:16:01.603984 26211 replica.cpp:658] Replica received learned notice for position 2 I0804 22:16:01.604313 26211 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 302445ns I0804 22:16:01.604365 26211 leveldb.cpp:401] Deleting ~1 keys from leveldb took 29354ns I0804 22:16:01.604387 26211 replica.cpp:679] Persisted action at 2 I0804 22:16:01.604408 26211 replica.cpp:664] Replica learned TRUNCATE action at position 2 I0804 22:16:01.616402 26185 sched.cpp:164] Version: 0.24.0 I0804 22:16:01.616902 26209 sched.cpp:262] New master detected at master@172.17.0.152:59302 I0804 22:16:01.617000 26209 sched.cpp:318] Authenticating with master master@172.17.0.152:59302 I0804 22:16:01.617019 26209 sched.cpp:325] Using default CRAM-MD5 authenticatee I0804 22:16:01.617324 26212 authenticatee.cpp:115] Creating new client SASL connection I0804 22:16:01.617550 26209 master.cpp:4405] Authenticating scheduler-ac5e7b68-e2d2-441c-a5f5-60c1ff8cf00c@172.17.0.152:59302 I0804 22:16:01.617641 26212 authenticator.cpp:406] Starting authentication session for crammd5_authenticatee(259)@172.17.0.152:59302 I0804 22:16:01.617858 26208 authenticator.cpp:92] Creating new server SASL connection I0804 22:16:01.618140 26216 authenticatee.cpp:206] Received SASL authentication mechanisms: CRAM-MD5 I0804 22:16:01.618191 26216 
authenticatee.cpp:232] Attempting to authenticate with mechanism 'CRAM-MD5' I0804 22:16:01.618324 26213 authenticator.cpp:197] Received SASL authentication start I0804 22:16:01.618413 26213 authenticator.cpp:319] Authentication requires more steps I0804 22:16:01.618557 26216 authenticatee.cpp:252] Received SASL authentication step I0804 22:16:01.618664 26216 authenticator.cpp:225] Received SASL authentication step I0804 22:16:01.618703 26216 auxprop.cpp:102] Request to lookup properties for user: 'test-principal' realm: 'd6d349cd895b' server FQDN: 'd6d349cd895b' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0804 22:16:01.618719 26216 auxprop.cpp:174] Looking up auxiliary property '*userPassword' I0804 22:16:01.618778 26216 auxprop.cpp:174] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0804 22:16:01.618820 26216 auxprop.cpp:102] Request to lookup properties for user: 'test-principal' realm: 'd6d349cd895b' server FQDN: 'd6d349cd895b' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0804 22:16:01.618834 26216 auxprop.cpp:124] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0804 22:16:01.618839 26216 auxprop.cpp:124] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0804 22:16:01.618857 26216 authenticator.cpp:311] Authentication success I0804 22:16:01.618954 26219 authenticatee.cpp:292] Authentication success I0804 22:16:01.619035 26204 master.cpp:4435] Successfully authenticated principal 'test-principal' at scheduler-ac5e7b68-e2d2-441c-a5f5-60c1ff8cf00c@172.17.0.152:59302 I0804 22:16:01.619083 26219 authenticator.cpp:424] Authentication session cleanup for crammd5_authenticatee(259)@172.17.0.152:59302 I0804 22:16:01.619309 26208 sched.cpp:407] Successfully authenticated with master master@172.17.0.152:59302 I0804 22:16:01.619335 26208 sched.cpp:713] Sending SUBSCRIBE call to 
master@172.17.0.152:59302 I0804 22:16:01.619494 26208 sched.cpp:746] Will retry registration in 439203ns if necessary I0804 22:16:01.619627 26217 master.cpp:1812] Received SUBSCRIBE call for framework 'default' at scheduler-ac5e7b68-e2d2-441c-a5f5-60c1ff8cf00c@172.17.0.152:59302 I0804 22:16:01.619695 26217 master.cpp:1534] Authorizing framework principal 'test-principal' to receive offers for role '*' I0804 22:16:01.620848 26217 sched.cpp:713] Sending SUBSCRIBE call to master@172.17.0.152:59302 I0804 22:16:01.620929 26217 sched.cpp:746] Will retry registration in 2.099193326secs if necessary I0804 22:16:01.621036 26210 master.cpp:1812] Received SUBSCRIBE call for framework 'default' at scheduler-ac5e7b68-e2d2-441c-a5f5-60c1ff8cf00c@172.17.0.152:59302 I0804 22:16:01.621083 26210 master.cpp:1534] Authorizing framework principal 'test-principal' to receive offers for role '*' I0804 22:16:01.621727 26217 master.cpp:1876] Subscribing framework default with checkpointing disabled and capabilities [ ] I0804 22:16:01.621981 26208 sched.cpp:262] New master detected at master@172.17.0.152:59302 I0804 22:16:01.622131 26208 sched.cpp:318] Authenticating with master master@172.17.0.152:59302 I0804 22:16:01.622153 26208 sched.cpp:325] Using default CRAM-MD5 authenticatee I0804 22:16:01.622323 26212 authenticatee.cpp:115] Creating new client SASL connection I0804 22:16:01.622324 26210 hierarchical.hpp:391] Added framework 20150804-221601-2550141356-59302-26185-0000 I0804 22:16:01.622369 26210 hierarchical.hpp:1008] No resources available to allocate! 
I0804 22:16:01.622386 26210 hierarchical.hpp:908] Performed allocation for 0 slaves in 28592ns I0804 22:16:01.622511 26210 sched.cpp:640] Framework registered with 20150804-221601-2550141356-59302-26185-0000 I0804 22:16:01.622586 26210 sched.cpp:654] Scheduler::registered took 48005ns I0804 22:16:01.622592 26208 master.cpp:4405] Authenticating scheduler-ac5e7b68-e2d2-441c-a5f5-60c1ff8cf00c@172.17.0.152:59302 I0804 22:16:01.622673 26212 authenticator.cpp:406] Starting authentication session for crammd5_authenticatee(260)@172.17.0.152:59302 I0804 22:16:01.622923 26205 authenticator.cpp:92] Creating new server SASL connection I0804 22:16:01.623112 26204 authenticatee.cpp:206] Received SASL authentication mechanisms: CRAM-MD5 I0804 22:16:01.623133 26216 master.cpp:1870] Dropping SUBSCRIBE call for framework 'default' at scheduler-ac5e7b68-e2d2-441c-a5f5-60c1ff8cf00c@172.17.0.152:59302: Re-authentication in progress I0804 22:16:01.623144 26204 authenticatee.cpp:232] Attempting to authenticate with mechanism 'CRAM-MD5' I0804 22:16:01.623258 26215 authenticator.cpp:197] Received SASL authentication start I0804 22:16:01.623313 26215 authenticator.cpp:319] Authentication requires more steps I0804 22:16:01.623394 26215 authenticatee.cpp:252] Received SASL authentication step I0804 22:16:01.623512 26212 authenticator.cpp:225] Received SASL authentication step I0804 22:16:01.623546 26212 auxprop.cpp:102] Request to lookup properties for user: 'test-principal' realm: 'd6d349cd895b' server FQDN: 'd6d349cd895b' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0804 22:16:01.623564 26212 auxprop.cpp:174] Looking up auxiliary property '*userPassword' I0804 22:16:01.623603 26212 auxprop.cpp:174] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0804 22:16:01.623622 26212 auxprop.cpp:102] Request to lookup properties for user: 'test-principal' realm: 'd6d349cd895b' server FQDN: 'd6d349cd895b' SASL_AUXPROP_VERIFY_AGAINST_HASH: 
false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0804 22:16:01.623631 26212 auxprop.cpp:124] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0804 22:16:01.623636 26212 auxprop.cpp:124] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0804 22:16:01.623649 26212 authenticator.cpp:311] Authentication success I0804 22:16:01.623777 26212 authenticatee.cpp:292] Authentication success I0804 22:16:01.623846 26212 master.cpp:4435] Successfully authenticated principal 'test-principal' at scheduler-ac5e7b68-e2d2-441c-a5f5-60c1ff8cf00c@172.17.0.152:59302 I0804 22:16:01.623913 26212 authenticator.cpp:424] Authentication session cleanup for crammd5_authenticatee(260)@172.17.0.152:59302 I0804 22:16:01.624130 26212 sched.cpp:407] Successfully authenticated with master master@172.17.0.152:59302 I0804 22:16:02.583772 26218 hierarchical.hpp:1008] No resources available to allocate! I0804 22:16:02.583818 26218 hierarchical.hpp:908] Performed allocation for 0 slaves in 80538ns I0804 22:16:03.585110 26211 hierarchical.hpp:1008] No resources available to allocate! I0804 22:16:03.585156 26211 hierarchical.hpp:908] Performed allocation for 0 slaves in 69272ns I0804 22:16:04.586539 26214 hierarchical.hpp:1008] No resources available to allocate! I0804 22:16:04.586586 26214 hierarchical.hpp:908] Performed allocation for 0 slaves in 79232ns I0804 22:16:05.587239 26209 hierarchical.hpp:1008] No resources available to allocate! I0804 22:16:05.587293 26209 hierarchical.hpp:908] Performed allocation for 0 slaves in 85128ns I0804 22:16:06.587935 26212 hierarchical.hpp:1008] No resources available to allocate! I0804 22:16:06.587985 26212 hierarchical.hpp:908] Performed allocation for 0 slaves in 78141ns I0804 22:16:07.588817 26214 hierarchical.hpp:1008] No resources available to allocate! 
I0804 22:16:07.588865 26214 hierarchical.hpp:908] Performed allocation for 0 slaves in 81433ns I0804 22:16:08.589857 26214 hierarchical.hpp:1008] No resources available to allocate! I0804 22:16:08.589906 26214 hierarchical.hpp:908] Performed allocation for 0 slaves in 71929ns I0804 22:16:09.591085 26207 hierarchical.hpp:1008] No resources available to allocate! I0804 22:16:09.591133 26207 hierarchical.hpp:908] Performed allocation for 0 slaves in 78223ns I0804 22:16:10.591737 26207 hierarchical.hpp:1008] No resources available to allocate! I0804 22:16:10.591785 26207 hierarchical.hpp:908] Performed allocation for 0 slaves in 71894ns I0804 22:16:11.593166 26210 hierarchical.hpp:1008] No resources available to allocate! I0804 22:16:11.593221 26210 hierarchical.hpp:908] Performed allocation for 0 slaves in 89782ns I0804 22:16:12.593647 26212 hierarchical.hpp:1008] No resources available to allocate! I0804 22:16:12.593689 26212 hierarchical.hpp:908] Performed allocation for 0 slaves in 69426ns I0804 22:16:13.594154 26210 hierarchical.hpp:1008] No resources available to allocate! I0804 22:16:13.594202 26210 hierarchical.hpp:908] Performed allocation for 0 slaves in 70581ns I0804 22:16:14.594712 26207 hierarchical.hpp:1008] No resources available to allocate! I0804 22:16:14.594758 26207 hierarchical.hpp:908] Performed allocation for 0 slaves in 71201ns I0804 22:16:15.595412 26219 hierarchical.hpp:1008] No resources available to allocate! I0804 22:16:15.595464 26219 hierarchical.hpp:908] Performed allocation for 0 slaves in 85183ns I0804 22:16:16.596201 26217 hierarchical.hpp:1008] No resources available to allocate! 
I0804 22:16:16.596247 26217 hierarchical.hpp:908] Performed allocation for 0 slaves in 95132ns ../../src/tests/master_authorization_tests.cpp:794: Failure Failed to wait 15secs for frameworkRegisteredMessage I0804 22:16:16.624354 26212 master.cpp:966] Framework 20150804-221601-2550141356-59302-26185-0000 (default) at scheduler-ac5e7b68-e2d2-441c-a5f5-60c1ff8cf00c@172.17.0.152:59302 disconnected I0804 22:16:16.624398 26212 master.cpp:2092] Disconnecting framework 20150804-221601-2550141356-59302-26185-0000 (default) at scheduler-ac5e7b68-e2d2-441c-a5f5-60c1ff8cf00c@172.17.0.152:59302 I0804 22:16:16.624445 26212 master.cpp:2116] Deactivating framework 20150804-221601-2550141356-59302-26185-0000 (default) at scheduler-ac5e7b68-e2d2-441c-a5f5-60c1ff8cf00c@172.17.0.152:59302 I0804 22:16:16.624686 26212 master.cpp:988] Giving framework 20150804-221601-2550141356-59302-26185-0000 (default) at scheduler-ac5e7b68-e2d2-441c-a5f5-60c1ff8cf00c@172.17.0.152:59302 0ns to failover I0804 22:16:16.625641 26219 hierarchical.hpp:474] Deactivated framework 20150804-221601-2550141356-59302-26185-0000 I0804 22:16:16.626688 26218 master.cpp:4180] Framework failover timeout, removing framework 20150804-221601-2550141356-59302-26185-0000 (default) at scheduler-ac5e7b68-e2d2-441c-a5f5-60c1ff8cf00c@172.17.0.152:59302 I0804 22:16:16.626734 26218 master.cpp:4759] Removing framework 20150804-221601-2550141356-59302-26185-0000 (default) at scheduler-ac5e7b68-e2d2-441c-a5f5-60c1ff8cf00c@172.17.0.152:59302 I0804 22:16:16.627074 26218 master.cpp:858] Master terminating I0804 22:16:16.627218 26215 hierarchical.hpp:428] Removed framework 20150804-221601-2550141356-59302-26185-0000 ../../3rdparty/libprocess/include/process/gmock.hpp:365: Failure Actual function call count doesn't match EXPECT_CALL(filter->mock, filter(testing::A()))... 
Expected args: message matcher (8-byte object <98-98 02-AC 54-2B 00-00>, 1-byte object <97>, 1-byte object ) Expected: to be call...",1 MESOS-3205,"No need to checkpoint container root filesystem path.","Given the design discussed in [MESOS-3004|https://issues.apache.org/jira/browse/MESOS-3004], one container might have multiple provisioned root filesystems. Only checkpointing the root filesystem for ContainerInfo::image does not make sense. Also, we realized that checkpointing container root filesystem path is not necessary because each provisioner should be able to destroy root filesystems for a given container based on a canonical directory layout (e.g., //xxx).",3 MESOS-3207,"C++ style guide is not rendered correctly (code section syntax disregarded)","Some paragraphs at the bottom of docs/mesos-c++-style-guide.md containing code sections are not rendered correctly by the web site generator. It looks fine in a github gist and apparently the syntax used is correct. ",1 MESOS-3208,"Fetch checksum files to inform fetcher cache use","This is the first part of phase 1 as described in the comments for MESOS-2073. We add a field to CommandInfo::URI that contains the URI of a checksum file. When this file has new content, then the contents of the associated value URI needs to be refreshed in the fetcher cache. In this implementation step, we just add the above basic functionality (download, checksum comparison). In later steps, we will add more control flow to cover corner cases and thus make this feature more useful. 
",3 MESOS-3211,"As a Python developer I want a simple way to obtain information about Master from ZooKeeper","With the new JSON {{MasterInfo}} published to ZK, we want to provide a simple library class for Python developers to retrieve info about the masters and the leader.",2 MESOS-3212,"As a Java developer I want a simple way to obtain information about Master from ZooKeeper","With the new JSON {{MasterInfo}} published to ZK, we want to provide a simple library class for Java Framework developers to retrieve info about the masters and the leader.",2 MESOS-3213,"Design doc for docker registry token manager","Create a design document describing the component and the interaction between the Docker Registry Client and the remote Docker Registry for token-based authorization.",2 MESOS-3215,"CgroupsAnyHierarchyWithPerfEventTest failing on Ubuntu 14.04","[ RUN ] CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_Perf ../../src/tests/containerizer/cgroups_tests.cpp:172: Failure (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup '/sys/fs/cgroup/perf_event/mesos_test': Device or resource busy ../../src/tests/containerizer/cgroups_tests.cpp:190: Failure (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup '/sys/fs/cgroup/perf_event/mesos_test': Device or resource busy [ FAILED ] CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_Perf (9 ms) [----------] 1 test from CgroupsAnyHierarchyWithPerfEventTest (9 ms total) ",3 MESOS-3222,"Implement docker registry client","Implement the following functionality: - fetch manifest from remote registry based on authorization method dictated by the registry. - fetch image layers from remote registry based on authorization method dictated by the registry. ",5 MESOS-3223,"Implement token manager for docker registry","Implement the following: - A component that fetches JSON web authorization token from a given registry. 
- Caches the token keyed on registry, service and scope - Validates the cache for expiry date Nice to have: - Cache gets pruned as tokens are aged beyond expiration time. ",4 MESOS-3225,"some variables in version.hpp use `Type &var` instead of `Type& var`","Some variables in 3rdparty/libprocess/3rdparty/stout/include/stout/version.hpp violate the Mesos code style of binding '&' and '*' to the type name (as opposed to binding to the variable name).",1 MESOS-3226,"Introduce an Either type.","We currently don't have an abstraction in stout to capture the notion of having a container with many types and a single value. For example, in our abstractions like Try, rather than being able to say {{Either t}} we must encode two Options ({{Option}}, {{Option}}) with the implicit invariant that exactly one will be set. This also comes in handy in many other places in the code. Note that we have the ability to (1) use C++11 unions now, as well as (2) use boost's variant directly instead of introducing Either. However, creating a named union every time this is needed is verbose, and unions require that we externally track which member is set. For variant, we already use this (e.g. json.hpp), but we can benefit from the better naming as Either. Many languages expose Either as having only two values, left and right. 
I'd propose making this two or more, as is the case with variant.",5 MESOS-3227,"Implement image chroot support into command executor",NULL,3 MESOS-3235,"FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky","On OSX, {{make clean && make -j8 V=0 check}}: {code} [----------] 3 tests from FetcherCacheHttpTest [ RUN ] FetcherCacheHttpTest.HttpCachedSerialized HTTP/1.1 200 OK Date: Fri, 07 Aug 2015 17:23:05 GMT Content-Length: 30 I0807 10:23:05.673596 2085372672 exec.cpp:133] Version: 0.24.0 E0807 10:23:05.675884 184373248 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57] I0807 10:23:05.675897 182226944 exec.cpp:207] Executor registered on slave 20150807-102305-139395082-52338-52313-S0 E0807 10:23:05.683980 184373248 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57] Registered executor on 10.0.79.8 Starting task 0 Forked command at 54363 sh -c './mesos-fetcher-test-cmd 0' E0807 10:23:05.694953 184373248 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57] Command exited with status 0 (pid: 54363) E0807 10:23:05.793927 184373248 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57] I0807 10:23:06.590008 2085372672 exec.cpp:133] Version: 0.24.0 E0807 10:23:06.592244 355938304 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57] I0807 10:23:06.592243 353255424 exec.cpp:207] Executor registered on slave 20150807-102305-139395082-52338-52313-S0 E0807 10:23:06.597995 355938304 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57] Registered executor on 10.0.79.8 Starting task 1 Forked command at 54411 sh -c './mesos-fetcher-test-cmd 1' E0807 10:23:06.608708 355938304 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57] Command exited with status 0 (pid: 54411) E0807 10:23:06.707649 355938304 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57] 
../../src/tests/fetcher_cache_tests.cpp:860: Failure Failed to wait 15secs for awaitFinished(task.get()) *** Aborted at 1438968214 (unix time) try ""date -d @1438968214"" if you are using GNU date *** [ FAILED ] FetcherCacheHttpTest.HttpCachedSerialized (28685 ms) [ RUN ] FetcherCacheHttpTest.HttpCachedConcurrent PC: @ 0x113723618 process::Owned<>::get() *** SIGSEGV (@0x0) received by PID 52313 (TID 0x118d59000) stack trace: *** @ 0x7fff8fcacf1a _sigtramp @ 0x7f9bc3109710 (unknown) @ 0x1136f07e2 mesos::internal::slave::Fetcher::fetch() @ 0x113862f9d mesos::internal::slave::MesosContainerizerProcess::fetch() @ 0x1138f1b5d _ZZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS2_11ContainerIDERKNS2_11CommandInfoERKNSt3__112basic_stringIcNSC_11char_traitsIcEENSC_9allocatorIcEEEERK6OptionISI_ERKNS2_7SlaveIDES6_S9_SI_SM_SP_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSW_FSU_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_ENKUlPNS_11ProcessBaseEE_clES1D_ @ 0x1138f18cf _ZNSt3__110__function6__funcIZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS5_11ContainerIDERKNS5_11CommandInfoERKNS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEERK6OptionISK_ERKNS5_7SlaveIDES9_SC_SK_SO_SR_EENS2_6FutureIT_EERKNS2_3PIDIT0_EEMSY_FSW_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_EUlPNS2_11ProcessBaseEE_NSI_IS1G_EEFvS1F_EEclEOS1F_ @ 0x1143768cf std::__1::function<>::operator()() @ 0x11435ca7f process::ProcessBase::visit() @ 0x1143ed6fe process::DispatchEvent::visit() @ 0x1127aaaa1 process::ProcessBase::serve() @ 0x114343b4e process::ProcessManager::resume() @ 0x1143431ca process::internal::schedule() @ 0x1143da646 _ZNSt3__114__thread_proxyINS_5tupleIJPFvvEEEEEEPvS5_ @ 0x7fff95090268 _pthread_body @ 0x7fff950901e5 _pthread_start @ 0x7fff9508e41d thread_start Failed to synchronize with slave (it's probably exited) make[3]: *** [check-local] Segmentation fault: 11 make[2]: *** [check-am] Error 2 make[1]: *** [check] Error 2 make: *** [check-recursive] Error 1 {code} 
This was encountered just once out of 3+ {{make check}}s.",2 MESOS-3236,"Updated slave task label decorator hook to pass in ExecutorInfo.","If the task being launched has a command executor, there is no way for the hook to determine the executor-id for that task. The executor-id is sometimes required by the label decorators for accounting purposes and for preparing ground for executor-environment-decorator (which is not passed the TaskInfo).",1 MESOS-3237,"HTTP requests with nested path are not properly handled by libprocess","For example, if master adds a route ""/api/v1/scheduler"", a handler named ""api/v1/scheduler"" is added to 'master' libprocess. But when a request is posted to the above path, process::visit() looks for a http handler named ""api"" instead of ""api/v1/scheduler"". Ideally libprocess should look for handlers in the following preference order: ""api/v1/scheduler"" --> ""api/v1"" --> ""api"" ",2 MESOS-3251,"http::get API evaluates ""host"" wrongly","Currently the libprocess http API sets the ""Host"" header field from the peer socket address (IP:port). The problem is that the socket address might not belong to the actual HTTP server and might be just a proxy. ",1 MESOS-3252,"Ignore no statistics condition for containers with no qdisc","In PortMappingStatistics::execute, we log the following errors to stderr if the egress rate limiting qdiscs are not configured inside the container. {code} Failed to get the network statistics for the htb qdisc on eth0 Failed to get the network statistics for the fq_codel qdisc on eth0 {code} This can occur because of an error reading the qdisc (the statistics function returns an error) or because the qdisc does not exist (the function returns none). We should not log an error when the qdisc does not exist since this is normal behaviour if the container is created without rate limiting. 
We do not want to gate this function on the slave rate limiting flag since we would have to compare the behaviour against the flag value at the time the container was created.",2 MESOS-3254,"Cgroup CHECK fails test harness","CHECK in clean up of ContainerizerTest causes test harness to abort rather than fail or skip only perf related tests. [ RUN ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch [ OK ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch (628 ms) [----------] 24 tests from SlaveRecoveryTest/0 (38986 ms total) [----------] 4 tests from MesosContainerizerSlaveRecoveryTest [ RUN ] MesosContainerizerSlaveRecoveryTest.ResourceStatistics ../../src/tests/mesos.cpp:720: Failure cgroups::mount(hierarchy, subsystem): 'perf_event' is already attached to another hierarchy ------------------------------------------------------------- We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/tmp/mesos_test_cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-MesosContainerizerSlaveRecoveryTest.*). 
------------------------------------------------------------- F0811 17:23:43.874696 12955 mesos.cpp:774] CHECK_SOME(cgroups): '/tmp/mesos_test_cgroup/perf_event' is not a valid hierarchy *** Check failure stack trace: *** @ 0x7fb2fb4835fd google::LogMessage::Fail() @ 0x7fb2fb48543d google::LogMessage::SendToLog() @ 0x7fb2fb4831ec google::LogMessage::Flush() @ 0x7fb2fb485d39 google::LogMessageFatal::~LogMessageFatal() @ 0x4e3f98 _CheckFatal::~_CheckFatal() @ 0x82f25a mesos::internal::tests::ContainerizerTest<>::TearDown() @ 0xc030e3 testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0xbf9050 testing::Test::Run() @ 0xbf912e testing::TestInfo::Run() @ 0xbf9235 testing::TestCase::Run() @ 0xbf94e8 testing::internal::UnitTestImpl::RunAllTests() @ 0xbf97a4 testing::UnitTest::Run() @ 0x4a9df3 main @ 0x7fb2f9371ec5 (unknown) @ 0x4b63ee (unknown) Build step 'Execute shell' marked build as failure",2 MESOS-3262,"HTTPTest.NestedGet is flaky","[ RUN ] HTTPTest.NestedGet ../../../3rdparty/libprocess/src/tests/http_tests.cpp:459: Failure Value of: response.get().status Actual: ""202 Accepted"" Expected: http::statuses[200] Which is: ""200 OK"" *** Aborted at 1439569965 (unix time) try ""date -d @1439569965"" if you are using GNU date *** PC: @ 0x63abe8 testing::UnitTest::AddTestPartResult() *** SIGSEGV (@0x0) received by PID 25766 (TID 0x7f499415c780) from PID 0; stack trace: *** @ 0x7f499224dca0 (unknown) @ 0x63abe8 testing::UnitTest::AddTestPartResult() @ 0x62f6af testing::internal::AssertHelper::operator=() @ 0x43cd78 HTTPTest_NestedGet_Test::TestBody() @ 0x65935e testing::internal::HandleSehExceptionsInMethodIfSupported<>() @ 0x653c5e testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x6349a3 testing::Test::Run() @ 0x635128 testing::TestInfo::Run() @ 0x635778 testing::TestCase::Run() @ 0x63c0e2 testing::internal::UnitTestImpl::RunAllTests() @ 0x65a11d testing::internal::HandleSehExceptionsInMethodIfSupported<>() @ 0x654958 
testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x63ae08 testing::UnitTest::Run() @ 0x4877f9 RUN_ALL_TESTS() @ 0x487613 main @ 0x7f49915739f4 __libc_start_main ",2 MESOS-3265,"Starting maintenance needs to deactivate agents and kill tasks.","After using the {{/maintenance/start}} endpoint to begin maintenance on a machine, agents running on said machine should: * Be deactivated such that no offers are sent from that agent. (Investigate if {{Master::deactivate(Slave*)}} can be used or modified for this purpose.) * Kill all tasks still running on the agent (See MESOS-1475). * Prevent other agents on that machine from registering or sending out offers. This will likely involve some modifications to {{Master::register}} and {{Master::reregister}}. ",8 MESOS-3266,"Stopping/Completing maintenance needs to reactivate agents.","After using the {{/maintenance/stop}} endpoint to end maintenance on a machine, any deactivated agents must be reactivated and allowed to register with the master.",5 MESOS-3267,"JSON serialization/deserialization of bytes is incorrect","Currently, we use our own serialization of bytes in json.hpp but we use picojson for deserialization. We've observed that for some bytes the serialization results in a string that is incorrectly decoded by picojson. Example: String = """"\""\\/\b\f\n\r\t\x00\x19 !#[]\x7F\xFF"" Result of our own encoding: ""\""\\\""\\\\\\/\\b\\f\\n\\r\\t\\u0000\\u0019 !#[]\\u007f\xFF\"""" picojson's encoding: ""\""\\\""\\\\\\/\\b\\f\\n\\r\\t\\u0000\\u0019 !#[]\\u007F\\u00FF\"""" Fix: We just use picojson to serialize bytes for consistency.",2 MESOS-3273,"EventCall Test Framework is flaky","Observed this on ASF CI. h/t [~haosdent@gmail.com] Looks like the HTTP scheduler never sent a SUBSCRIBE request to the master. 
{code} [ RUN ] ExamplesTest.EventCallFramework Using temporary directory '/tmp/ExamplesTest_EventCallFramework_k4vXkx' I0813 19:55:15.643579 26085 exec.cpp:443] Ignoring exited event because the driver is aborted! Shutting down Sending SIGTERM to process tree at pid 26061 Killing the following process trees: [ ] Shutting down Sending SIGTERM to process tree at pid 26062 Shutting down Killing the following process trees: [ ] Sending SIGTERM to process tree at pid 26063 Killing the following process trees: [ ] Shutting down Sending SIGTERM to process tree at pid 26098 Killing the following process trees: [ ] Shutting down Sending SIGTERM to process tree at pid 26099 Killing the following process trees: [ ] WARNING: Logging before InitGoogleLogging() is written to STDERR I0813 19:55:17.161726 26100 process.cpp:1012] libprocess is initialized on 172.17.2.10:60249 for 16 cpus I0813 19:55:17.161888 26100 logging.cpp:177] Logging to STDERR I0813 19:55:17.163625 26100 scheduler.cpp:157] Version: 0.24.0 I0813 19:55:17.175302 26100 leveldb.cpp:176] Opened db in 3.167446ms I0813 19:55:17.176393 26100 leveldb.cpp:183] Compacted db in 1.047996ms I0813 19:55:17.176496 26100 leveldb.cpp:198] Created db iterator in 77155ns I0813 19:55:17.176518 26100 leveldb.cpp:204] Seeked to beginning of db in 8429ns I0813 19:55:17.176527 26100 leveldb.cpp:273] Iterated through 0 keys in the db in 4219ns I0813 19:55:17.176708 26100 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0813 19:55:17.178951 26136 recover.cpp:449] Starting replica recovery I0813 19:55:17.179934 26136 recover.cpp:475] Replica is in EMPTY status I0813 19:55:17.181970 26126 master.cpp:378] Master 20150813-195517-167907756-60249-26100 (297daca2d01a) started on 172.17.2.10:60249 I0813 19:55:17.182317 26126 master.cpp:380] Flags at startup: --acls=""permissive: false register_frameworks { principals { type: SOME values: ""test-principal"" } roles { type: SOME values: ""*"" } } 
run_tasks { principals { type: SOME values: ""test-principal"" } users { type: SOME values: ""mesos"" } } "" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate=""false"" --authenticate_slaves=""false"" --authenticators=""crammd5"" --credentials=""/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials"" --framework_sorter=""drf"" --help=""false"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_slave_ping_timeouts=""5"" --quiet=""false"" --recovery_slave_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""5secs"" --registry_strict=""false"" --root_submissions=""true"" --slave_ping_timeout=""15secs"" --slave_reregister_timeout=""10mins"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/mesos/mesos-0.24.0/src/webui"" --work_dir=""/tmp/mesos-II8Gua"" --zk_session_timeout=""10secs"" I0813 19:55:17.183475 26126 master.cpp:427] Master allowing unauthenticated frameworks to register I0813 19:55:17.183536 26126 master.cpp:432] Master allowing unauthenticated slaves to register I0813 19:55:17.183615 26126 credentials.hpp:37] Loading credentials for authentication from '/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials' W0813 19:55:17.183859 26126 credentials.hpp:52] Permissions on credentials file '/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials' are too open. It is recommended that your credentials file is NOT accessible by others. 
I0813 19:55:17.183969 26123 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request I0813 19:55:17.184306 26126 master.cpp:469] Using default 'crammd5' authenticator I0813 19:55:17.184661 26126 authenticator.cpp:512] Initializing server SASL I0813 19:55:17.185104 26138 recover.cpp:195] Received a recover response from a replica in EMPTY status I0813 19:55:17.185972 26100 containerizer.cpp:143] Using isolation: posix/cpu,posix/mem,filesystem/posix I0813 19:55:17.186058 26135 recover.cpp:566] Updating replica status to STARTING I0813 19:55:17.187001 26138 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 654586ns I0813 19:55:17.187037 26138 replica.cpp:323] Persisted replica status to STARTING I0813 19:55:17.187499 26134 recover.cpp:475] Replica is in STARTING status I0813 19:55:17.187605 26126 auxprop.cpp:66] Initialized in-memory auxiliary property plugin I0813 19:55:17.187710 26126 master.cpp:506] Authorization enabled I0813 19:55:17.188657 26138 replica.cpp:641] Replica in STARTING status received a broadcasted recover request I0813 19:55:17.188853 26131 hierarchical.hpp:346] Initialized hierarchical allocator process I0813 19:55:17.189252 26132 whitelist_watcher.cpp:79] No whitelist given I0813 19:55:17.189321 26134 recover.cpp:195] Received a recover response from a replica in STARTING status I0813 19:55:17.190001 26125 recover.cpp:566] Updating replica status to VOTING I0813 19:55:17.190696 26124 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 357331ns I0813 19:55:17.190775 26124 replica.cpp:323] Persisted replica status to VOTING I0813 19:55:17.190970 26133 recover.cpp:580] Successfully joined the Paxos group I0813 19:55:17.192183 26129 recover.cpp:464] Recover process terminated I0813 19:55:17.192699 26123 slave.cpp:190] Slave started on 1)@172.17.2.10:60249 I0813 19:55:17.192741 26123 slave.cpp:191] Flags at startup: --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" 
--cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_kill_orphans=""true"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/mesos/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" --launcher_dir=""/mesos/mesos-0.24.0/_build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""1secs"" --resource_monitoring_interval=""1secs"" --resources=""cpus:2;mem:10240"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --version=""false"" --work_dir=""/tmp/mesos-II8Gua/0"" I0813 19:55:17.194514 26100 containerizer.cpp:143] Using isolation: posix/cpu,posix/mem,filesystem/posix I0813 19:55:17.194658 26123 slave.cpp:354] Slave resources: cpus(*):2; mem(*):10240; disk(*):3.70122e+06; ports(*):[31000-32000] I0813 19:55:17.194854 26123 slave.cpp:384] Slave hostname: 297daca2d01a I0813 19:55:17.194877 26123 slave.cpp:389] Slave checkpoint: true I0813 19:55:17.196751 26132 master.cpp:1524] The newly elected leader is master@172.17.2.10:60249 with id 20150813-195517-167907756-60249-26100 I0813 19:55:17.196797 26132 master.cpp:1537] Elected as the leading master! 
I0813 19:55:17.196815 26132 master.cpp:1307] Recovering from registrar I0813 19:55:17.197032 26138 registrar.cpp:311] Recovering registrar I0813 19:55:17.197845 26132 slave.cpp:190] Slave started on 2)@172.17.2.10:60249 I0813 19:55:17.198420 26125 log.cpp:661] Attempting to start the writer I0813 19:55:17.197948 26132 slave.cpp:191] Flags at startup: --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_kill_orphans=""true"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/mesos/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" --launcher_dir=""/mesos/mesos-0.24.0/_build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""1secs"" --resource_monitoring_interval=""1secs"" --resources=""cpus:2;mem:10240"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --version=""false"" --work_dir=""/tmp/mesos-II8Gua/1"" I0813 19:55:17.199121 26132 slave.cpp:354] Slave resources: cpus(*):2; mem(*):10240; disk(*):3.70122e+06; ports(*):[31000-32000] I0813 19:55:17.199235 26138 state.cpp:54] Recovering state from 
'/tmp/mesos-II8Gua/0/meta' I0813 19:55:17.199322 26132 slave.cpp:384] Slave hostname: 297daca2d01a I0813 19:55:17.199345 26132 slave.cpp:389] Slave checkpoint: true I0813 19:55:17.199676 26100 containerizer.cpp:143] Using isolation: posix/cpu,posix/mem,filesystem/posix I0813 19:55:17.200085 26135 state.cpp:54] Recovering state from '/tmp/mesos-II8Gua/1/meta' I0813 19:55:17.200317 26132 status_update_manager.cpp:202] Recovering status update manager I0813 19:55:17.200371 26129 status_update_manager.cpp:202] Recovering status update manager I0813 19:55:17.202003 26129 replica.cpp:477] Replica received implicit promise request with proposal 1 I0813 19:55:17.202585 26131 slave.cpp:190] Slave started on 3)@172.17.2.10:60249 I0813 19:55:17.202596 26129 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 523191ns I0813 19:55:17.202756 26129 replica.cpp:345] Persisted promised to 1 I0813 19:55:17.202770 26132 containerizer.cpp:379] Recovering containerizer I0813 19:55:17.203061 26135 containerizer.cpp:379] Recovering containerizer I0813 19:55:17.202663 26131 slave.cpp:191] Flags at startup: --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_kill_orphans=""true"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/mesos/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" 
--launcher_dir=""/mesos/mesos-0.24.0/_build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""1secs"" --resource_monitoring_interval=""1secs"" --resources=""cpus:2;mem:10240"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --version=""false"" --work_dir=""/tmp/mesos-II8Gua/2"" I0813 19:55:17.203819 26131 slave.cpp:354] Slave resources: cpus(*):2; mem(*):10240; disk(*):3.70122e+06; ports(*):[31000-32000] I0813 19:55:17.203930 26131 slave.cpp:384] Slave hostname: 297daca2d01a I0813 19:55:17.203948 26131 slave.cpp:389] Slave checkpoint: true I0813 19:55:17.204674 26137 state.cpp:54] Recovering state from '/tmp/mesos-II8Gua/2/meta' I0813 19:55:17.205178 26135 status_update_manager.cpp:202] Recovering status update manager I0813 19:55:17.205323 26135 containerizer.cpp:379] Recovering containerizer I0813 19:55:17.205521 26136 slave.cpp:4069] Finished recovery I0813 19:55:17.206074 26136 slave.cpp:4226] Querying resource estimator for oversubscribable resources I0813 19:55:17.206424 26128 slave.cpp:4069] Finished recovery I0813 19:55:17.206722 26137 status_update_manager.cpp:176] Pausing sending status updates I0813 19:55:17.206858 26136 slave.cpp:684] New master detected at master@172.17.2.10:60249 I0813 19:55:17.206902 26138 slave.cpp:4069] Finished recovery I0813 19:55:17.206962 26128 slave.cpp:4226] Querying resource estimator for oversubscribable resources I0813 19:55:17.208312 26134 scheduler.cpp:272] New master detected at master@172.17.2.10:60249 I0813 19:55:17.208364 26136 slave.cpp:709] No credentials provided. 
Attempting to register without authentication I0813 19:55:17.208608 26136 slave.cpp:720] Detecting new master I0813 19:55:17.208839 26138 slave.cpp:4226] Querying resource estimator for oversubscribable resources I0813 19:55:17.209216 26123 coordinator.cpp:231] Coordinator attemping to fill missing position I0813 19:55:17.209247 26127 status_update_manager.cpp:176] Pausing sending status updates I0813 19:55:17.209259 26128 slave.cpp:684] New master detected at master@172.17.2.10:60249 I0813 19:55:17.209322 26127 status_update_manager.cpp:176] Pausing sending status updates I0813 19:55:17.209364 26128 slave.cpp:709] No credentials provided. Attempting to register without authentication I0813 19:55:17.209344 26138 slave.cpp:684] New master detected at master@172.17.2.10:60249 I0813 19:55:17.209455 26128 slave.cpp:720] Detecting new master I0813 19:55:17.209492 26138 slave.cpp:709] No credentials provided. Attempting to register without authentication I0813 19:55:17.209573 26128 slave.cpp:4240] Received oversubscribable resources from the resource estimator I0813 19:55:17.209601 26138 slave.cpp:720] Detecting new master I0813 19:55:17.209730 26138 slave.cpp:4240] Received oversubscribable resources from the resource estimator I0813 19:55:17.209883 26136 slave.cpp:4240] Received oversubscribable resources from the resource estimator I0813 19:55:17.211266 26136 replica.cpp:378] Replica received explicit promise request for position 0 with proposal 2 I0813 19:55:17.211771 26136 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 462128ns I0813 19:55:17.211797 26136 replica.cpp:679] Persisted action at 0 I0813 19:55:17.212980 26130 replica.cpp:511] Replica received write request for position 0 I0813 19:55:17.213124 26130 leveldb.cpp:438] Reading position from leveldb took 67075ns I0813 19:55:17.213580 26130 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 301649ns I0813 19:55:17.213603 26130 replica.cpp:679] Persisted action at 0 I0813 
19:55:17.214284 26123 replica.cpp:658] Replica received learned notice for position 0 I0813 19:55:17.214622 26123 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 284547ns I0813 19:55:17.214648 26123 replica.cpp:679] Persisted action at 0 I0813 19:55:17.214675 26123 replica.cpp:664] Replica learned NOP action at position 0 I0813 19:55:17.215420 26136 log.cpp:677] Writer started with ending position 0 I0813 19:55:17.217463 26133 leveldb.cpp:438] Reading position from leveldb took 47943ns I0813 19:55:17.220762 26125 registrar.cpp:344] Successfully fetched the registry (0B) in 23.649024ms I0813 19:55:17.221081 26125 registrar.cpp:443] Applied 1 operations in 136902ns; attempting to update the 'registry' I0813 19:55:17.223667 26133 log.cpp:685] Attempting to append 174 bytes to the log I0813 19:55:17.223778 26125 coordinator.cpp:341] Coordinator attempting to write APPEND action at position 1 I0813 19:55:17.224516 26127 replica.cpp:511] Replica received write request for position 1 I0813 19:55:17.225009 26127 leveldb.cpp:343] Persisting action (193 bytes) to leveldb took 466230ns I0813 19:55:17.225042 26127 replica.cpp:679] Persisted action at 1 I0813 19:55:17.225653 26126 replica.cpp:658] Replica received learned notice for position 1 I0813 19:55:17.225953 26126 leveldb.cpp:343] Persisting action (195 bytes) to leveldb took 286966ns I0813 19:55:17.225975 26126 replica.cpp:679] Persisted action at 1 I0813 19:55:17.226013 26126 replica.cpp:664] Replica learned APPEND action at position 1 I0813 19:55:17.227545 26137 registrar.cpp:488] Successfully updated the 'registry' in 6.328064ms I0813 19:55:17.227722 26137 registrar.cpp:374] Successfully recovered registrar I0813 19:55:17.227918 26124 log.cpp:704] Attempting to truncate the log to 1 I0813 19:55:17.228024 26133 coordinator.cpp:341] Coordinator attempting to write TRUNCATE action at position 2 I0813 19:55:17.228193 26131 master.cpp:1334] Recovered 0 slaves from the Registry (135B) ; allowing 10mins for 
slaves to re-register I0813 19:55:17.228659 26127 replica.cpp:511] Replica received write request for position 2 I0813 19:55:17.228972 26127 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 297903ns I0813 19:55:17.229004 26127 replica.cpp:679] Persisted action at 2 I0813 19:55:17.229565 26127 replica.cpp:658] Replica received learned notice for position 2 I0813 19:55:17.229837 26127 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 260326ns I0813 19:55:17.229899 26127 leveldb.cpp:401] Deleting ~1 keys from leveldb took 48697ns I0813 19:55:17.229923 26127 replica.cpp:679] Persisted action at 2 I0813 19:55:17.229956 26127 replica.cpp:664] Replica learned TRUNCATE action at position 2 I0813 19:55:17.325634 26138 slave.cpp:1209] Will retry registration in 445.955946ms if necessary I0813 19:55:17.326088 26124 master.cpp:3635] Registering slave at slave(2)@172.17.2.10:60249 (297daca2d01a) with id 20150813-195517-167907756-60249-26100-S0 I0813 19:55:17.327446 26124 registrar.cpp:443] Applied 1 operations in 231072ns; attempting to update the 'registry' I0813 19:55:17.330252 26136 log.cpp:685] Attempting to append 344 bytes to the log I0813 19:55:17.330407 26132 coordinator.cpp:341] Coordinator attempting to write APPEND action at position 3 I0813 19:55:17.331418 26128 replica.cpp:511] Replica received write request for position 3 I0813 19:55:17.331753 26128 leveldb.cpp:343] Persisting action (363 bytes) to leveldb took 264140ns I0813 19:55:17.331778 26128 replica.cpp:679] Persisted action at 3 I0813 19:55:17.332324 26133 replica.cpp:658] Replica received learned notice for position 3 I0813 19:55:17.332809 26133 leveldb.cpp:343] Persisting action (365 bytes) to leveldb took 313064ns I0813 19:55:17.332834 26133 replica.cpp:679] Persisted action at 3 I0813 19:55:17.332865 26133 replica.cpp:664] Replica learned APPEND action at position 3 I0813 19:55:17.334211 26132 registrar.cpp:488] Successfully updated the 'registry' in 6.668032ms I0813 
19:55:17.334430 26127 log.cpp:704] Attempting to truncate the log to 3 I0813 19:55:17.334566 26132 coordinator.cpp:341] Coordinator attempting to write TRUNCATE action at position 4 I0813 19:55:17.335283 26129 replica.cpp:511] Replica received write request for position 4 I0813 19:55:17.335615 26127 slave.cpp:3058] Received ping from slave-observer(1)@172.17.2.10:60249 I0813 19:55:17.335816 26129 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 458268ns I0813 19:55:17.335908 26137 master.cpp:3698] Registered slave 20150813-195517-167907756-60249-26100-S0 at slave(2)@172.17.2.10:60249 (297daca2d01a) with cpus(*):2; mem(*):10240; disk(*):3.70122e+0...",5 MESOS-3280,"Master fails to access replicated log after network partition","In a 5 node cluster with 3 masters and 2 slaves, and ZK on each node, when a network partition is forced, all the masters apparently lose access to their replicated log. The leading master halts. Unknown reasons, but presumably related to replicated log access. The others fail to recover from the replicated log. Unknown reasons. This could have to do with ZK setup, but it might also be a Mesos bug. This was observed in a Chronos test drive scenario described in detail here: https://github.com/mesos/chronos/issues/511 With setup instructions here: https://github.com/mesos/chronos/issues/508 ",8 MESOS-3281,"Create a user doc for Scheduler HTTP API","We need to convert the design doc into user doc that we can add to our docs folder.",3 MESOS-3284,"JSON representation of Protobuf should use base64 encoding for 'bytes' fields.","Currently we encode 'bytes' fields as UTF-8 strings, which is lossy for binary data due to invalid byte sequences! In order to encode binary data in a lossless fashion, we can encode 'bytes' fields in base64. 
Note that this is also how proto3 does its encoding (see [here|https://developers.google.com/protocol-buffers/docs/proto3?hl=en#json]), so this would make migration easier as well.",3 MESOS-3287,"downloadWithHadoop tries to access Error() for a valid Try","This was reported while trying to install Hadoop / Mesos integration: {noformat} I0818 05:36:35.058688 24428 fetcher.cpp:409] Fetcher Info: {""cache_directory"":""\/tmp\/mesos\/fetch\/slaves\/20150706-075218-1611773194-5050-28439-S473\/hadoop"",""items"":[{""action"":""BYPASS_CACHE"",""uri"":{""extract"":true,""value"":""hdfs:\/\/hdfs.prod:54310\/user\/ashwanth\/hadoop-with-mesos-2.6.0-cdh5.4.4.tar.gz""}}],""sandbox_directory"":""\/var\/lib\/mesos\/slaves\/20150706-075218-1611773194-5050-28439-S473\/frameworks\/20150706-075218-1611773194-5050-28439-4532\/executors\/executor_Task_Tracker_4129\/runs\/c26f52d4-4055-46fa-b999-11d73f2096dd"",""user"":""hadoop""} I0818 05:36:35.059806 24428 fetcher.cpp:364] Fetching URI 'hdfs://hdfs.prod:54310/user/ashwanth/hadoop-with-mesos-2.6.0-cdh5.4.4.tar.gz' I0818 05:36:35.059821 24428 fetcher.cpp:238] Fetching directly into the sandbox directory I0818 05:36:35.059835 24428 fetcher.cpp:176] Fetching URI 'hdfs://hdfs.prod:54310/user/ashwanth/hadoop-with-mesos-2.6.0-cdh5.4.4.tar.gz' mesos-fetcher: /tmp/mesos-build/mesos-repo/3rdparty/libprocess/3rdparty/stout/include/stout/try.hpp:90: const string& Try::error() const [with T = bool; std::string = std::basic_string]: Assertion `data.isNone()' failed. {noformat} This is, however, a genuine bug in {{src/launcher/fetcher.cpp#L99}}: {code}   Try available = hdfs.available();   if (available.isError() || !available.get()) {     return Error(""Skipping fetch with Hadoop Client as""                  "" Hadoop Client not available: "" + available.error());   } {code} The root cause is that (probably) the HDFS client is not available on the slave; however, we do not {{error()}} but rather return a {{false}} result. 
The bug is exposed in the {{return}} line, where we try to retrieve {{available.error()}} (which is not there - it's just `false`). This was a 'latent' bug that has been exposed by (my) recent refactoring of {{os::shell}} which is used by {{hdfs.available()}} under the covers.",1 MESOS-3288,"Implement docker registry client","Implement the docker registry client as per design document: https://docs.google.com/document/d/1kE-HXPQl4lQgamPIiaD4Ytdr-N4HeQc4fnE93WHR4X4/edit",5 MESOS-3289,"Add DockerRegistry unit tests","Add unit tests suite for docker registry implementation. This could include: - Creating mock docker registry server - Using openssl library for digest functions.",5 MESOS-3290,"Master should drop HTTP calls when it's recovering","Much like what we do with PID based frameworks, master should drop HTTP calls if it's not the leader and/or still recovering.",3 MESOS-3293,"Failing ROOT_ tests on CentOS 7.1 - LimitedCpuIsolatorTest","h2. LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids This is one of several ROOT failing tests: we want to track them *individually* and for each of them decide whether to: * fix; * remove; OR * redesign. (full verbose logs attached) h2. Steps to Reproduce Completely cleaned the build, removed directory, clean pull from {{master}} (SHA: {{fb93d93}}) - same results, 9 failed tests: {noformat} [==========] 751 tests from 114 test cases ran. (231218 ms total) [ PASSED ] 742 tests. 
[ FAILED ] 9 tests, listed below: [ FAILED ] LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids [ FAILED ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess [ FAILED ] ContainerizerTest.ROOT_CGROUPS_BalloonFramework [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystem [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromSandbox [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromHost [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithRootFilesystem [ FAILED ] MesosContainerizerLaunchTest.ROOT_ChangeRootfs 9 FAILED TESTS YOU HAVE 10 DISABLED TESTS {noformat}",5 MESOS-3294,"Failing ROOT_ tests on CentOS 7.1 - UserCgroupIsolatorTest","h2. UserCgroupIsolatorTest This is one of several ROOT failing tests: we want to track them *individually* and for each of them decide whether to: * fix; * remove; OR * redesign. (full verbose logs attached) h2. Steps to Reproduce Completely cleaned the build, removed directory, clean pull from {{master}} (SHA: {{fb93d93}}) - same results, 9 failed tests: {noformat} [==========] 751 tests from 114 test cases ran. (231218 ms total) [ PASSED ] 742 tests. 
[ FAILED ] 9 tests, listed below: [ FAILED ] LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids [ FAILED ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess [ FAILED ] ContainerizerTest.ROOT_CGROUPS_BalloonFramework [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystem [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromSandbox [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromHost [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithRootFilesystem [ FAILED ] MesosContainerizerLaunchTest.ROOT_ChangeRootfs 9 FAILED TESTS YOU HAVE 10 DISABLED TESTS {noformat}",5 MESOS-3295,"Failing ROOT_ tests on CentOS 7.1 - ContainerizerTest","h2. ContainerizerTest.ROOT_CGROUPS_BalloonFramework This is one of several ROOT failing tests: we want to track them *individually* and for each of them decide whether to: * fix; * remove; OR * redesign. (full verbose logs attached) h2. Steps to Reproduce Completely cleaned the build, removed directory, clean pull from {{master}} (SHA: {{fb93d93}}) - same results, 9 failed tests: {noformat} [==========] 751 tests from 114 test cases ran. (231218 ms total) [ PASSED ] 742 tests. 
[ FAILED ] 9 tests, listed below: [ FAILED ] LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids [ FAILED ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess [ FAILED ] ContainerizerTest.ROOT_CGROUPS_BalloonFramework [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystem [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromSandbox [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromHost [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithRootFilesystem [ FAILED ] MesosContainerizerLaunchTest.ROOT_ChangeRootfs 9 FAILED TESTS YOU HAVE 10 DISABLED TESTS {noformat}",5 MESOS-3296,"Failing ROOT_ tests on CentOS 7.1 - LinuxFilesystemIsolatorTest","h2. LinuxFilesystemIsolatorTest This is one of several ROOT failing tests: we want to track them *individually* and for each of them decide whether to: * fix; * remove; OR * redesign. (full verbose logs attached) h2. Steps to Reproduce Completely cleaned the build, removed directory, clean pull from {{master}} (SHA: {{fb93d93}}) - same results, 9 failed tests: {noformat} [==========] 751 tests from 114 test cases ran. (231218 ms total) [ PASSED ] 742 tests. 
[ FAILED ] 9 tests, listed below: [ FAILED ] LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids [ FAILED ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess [ FAILED ] ContainerizerTest.ROOT_CGROUPS_BalloonFramework [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystem [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromSandbox [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromHost [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithRootFilesystem [ FAILED ] MesosContainerizerLaunchTest.ROOT_ChangeRootfs 9 FAILED TESTS YOU HAVE 10 DISABLED TESTS {noformat}",5 MESOS-3297,"Failing ROOT_ tests on CentOS 7.1 - MesosContainerizerLaunchTest","h2. MesosContainerizerLaunchTest This is one of several ROOT failing tests: we want to track them *individually* and for each of them decide whether to: * fix; * remove; OR * redesign. (full verbose logs attached) h2. Steps to Reproduce Completely cleaned the build, removed directory, clean pull from {{master}} (SHA: {{fb93d93}}) - same results, 9 failed tests: {noformat} [==========] 751 tests from 114 test cases ran. (231218 ms total) [ PASSED ] 742 tests. 
[ FAILED ] 9 tests, listed below: [ FAILED ] LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids [ FAILED ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess [ FAILED ] ContainerizerTest.ROOT_CGROUPS_BalloonFramework [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystem [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromSandbox [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromHost [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithRootFilesystem [ FAILED ] MesosContainerizerLaunchTest.ROOT_ChangeRootfs 9 FAILED TESTS YOU HAVE 10 DISABLED TESTS {noformat}",5 MESOS-3299,"Add a protobuf to represent time with integer precision.","Existing timestamps in the protobufs use {{double}} to encode time. Generally, the field represents seconds (with the decimal component to represent smaller denominations of time). This is less than ideal. Instead, we should use integers, so as to not lose data (and to be able to compare value reliably). Something like: {code} message Time { int64 seconds; int32 nanoseconds; } {code}",1 MESOS-3304,"Remove remnants of LIBPROCESS_STATISTICS_WINDOW","As seen in MESOS-1283, LIBPROCESS_STATISTICS_WINDOW is no longer needed since metrics now require specification of a window size, and default to no history if not provided. Some commented-out code remnants associated with this environment variable still remain and should be removed.",1 MESOS-3307,"Configurable size of completed task / framework history","We try to make Mesos work with multiple frameworks and mesos-dns at the same time. The goal is to have set of frameworks per team / project on a single Mesos cluster. At this point our mesos state.json is at 4mb and it takes a while to assembly. 5 mesos-dns instances hit state.json every 5 seconds, effectively pushing mesos-master CPU usage through the roof. 
It's at 100%+ all the time. Here's the problem: {noformat} mesos λ curl -s http://mesos-master:5050/master/state.json | jq .frameworks[].completed_tasks[].framework_id | sort | uniq -c | sort -n 1 ""20150606-001827-252388362-5050-5982-0003"" 16 ""20150606-001827-252388362-5050-5982-0005"" 18 ""20150606-001827-252388362-5050-5982-0029"" 73 ""20150606-001827-252388362-5050-5982-0007"" 141 ""20150606-001827-252388362-5050-5982-0009"" 154 ""20150820-154817-302720010-5050-15320-0000"" 289 ""20150606-001827-252388362-5050-5982-0004"" 510 ""20150606-001827-252388362-5050-5982-0012"" 666 ""20150606-001827-252388362-5050-5982-0028"" 923 ""20150116-002612-269165578-5050-32204-0003"" 1000 ""20150606-001827-252388362-5050-5982-0001"" 1000 ""20150606-001827-252388362-5050-5982-0006"" 1000 ""20150606-001827-252388362-5050-5982-0010"" 1000 ""20150606-001827-252388362-5050-5982-0011"" 1000 ""20150606-001827-252388362-5050-5982-0027"" mesos λ fgrep 1000 -r src/master src/master/constants.cpp:const size_t MAX_REMOVED_SLAVES = 100000; src/master/constants.cpp:const uint32_t MAX_COMPLETED_TASKS_PER_FRAMEWORK = 1000; {noformat} Active tasks are just 6% of state.json response: {noformat} mesos λ cat ~/temp/mesos-state.json | jq -c . | wc 1 14796 4138942 mesos λ cat ~/temp/mesos-state.json | jq .frameworks[].tasks | jq -c . | wc 16 37 252774 {noformat} I see four options that can improve the situation: 1. Add query string param to exclude completed tasks from state.json and use it in mesos-dns and similar tools. There is no need for mesos-dns to know about completed tasks, it's just extra load on master and mesos-dns. 2. Make history size configurable. 3. Make JSON serialization faster. With 10000s of tasks even without history it would take a lot of time to serialize tasks for mesos-dns. Doing it every 60 seconds instead of every 5 seconds isn't really an option. 4. Create event bus for mesos master. Marathon has it and it'd be nice to have it in Mesos. 
This way mesos-dns could avoid polling master state and switch to listening for events. All can be done independently. Note to mesosphere folks: please start distributing debug symbols with your distribution. I was asking for it for a while and it is really helpful: https://github.com/mesosphere/marathon/issues/1497#issuecomment-104182501 Perf report for leading master: !http://i.imgur.com/iz7C3o0.png! I'm on 0.23.0.",3 MESOS-3308,"Define the container rootfs directories within the slave work_dir.","A few motivations: 1) Given the design in MESOS-3004 it became apparent that we need to support multiple images in a container and these images can be of different image types. (There are no sufficient reasons or major obstacles that force us not to allow it and it obviously gives the users more flexibility). 2) Also, even though we currently allow only one backend for each provisioner, when we update a running slave there can be multiple backends left in each container that we need to launch tasks with, or at least recover. We should evaluate in the future whether to support multiple backends and choose among them dynamically based on image characteristics. 3) Since the rootfs' lifecycle is tied to the running containers and it should be cleaned up after containers die, it fits into the pattern of {{work_dir}} and we can manage them inside the work dir without needing to ask the operator to specify more flags. ",2 MESOS-3310,"Support provisioning images specified in volumes.","This is related to MESOS-3095 and MESOS-3227. The idea is that we should allow the command executor to run under the host filesystem and provision the filesystem for the user. The command line executor will then chroot into the user's root filesystem. This solves the issue that the command executor is not launchable in the user-specified root filesystem. 
The design doc is here: https://docs.google.com/document/d/16hyLVRL0nz-KBts1J5stGyxZPniFPbPbs7R-ZRQVCH4/edit?usp=sharing",3 MESOS-3311,SlaveTest.HTTPSchedulerSlaveRestart,"Observed on ASF CI {code} [ RUN ] SlaveTest.HTTPSchedulerSlaveRestart Using temporary directory '/tmp/SlaveTest_HTTPSchedulerSlaveRestart_CXyDrA' I0825 22:07:36.809872 27610 leveldb.cpp:176] Opened db in 3.751801ms I0825 22:07:36.811115 27610 leveldb.cpp:183] Compacted db in 1.2194ms I0825 22:07:36.811175 27610 leveldb.cpp:198] Created db iterator in 30669ns I0825 22:07:36.811197 27610 leveldb.cpp:204] Seeked to beginning of db in 7829ns I0825 22:07:36.811208 27610 leveldb.cpp:273] Iterated through 0 keys in the db in 6017ns I0825 22:07:36.811245 27610 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0825 22:07:36.811722 27638 recover.cpp:449] Starting replica recovery I0825 22:07:36.811980 27638 recover.cpp:475] Replica is in EMPTY status I0825 22:07:36.813033 27641 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request I0825 22:07:36.813355 27635 recover.cpp:195] Received a recover response from a replica in EMPTY status I0825 22:07:36.813756 27628 recover.cpp:566] Updating replica status to STARTING I0825 22:07:36.814434 27636 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 570160ns I0825 22:07:36.814471 27636 replica.cpp:323] Persisted replica status to STARTING I0825 22:07:36.814743 27642 recover.cpp:475] Replica is in STARTING status I0825 22:07:36.814965 27638 master.cpp:378] Master 20150825-220736-234885548-51219-27610 (09c6504e3a31) started on 172.17.0.14:51219 I0825 22:07:36.814999 27638 master.cpp:380] Flags at startup: --acls="""" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate=""true"" --authenticate_slaves=""true"" --authenticators=""crammd5"" --authorizers=""local"" --credentials=""/tmp/SlaveTest_HTTPSchedulerSlaveRestart_CXyDrA/credentials"" --framework_sorter=""drf"" 
--help=""false"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_slave_ping_timeouts=""5"" --quiet=""false"" --recovery_slave_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""25secs"" --registry_strict=""true"" --root_submissions=""true"" --slave_ping_timeout=""15secs"" --slave_reregister_timeout=""10mins"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/mesos/mesos-0.25.0/_inst/share/mesos/webui"" --work_dir=""/tmp/SlaveTest_HTTPSchedulerSlaveRestart_CXyDrA/master"" --zk_session_timeout=""10secs"" I0825 22:07:36.815347 27638 master.cpp:425] Master only allowing authenticated frameworks to register I0825 22:07:36.815371 27638 master.cpp:430] Master only allowing authenticated slaves to register I0825 22:07:36.815402 27638 credentials.hpp:37] Loading credentials for authentication from '/tmp/SlaveTest_HTTPSchedulerSlaveRestart_CXyDrA/credentials' I0825 22:07:36.815634 27632 replica.cpp:641] Replica in STARTING status received a broadcasted recover request I0825 22:07:36.815752 27638 master.cpp:469] Using default 'crammd5' authenticator I0825 22:07:36.815904 27638 master.cpp:506] Authorization enabled I0825 22:07:36.815979 27643 recover.cpp:195] Received a recover response from a replica in STARTING status I0825 22:07:36.816185 27637 whitelist_watcher.cpp:79] No whitelist given I0825 22:07:36.816186 27641 hierarchical.hpp:346] Initialized hierarchical allocator process I0825 22:07:36.816519 27630 recover.cpp:566] Updating replica status to VOTING I0825 22:07:36.817258 27639 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 475231ns I0825 22:07:36.817296 27639 replica.cpp:323] Persisted replica status to VOTING I0825 22:07:36.817420 27637 master.cpp:1525] The newly elected leader is master@172.17.0.14:51219 with id 20150825-220736-234885548-51219-27610 I0825 22:07:36.817467 27637 master.cpp:1538] Elected as the 
leading master! I0825 22:07:36.817483 27637 master.cpp:1308] Recovering from registrar I0825 22:07:36.817509 27635 recover.cpp:580] Successfully joined the Paxos group I0825 22:07:36.817708 27633 registrar.cpp:311] Recovering registrar I0825 22:07:36.817844 27635 recover.cpp:464] Recover process terminated I0825 22:07:36.818439 27631 log.cpp:661] Attempting to start the writer I0825 22:07:36.819694 27636 replica.cpp:477] Replica received implicit promise request with proposal 1 I0825 22:07:36.820133 27636 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 421255ns I0825 22:07:36.820168 27636 replica.cpp:345] Persisted promised to 1 I0825 22:07:36.820804 27630 coordinator.cpp:231] Coordinator attemping to fill missing position I0825 22:07:36.822105 27638 replica.cpp:378] Replica received explicit promise request for position 0 with proposal 2 I0825 22:07:36.822597 27638 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 468065ns I0825 22:07:36.822625 27638 replica.cpp:679] Persisted action at 0 I0825 22:07:36.823737 27637 replica.cpp:511] Replica received write request for position 0 I0825 22:07:36.823796 27637 leveldb.cpp:438] Reading position from leveldb took 39603ns I0825 22:07:36.824267 27637 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 446655ns I0825 22:07:36.824296 27637 replica.cpp:679] Persisted action at 0 I0825 22:07:36.824961 27634 replica.cpp:658] Replica received learned notice for position 0 I0825 22:07:36.825340 27634 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 362236ns I0825 22:07:36.825369 27634 replica.cpp:679] Persisted action at 0 I0825 22:07:36.825388 27634 replica.cpp:664] Replica learned NOP action at position 0 I0825 22:07:36.825975 27642 log.cpp:677] Writer started with ending position 0 I0825 22:07:36.826997 27628 leveldb.cpp:438] Reading position from leveldb took 56us I0825 22:07:36.829946 27639 registrar.cpp:344] Successfully fetched the registry (0B) in 12.187136ms I0825 
22:07:36.830077 27639 registrar.cpp:443] Applied 1 operations in 40874ns; attempting to update the 'registry' I0825 22:07:36.832870 27635 log.cpp:685] Attempting to append 174 bytes to the log I0825 22:07:36.833088 27641 coordinator.cpp:341] Coordinator attempting to write APPEND action at position 1 I0825 22:07:36.833845 27636 replica.cpp:511] Replica received write request for position 1 I0825 22:07:36.834293 27636 leveldb.cpp:343] Persisting action (193 bytes) to leveldb took 425175ns I0825 22:07:36.834324 27636 replica.cpp:679] Persisted action at 1 I0825 22:07:36.835077 27643 replica.cpp:658] Replica received learned notice for position 1 I0825 22:07:36.835500 27643 leveldb.cpp:343] Persisting action (195 bytes) to leveldb took 404831ns I0825 22:07:36.835532 27643 replica.cpp:679] Persisted action at 1 I0825 22:07:36.835574 27643 replica.cpp:664] Replica learned APPEND action at position 1 I0825 22:07:36.836545 27643 registrar.cpp:488] Successfully updated the 'registry' in 6.393088ms I0825 22:07:36.836707 27643 registrar.cpp:374] Successfully recovered registrar I0825 22:07:36.836874 27639 log.cpp:704] Attempting to truncate the log to 1 I0825 22:07:36.837174 27632 master.cpp:1335] Recovered 0 slaves from the Registry (135B) ; allowing 10mins for slaves to re-register I0825 22:07:36.837291 27634 coordinator.cpp:341] Coordinator attempting to write TRUNCATE action at position 2 I0825 22:07:36.838249 27639 replica.cpp:511] Replica received write request for position 2 I0825 22:07:36.838685 27639 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 412214ns I0825 22:07:36.838716 27639 replica.cpp:679] Persisted action at 2 I0825 22:07:36.839735 27628 replica.cpp:658] Replica received learned notice for position 2 I0825 22:07:36.840304 27628 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 547841ns I0825 22:07:36.840375 27628 leveldb.cpp:401] Deleting ~1 keys from leveldb took 51256ns I0825 22:07:36.840401 27628 replica.cpp:679] Persisted 
action at 2 I0825 22:07:36.840428 27628 replica.cpp:664] Replica learned TRUNCATE action at position 2 I0825 22:07:36.849371 27610 containerizer.cpp:143] Using isolation: posix/cpu,posix/mem,filesystem/posix I0825 22:07:36.856500 27633 slave.cpp:190] Slave started on 286)@172.17.0.14:51219 I0825 22:07:36.856541 27633 slave.cpp:191] Flags at startup: --appc_store_dir=""/tmp/mesos/store/appc"" --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --credential=""/tmp/SlaveTest_HTTPSchedulerSlaveRestart_ukkA8L/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_kill_orphans=""true"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/SlaveTest_HTTPSchedulerSlaveRestart_ukkA8L/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" --launcher_dir=""/mesos/mesos-0.25.0/_build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resource_monitoring_interval=""1secs"" --resources=""cpus:2;mem:1024;disk:1024;ports:[31000-32000]"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --version=""false"" 
--work_dir=""/tmp/SlaveTest_HTTPSchedulerSlaveRestart_ukkA8L"" I0825 22:07:36.857074 27633 credentials.hpp:85] Loading credential for authentication from '/tmp/SlaveTest_HTTPSchedulerSlaveRestart_ukkA8L/credential' I0825 22:07:36.857275 27633 slave.cpp:321] Slave using credential for: test-principal I0825 22:07:36.857822 27633 slave.cpp:354] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0825 22:07:36.857936 27633 slave.cpp:384] Slave hostname: 09c6504e3a31 I0825 22:07:36.857959 27633 slave.cpp:389] Slave checkpoint: true I0825 22:07:36.858886 27637 state.cpp:54] Recovering state from '/tmp/SlaveTest_HTTPSchedulerSlaveRestart_ukkA8L/meta' I0825 22:07:36.859130 27638 status_update_manager.cpp:202] Recovering status update manager I0825 22:07:36.859465 27636 containerizer.cpp:379] Recovering containerizer I0825 22:07:36.860631 27634 slave.cpp:4069] Finished recovery I0825 22:07:36.861034 27634 slave.cpp:4226] Querying resource estimator for oversubscribable resources I0825 22:07:36.861239 27643 status_update_manager.cpp:176] Pausing sending status updates I0825 22:07:36.861240 27634 slave.cpp:684] New master detected at master@172.17.0.14:51219 I0825 22:07:36.861322 27634 slave.cpp:747] Authenticating with master master@172.17.0.14:51219 I0825 22:07:36.861343 27634 slave.cpp:752] Using default CRAM-MD5 authenticatee I0825 22:07:36.861450 27634 slave.cpp:720] Detecting new master I0825 22:07:36.861495 27628 authenticatee.cpp:115] Creating new client SASL connection I0825 22:07:36.861569 27634 slave.cpp:4240] Received oversubscribable resources from the resource estimator I0825 22:07:36.861716 27632 master.cpp:4694] Authenticating slave(286)@172.17.0.14:51219 I0825 22:07:36.861799 27629 authenticator.cpp:407] Starting authentication session for crammd5_authenticatee(665)@172.17.0.14:51219 I0825 22:07:36.862045 27642 authenticator.cpp:92] Creating new server SASL connection I0825 22:07:36.862308 27635 authenticatee.cpp:206] Received SASL 
authentication mechanisms: CRAM-MD5 I0825 22:07:36.862337 27635 authenticatee.cpp:232] Attempting to authenticate with mechanism 'CRAM-MD5' I0825 22:07:36.862421 27629 authenticator.cpp:197] Received SASL authentication start I0825 22:07:36.862478 27629 authenticator.cpp:319] Authentication requires more steps I0825 22:07:36.862579 27633 authenticatee.cpp:252] Received SASL authentication step I0825 22:07:36.862679 27628 authenticator.cpp:225] Received SASL authentication step I0825 22:07:36.862707 27628 auxprop.cpp:102] Request to lookup properties for user: 'test-principal' realm: '09c6504e3a31' server FQDN: '09c6504e3a31' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0825 22:07:36.862717 27628 auxprop.cpp:174] Looking up auxiliary property '*userPassword' I0825 22:07:36.862754 27628 auxprop.cpp:174] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0825 22:07:36.862785 27628 auxprop.cpp:102] Request to lookup properties for user: 'test-principal' realm: '09c6504e3a31' server FQDN: '09c6504e3a31' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0825 22:07:36.862797 27628 auxprop.cpp:124] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0825 22:07:36.862802 27628 auxprop.cpp:124] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0825 22:07:36.862817 27628 authenticator.cpp:311] Authentication success I0825 22:07:36.862884 27629 authenticatee.cpp:292] Authentication success I0825 22:07:36.862921 27630 master.cpp:4724] Successfully authenticated principal 'test-principal' at slave(286)@172.17.0.14:51219 I0825 22:07:36.862969 27642 authenticator.cpp:425] Authentication session cleanup for crammd5_authenticatee(665)@172.17.0.14:51219 I0825 22:07:36.863139 27639 slave.cpp:815] Successfully authenticated with master master@172.17.0.14:51219 I0825 22:07:36.863256 27639 slave.cpp:1209] Will retry 
registration in 15.028678ms if necessary I0825 22:07:36.863382 27643 master.cpp:3636] Registering slave at slave(286)@172.17.0.14:51219 (09c6504e3a31) with id 20150825-220736-234885548-51219-27610-S0 I0825 22:07:36.863899 27610 sched.cpp:164] Version: 0.25.0 I0825 22:07:36.863940 27636 registrar.cpp:443] Applied 1 operations in 94492ns; attempting to update the 'registry' I0825 22:07:36.864670 27632 sched.cpp:262] New master detected at master@172.17.0.14:51219 I0825 22:07:36.864790 27632 sched.cpp:318] Authenticating with master master@172.17.0.14:51219 I0825 22:07:36.864821 27632 sched.cpp:325] Using default CRAM-MD5 authenticatee I0825 22:07:36.865095 27637 authenticatee.cpp:115] Creating new client SASL connection I0825 22:07:36.865453 27643 master.cpp:4694] Authenticating scheduler-6c5ddcdb-9dd1-4b38-b051-5f714d3c1c55@172.17.0.14:51219 I0825 22:07:36.865603 27629 authenticator.cpp:407] Starting authentication session for crammd5_authenticatee(666)@172.17.0.14:51219 I0825 22:07:36.865840 27638 authenticator.cpp:92] Creating new server SASL connection I0825 22:07:36.866217 27630 authenticatee.cpp:206] Received SASL authentication mechanisms: CRAM-MD5 I0825 22:07:36.866260 27630 authenticatee.cpp:232] Attempting to authenticate with mechanism 'CRAM-MD5' I0825 22:07:36.866433 27639 authenticator.cpp:197] Received SASL authentication start I0825 22:07:36.866513 27639 authenticator.cpp:319] Authentication requires more steps I0825 22:07:36.866710 27630 authenticatee.cpp:252] Received SASL authentication step I0825 22:07:36.866999 27638 authenticator.cpp:225] Received SASL authentication step I0825 22:07:36.867051 27638 auxprop.cpp:102] Request to lookup properties for user: 'test-principal' realm: '09c6504e3a31' server FQDN: '09c6504e3a31' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0825 22:07:36.867077 27638 auxprop.cpp:174] Looking up auxiliary property '*userPassword' I0825 22:07:36.867130 27638 
auxprop.cpp:174] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0825 22:07:36.867162 27638 auxprop.cpp:102] Request to lookup properties for user: 'test-principal' realm: '09c6504e3a31' server FQDN: '09c6504e3a31' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0825 22:07:36.867175 27638 auxprop.cpp:124] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0825 22:07:36.867183 27638 auxprop.cpp:124] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0825 22:07:36.867202 27638 authenticator.cpp:311] Authentication success I0825 22:07:36.867426 27636 authenticatee.cpp:292] Authentication success I0825 22:07:36.867434 27633 authenticator.cpp:425] Authentication session cleanup for crammd5_authenticatee(666)@172.17.0.14:51219 I0825 22:07:36.867627 27630 master.cpp:4724] Successfully authenticated principal 'test-principal' at scheduler-6c5ddcdb-9dd1-4b38-b051-5f714d3c1c55@172.17.0.14:51219 I0825 22:07:36.867951 27641 sched.cpp:407] Successfully authenticated with master master@172.17.0.14:51219 I0825 22:07:36.867986 27641 sched.cpp:713] Sending SUBSCRIBE call to master@172.17.0.14:51219 I0825 22:07:36.868114 27641 sched.cpp:746] Will retry registration in 1.352726078secs if necessary I0825 22:07:36.868233 27634 log.cpp:685] Attempting to append 344 bytes to the log I0825 22:07:36.868268 27638 master.cpp:2094] Received SUBSCRIBE call for framework 'default' at scheduler-6c5ddcdb-9dd1-4b38-b051-5f714d3c1c55@172.17.0.14:51219 I0825 22:07:36.868305 27638 master.cpp:1564] Authorizing framework principal 'test-principal' to receive offers for role '*' I0825 22:07:36.868373 27631 coordinator.cpp:341] Coordinator attempting to write APPEND action at position 3 I0825 22:07:36.868614 27642 master.cpp:2164] Subscribing framework default with checkpointing enabled and capabilities [ ] I0825 22:07:36.868999 27643 hierarchical.hpp:391] Added framework 
20150825-220736-234885548-51219-27610-0000 I0825 22:07:36.869030 27643 hierarchical.hpp:1010] No resources available to allocate! I0825 22:07:36.869046 27643 hierarchical.hpp:910] Performed allocation for 0 slaves in 34654ns I0825 22:07:36.869215 27631 sched.cpp:640] Framework registered with 20150825-220736-234885548-51219-27610-0000 I0825 22:07:36.869215 27643 replica.cpp:511] Replica received write request for position 3 I0825 22:07:36.869268 27631 sched.cpp:654] Scheduler::registered took 29976ns I0825 22:07:36.869453 27643 leveldb.cpp:343] Persisting action (363 bytes) to leveldb took 181689ns I0825 22:07:36.869477 27643 replica.cpp:679] Persisted action at 3 I0825 22:07:36.870075 27629 replica.cpp:658] Replica received learned notice for position 3 I0825 22:07:36.870542 27629 leveldb.cpp:343] Persisting action (365 bytes) to leveldb took 469081ns I0825 22:07:36.870589 27629 replica.cpp:679] Persisted action at 3 I0825 22:07:36.870622 27629 replica.cpp:664] Replica learned APPEND action at position 3 I0825 22:07:36.872133 27632 registrar.cpp:488] Successfully updated the 'registry' in 8.113152ms I0825 22:07:36.872354 27639 log.cpp:704] Attempting to truncate the log to 3 I0825 22:07:36.872470 27632 coordinator.cpp:341] Coordinator attempting to write TRUNCATE action at position 4 I0825 22:07:36.872879 27637 slave.cpp:3058] Received ping from slave-observer(274)@172.17.0.14:51219 I0825 22:07:36.873015 27636 master.cpp:3699] Registered slave 20150825-220736-234885548-51219-27610-S0 at slave(286)@172.17.0.14:51219 (09c6504e3a31) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0825 22:07:36.873180 27637 slave.cpp:859] Registered with master master@172.17.0.14:51219; given slave ID 20150825-220736-234885548-51219-27610-S0 I0825 22:07:36.873219 27637 fetcher.cpp:77] Clearing fetcher cache I0825 22:07:36.873410 27634 status_update_manager.cpp:183] Resuming sending status updates I0825 22:07:36.873379 27628 hierarchical.hpp:542] Added slave 
20150825-220736-234885548-51219-27610-S0 (09c6504e3a31) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (allocated: ) I0825 22:07:36.873482 27642 replica.cpp:511] Replica received write request for position 4 I0825 22:0...",2 MESOS-3312,"Factor out JSON to repeated protobuf conversion","In general, we have the collection of protobuf messages as another protobuf message, which makes JSON -> protobuf conversion straightforward. This is not always the case, for example, {{Resources}} class is not a protobuf, though protobuf-convertible. To facilitate conversions like JSON -> {{Resources}} and avoid writing code for each particular case, we propose to introduce {{JSON::Array}} -> {{repeated protobuf}} conversion. With this in place, {{JSON::Array}} -> {{Resources}} boils down to {{JSON::Array}} -> {{repeated Resource}} -> (extra c-tor call) -> {{Resources}}.",2 MESOS-3313,"Rework Jenkins build script","Mesos Jenkins build script needs to be reworked to support the following: - Wider test coverage (libevent, libssl, root tests, Docker tests). - More OS/compiler Docker images for testing Mesos. - Excluding tests on per-image basis. - Reproducing the test image locally. ",3 MESOS-3319,"Mesos will not build when configured with gperftools enabled","Mesos configured with {{--enable-perftools}} currently will not build on OSX 10.10.4 or Ubuntu 14.04, possibly because the bundled gperftools-2.0 is not current. The stable release is now 2.4, which builds successfully on both of these platforms. This issue is resolved when Mesos will build successfully out of the box with gperftools enabled. After this ticket is resolved, the libprocess profiler should be tested to confirm that it still works and if not, it should be fixed.",2 MESOS-3321,"Spurious fetcher message about extracting an archive","The fetcher emits a spurious log message about not extracting an archive with "".tgz"" extension, even though the tarball is extracted correctly. 
{code} I0826 19:02:08.304914 2109 logging.cpp:172] INFO level logging started! I0826 19:02:08.305253 2109 fetcher.cpp:413] Fetcher Info: {""cache_directory"":""\/tmp\/mesos\/fetch\/slaves\/20150826-185716-251662764-5050-1-S0\/root"",""items"":[{""action"":""BYPASS_CACHE"",""uri"":{""extract"":true,""value"":""file:\/\/\/mesos\/sampleflaskapp.tgz""}}],""sandbox_directory"":""\/tmp\/mesos\/slaves\/20150826-185716-251662764-5050-1-S0\/frameworks\/20150826-185716-251662764-5050-1-0000\/executors\/sample-flask-app.f222d202-4c24-11e5-a628-0242ac110011\/runs\/e71f50b8-816d-46d5-bcc6-f9850a0402ed"",""user"":""root""} I0826 19:02:08.306834 2109 fetcher.cpp:368] Fetching URI 'file:///mesos/sampleflaskapp.tgz' I0826 19:02:08.306864 2109 fetcher.cpp:242] Fetching directly into the sandbox directory I0826 19:02:08.306884 2109 fetcher.cpp:179] Fetching URI 'file:///mesos/sampleflaskapp.tgz' I0826 19:02:08.306900 2109 fetcher.cpp:159] Copying resource with command:cp '/mesos/sampleflaskapp.tgz' '/tmp/mesos/slaves/20150826-185716-251662764-5050-1-S0/frameworks/20150826-185716-251662764-5050-1-0000/executors/sample-flask-app.f222d202-4c24-11e5-a628-0242ac110011/runs/e71f50b8-816d-46d5-bcc6-f9850a0402ed/sampleflaskapp.tgz' I0826 19:02:08.309063 2109 fetcher.cpp:76] Extracting with command: tar -C '/tmp/mesos/slaves/20150826-185716-251662764-5050-1-S0/frameworks/20150826-185716-251662764-5050-1-0000/executors/sample-flask-app.f222d202-4c24-11e5-a628-0242ac110011/runs/e71f50b8-816d-46d5-bcc6-f9850a0402ed' -xf '/tmp/mesos/slaves/20150826-185716-251662764-5050-1-S0/frameworks/20150826-185716-251662764-5050-1-0000/executors/sample-flask-app.f222d202-4c24-11e5-a628-0242ac110011/runs/e71f50b8-816d-46d5-bcc6-f9850a0402ed/sampleflaskapp.tgz' I0826 19:02:08.315313 2109 fetcher.cpp:84] Extracted 
'/tmp/mesos/slaves/20150826-185716-251662764-5050-1-S0/frameworks/20150826-185716-251662764-5050-1-0000/executors/sample-flask-app.f222d202-4c24-11e5-a628-0242ac110011/runs/e71f50b8-816d-46d5-bcc6-f9850a0402ed/sampleflaskapp.tgz' into '/tmp/mesos/slaves/20150826-185716-251662764-5050-1-S0/frameworks/20150826-185716-251662764-5050-1-0000/executors/sample-flask-app.f222d202-4c24-11e5-a628-0242ac110011/runs/e71f50b8-816d-46d5-bcc6-f9850a0402ed' W0826 19:02:08.315381 2109 fetcher.cpp:264] Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: file:///mesos/sampleflaskapp.tgz I0826 19:02:08.315604 2109 fetcher.cpp:445] Fetched 'file:///mesos/sampleflaskapp.tgz' to '/tmp/mesos/slaves/20150826-185716-251662764-5050-1-S0/frameworks/20150826-185716-251662764-5050-1-0000/executors/sample-flask-app.f222d202-4c24-11e5-a628-0242ac110011/runs/e71f50b8-816d-46d5-bcc6-f9850a0402ed/sampleflaskapp.tgz' {code}",1 MESOS-3323,"Auto-generate protos for stout tests","Stout protobufs (AFAIK right now it's just a single file {{protobuf_tests.proto}}) are not generated automatically. Including proto generation step would be cleaner and more convenient.",2 MESOS-3326,"Make use of C++11 atomics","Now that we require C++11, we can make use of std::atomic. For example: * libprocess/process.cpp uses a bare int + __sync_synchronize() for ""running"" * __sync_synchronize() is used in logging.hpp in libprocess and fork.hpp in stout * sched/sched.cpp uses a volatile int for ""running"" -- this is wrong, ""volatile"" is not sufficient to ensure safe concurrent access * ""volatile"" is used in a few other places -- most are probably dubious but I haven't looked closely",2 MESOS-3332,"Support HTTP Pipelining in libprocess (http::post)","Currently , {{http::post}} in libprocess, does not support HTTP pipelining. 
Each call currently sends the {{Connection: close}} header, thereby signaling to the server to close the TCP socket after the response. We either need to create a new interface for supporting HTTP pipelining, or modify the existing {{http::post}} to do so. This is needed for the Scheduler/Executor library implementations to make sure ""Calls"" are sent in order to the master. Currently, in order to do so, we send the next request only after we have received a response for the earlier call, which degrades performance. ",8 MESOS-3337,"Refactored libprocess SSL tests."," Refactor SSL test fixture to be available for reuse by other projects. Currently the fixture class and the symbols it depends on are not present in libprocess's include files.",3 MESOS-3338,"Dynamic reservations are not counted as used resources in the master","Dynamically reserved resources should be considered used or allocated and hence reflected in Mesos bookkeeping structures and {{state.json}}. I expanded the {{ReservationTest.ReserveThenUnreserve}} test with the following section: {code} // Check that the Master counts the reservation as a used resource. { Future response = process::http::get(master.get(), ""state.json""); AWAIT_READY(response); Try parse = JSON::parse(response.get().body); ASSERT_SOME(parse); Result cpus = parse.get().find(""slaves[0].used_resources.cpus""); ASSERT_SOME_EQ(JSON::Number(1), cpus); } {code} and got {noformat} ../../../src/tests/reservation_tests.cpp:168: Failure Value of: (cpus).get() Actual: 0 Expected: JSON::Number(1) Which is: 1 {noformat} Idea for new resources states: https://docs.google.com/drawings/d/1aquVIqPY8D_MR-cQjZu-wz5nNn3cYP3jXqegUHl-Kzc/edit",3 MESOS-3339,"Implement filtering mechanism for (Scheduler API Events) Testing","Currently, our testing infrastructure does not have a mechanism for filtering/dropping HTTP events of a particular type from the Scheduler API response stream. 
We need a {{DROP_HTTP_CALLS}} abstraction that can help us to filter a particular event type. {code} // Enqueues all received events into a libprocess queue. ACTION_P(Enqueue, queue) { std::queue events = arg0; while (!events.empty()) { // Note that we currently drop HEARTBEATs because most of these tests // are not designed to deal with heartbeats. // TODO(vinod): Implement DROP_HTTP_CALLS that can filter heartbeats. if (events.front().type() == Event::HEARTBEAT) { VLOG(1) << ""Ignoring HEARTBEAT event""; } else { queue->put(events.front()); } events.pop(); } } {code} This helper code is duplicated in at least two places currently, Scheduler Library/Maintenance Primitives tests. - The solution can be as trivial as moving this helper function to a common test-header. - Implement a {{DROP_HTTP_CALLS}} similar to what we do for other protobufs via {{DROP_CALLS}}.",3 MESOS-3340,"Command-line flags should take precedence over OS Env variables","Currently, it appears that re-defining a flag on the command-line that was already defined via a OS Env var ({{MESOS_*}}) causes the Master to fail with a not very helpful message. For example, if one has {{MESOS_QUORUM}} defined, this happens: {noformat} $ ./mesos-master --zk=zk://192.168.1.4/mesos --quorum=1 --hostname=192.168.1.4 --ip=192.168.1.4 Duplicate flag 'quorum' on command line {noformat} which is not very helpful. Ideally, we would parse the flags with a ""well-known"" priority (command-line first, environment last) - but at the very least, the error message should be more helpful in explaining what the issue is.",2 MESOS-3343,"Rate Limiting functionality for HTTP Frameworks","We need to build rate limiting functionality for frameworks connecting via the Scheduler HTTP API similar to the PID based frameworks. 
Link to the rate-limiting section from design doc: https://docs.google.com/document/d/1pnIY_HckimKNvpqhKRhbc9eSItWNFT-priXh_urR-T0/edit#heading=h.kzgdk4d5fmba - This ticket deals with refactoring the existing PID based framework functionality and extending it for HTTP frameworks. - The second part of notifying the framework when rate-limiting is active i.e. returning a status of 429 can be undertaken as part of MESOS-1664",5 MESOS-3345,"Expand the range of integer precision when converting into/out of json.","For [MESOS-3299], we added some protobufs to represent time with integer precision. However, this precision is not maintained through protobuf <-> JSON conversion, because of how our JSON encoders/decoders convert numbers to floating point. To maintain precision, we can try one of the following: * Try using a {{long double}} to represent a number. * Add logic to stringify/parse numbers without loss when possible. * Try representing {{int64_t}} as a string and parse it as such? * Update PicoJson and add a compiler flag, i.e. {{-DPICOJSON_USE_INT64}} In all cases, we'll need to make sure that: * Integers are properly stringified without loss. * The JSON decoder parses the integer without loss. * We have some unit tests for big (close to {{INT32_MAX}}/{{INT64_MAX}}) and small integers.",5 MESOS-3346,"Add filter support for inverse offers","A filter attached to the inverse offer can be used by the framework to control when it wants to be contacted again with the inverse offer, since future circumstances may change the viability of the maintenance schedule. The “filter” for InverseOffers is identical to the existing mechanism for re-offering Offers to frameworks.",5 MESOS-3348,"Add either log rotation or capped-size logging (for tasks)","Tasks currently log their output (i.e. stdout/stderr) to files (the ""sandbox"") on an agent's disk. In some cases, the accumulation of these logs can completely fill up the agent's disk and thereby kill the task or machine. 
To prevent this, we should either implement a log rotation mechanism or capped-size logging. This would be used by executors to control the amount of logs they keep. Master/agent logs will not be affected. We will first scope out several possible approaches for log rotation/capping in a design document (see [MESOS-3356]). Once an approach is chosen, this story will be broken down into some corresponding issues.",13 MESOS-3349,"Removing mount point fails with EBUSY in LinuxFilesystemIsolator.","When running the tests as root, we found PersistentVolumeTest.AccessPersistentVolume fails consistently on some platforms. {noformat} [ RUN ] PersistentVolumeTest.AccessPersistentVolume I0901 02:17:26.435140 39432 exec.cpp:133] Version: 0.25.0 I0901 02:17:26.442129 39461 exec.cpp:207] Executor registered on slave 20150901-021726-1828659978-52102-32604-S0 Registered executor on hostname Starting task d8ff1f00-e720-4a61-b440-e111009dfdc3 sh -c 'echo abc > path1/file' Forked command at 39484 Command exited with status 0 (pid: 39484) ../../src/tests/persistent_volume_tests.cpp:579: Failure Value of: os::exists(path::join(directory, ""path1"")) Actual: true Expected: false [ FAILED ] PersistentVolumeTest.AccessPersistentVolume (777 ms) {noformat} Turns out that the 'rmdir' after the 'umount' fails with EBUSY because there's still some references to the mount. FYI [~jieyu] [~mcypark]",5 MESOS-3352,"Problem Statement Summary for Systemd Cgroup Launcher","There have been many reports of cgroups related issues when running Mesos on Systemd. Many of these issues are rooted in the manual manipulation of the cgroups filesystem by Mesos. This task is to describe the problem in a 1-page summary, and elaborate on the suggested 2 part solution: 1. Using the {{delegate=true}} flag for the slave 2. Implementing a Systemd launcher to run executors with tighter Systemd integration.",5 MESOS-3356,"Scope out approaches to deal with logging to finite disks (i.e. 
log rotation|capped-size logging).","For the background, see the parent story [MESOS-3348]. For the work/design/discussion, see the linked design document (below). ",5 MESOS-3357,"Update quota design doc based on user comments and offline syncs","We got plenty of feedback from different parties, which we would like to persist in the design doc for posterity.",3 MESOS-3365,"Export per container SNMP statistics","We need to export the per container SNMP statistics too, from its /proc/net/snmp.",5 MESOS-3366,"Allow resources/attributes discovery","In heterogeneous clusters, tasks sometimes have strong constraints on the type of hardware they need to execute on. The current solution is to use custom resources and attributes on the agents. Detecting non-standard resources/attributes requires wrapping the ""mesos-slave"" binary behind a script and using custom code to probe the agent. Unfortunately, this approach doesn't allow composition. The solution would be to provide a hook/module mechanism to allow users to use custom code performing resources/attributes discovery. Please review the detailed document below: https://docs.google.com/document/d/15OkebDezFxzeyLsyQoU0upB0eoVECAlzEkeg0HQAX9w Feel free to express comments/concerns by annotating the document or by replying to this issue. ",3 MESOS-3368,"Add device support in cgroups abstraction","Add support for [device cgroups|https://www.kernel.org/doc/Documentation/cgroup-v1/devices.txt] to aid isolators controlling access to devices. In the future, we could think about how to enumerate and control access to devices as a resource or task/container policy",3 MESOS-3375,"Add executor protobuf to v1","A new protobuf for Executor was introduced in Mesos for the HTTP API; it needs to be added to /v1 so it reflects changes made on v1/mesos.proto. 
This protobuf ought to be changed as the executor HTTP API design evolves.",1 MESOS-3378,"Document a test pattern for expediting event firing","We use {{Clock::advance()}} extensively in tests to expedite event firing and minimize overall {{make check}} time. Document this pattern for posterity.",3 MESOS-3393,"Remove unused executor protobuf","The executor protobuf definition living outside the v1/ directory is unused and should be removed to avoid confusion.",1 MESOS-3399,"Rewrite perf events code","Our current code base invokes and parses `perf stat`, which sucks, because cmdline output is not a stable ABI at all and can break our code at any time (see, for example, MESOS-2834). We should use the stable API perf_event_open(2). With this patch https://reviews.apache.org/r/37540/, we already have the infrastructure for the implementation, so it should not be hard to rewrite all the perf events code.",5 MESOS-3402,"mesos-execute does not support credentials","mesos-execute does not appear to support passing credentials. This makes it impossible to use on a cluster where framework authentication is required.",2 MESOS-3409,"Refactor the plain JSON parsing in the docker containerizer","Two functions in the Docker-related code take a string and parse it to JSON: * {{Docker::Container::create}} in {{src/docker/docker.cpp}} * {{Token::create}} in {{src/slave/containerizer/provisioners/docker/token_manager.cpp}} This JSON is then validated (lots of if-elses) and used via the {{JSON::Value}} accessors. We could instead use a protobuf and the related Stout JSON->Protobuf conversion function.",3 MESOS-3413,"Docker containerizer does not symlink persistent volumes into sandbox","For the ArangoDB framework I am trying to use the persistent primitives. Nearly everything is working, but I am missing a crucial piece at the end: I have successfully created a persistent disk resource and have set the persistence and volume information in the DiskInfo message. 
However, I do not see any way to find out what directory on the host the mesos slave has reserved for us. I know it is ${MESOS_SLAVE_WORKDIR}/volumes/roles//_ but we have no way to query this information anywhere. The docker containerizer does not automatically mount this directory into our docker container, or symlink it into our sandbox. Therefore, I have essentially no access to it. Note that the mesos containerizer (which I cannot use for other reasons) seems to create a symlink in the sandbox to the actual path for the persistent volume. With that, I could mount the volume into our docker container and all would be well.",5 MESOS-3416,"Publish egg for 0.24.0 to PyPI","0.24.0 was released, but the Python egg has not been published.",1 MESOS-3417,"Log source address replicated log received broadcasts","Currently Mesos doesn't log what machine a replicated log status broadcast was received from: {code} Sep 11 21:41:14 master-01 mesos-master[15625]: I0911 21:41:14.320164 15637 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request Sep 11 21:41:14 master-01 mesos-dns[15583]: I0911 21:41:14.321097 15583 detect.go:118] ignoring children-changed event, leader has not changed: /mesos Sep 11 21:41:14 master-01 mesos-master[15625]: I0911 21:41:14.353914 15639 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request Sep 11 21:41:14 master-01 mesos-master[15625]: I0911 21:41:14.479132 15639 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request {code} It would be really useful for debugging replicated log startup issues to have info about where the message came from (libprocess address, ip, or hostname)",2 MESOS-3418,"Factor out V1 API test helper functions","We currently have some helper functionality for V1 API tests. This is copied in a few test files. Factor this out into a common place once the API is stabilized. 
{code} // Helper class for using EXPECT_CALL since the Mesos scheduler API // is callback based. class Callbacks { public: MOCK_METHOD0(connected, void(void)); MOCK_METHOD0(disconnected, void(void)); MOCK_METHOD1(received, void(const std::queue&)); }; {code} {code} // Enqueues all received events into a libprocess queue. // TODO(jmlvanre): Factor this common code out of tests into V1 // helper. ACTION_P(Enqueue, queue) { std::queue events = arg0; while (!events.empty()) { // Note that we currently drop HEARTBEATs because most of these tests // are not designed to deal with heartbeats. // TODO(vinod): Implement DROP_HTTP_CALLS that can filter heartbeats. if (events.front().type() == Event::HEARTBEAT) { VLOG(1) << ""Ignoring HEARTBEAT event""; } else { queue->put(events.front()); } events.pop(); } } {code} We can also update the helpers in {{/tests/mesos.hpp}} to support the V1 API. This would let us get rid of lines like: {code} v1::TaskInfo taskInfo = evolve(createTask(devolve(offer), """", DEFAULT_EXECUTOR_ID)); {code} In favor of: {code} v1::TaskInfo taskInfo = createTask(offer, """", DEFAULT_EXECUTOR_ID); {code}",2 MESOS-3423,"Perf event isolator stops performing sampling if a single timeout occurs.","Currently the perf event isolator times out a sample a fixed 2 seconds after the sample time elapses: {code} Duration timeout = flags.perf_duration + Seconds(2); {code} This should be based on the reap interval maximum. Also, the code stops sampling altogether when a single timeout occurs. We've observed timeouts during normal operation, so it would be better for the isolator to continue performing perf sampling in the case of timeouts. It may also make sense to continue sampling in the case of errors, since these may be transient.",3 MESOS-3424,"Support fetching AppC images into the store","So far AppC store is read only and depends on out of band mechanisms to get the images. We need to design a way to support fetching in a native way. 
As commented on MESOS-2824: It's unacceptable to have either have: * the slave to be blocked for extended period of time (minutes) which delays the communication between the executor and scheduler, or * the first task that uses this image to be blocked for a long time to wait for the container image to be ready. The solution needs to enable the operator to prefetch a list of ""preferred images"" without introducing the above problems.",5 MESOS-3425,"Modify LinuxLauncher to support Systemd","Implement the solution described in MESOS-3352 in the LinuxLauncher In order to avoid the migration of cgroup pids by Systemd we can use the {{delegate=true}} flag. This guards Systemd from migrating the pids that are descendants of the process launched by a Systemd unit. In order for this strategy to work, the {{delegate}} flag must be supported by the Systemd version. Support for this was introduced in Systemd v218; however, it has also been backported to v208 for RHEL7 and CentOS7 [here|http://centoserrata.nagater.net/item/CEBA-2015-0037-CentOS-7.i386.x86_64.html] with the package [systemd-208-20|https://rhn.redhat.com/errata/RHBA-2015-1155.html]. It is highly recommended to upgrade to this package if running those operating systems. Once the {{delegate=true}} flag has been set, the cgroups that are manually manipulated by the agent will no longer be migrated *during the lifetime of the agent*. This still leaves the problem of tasks being migrated _after the agent has stopped running_ (voluntarily or not). In order to deal with the problem we propose the following solution: If an agent is running on a Systemd initialized machine, then the agent will create a Systemd slice with a life-time that is independent of the agent and {{delegate=true}}. The linux launcher (used when cgroups isolators are enabled) will then assign the cgroup name for any executor that is launched to this separate slice. 
The consequence of this is that when the agent unit is terminated, the separate slice will continue to delegate the cgroups preventing Systemd from migrating the pids. A side benefit of this is that we can maintain the {{KillMode=control-group}} flag on the agent and terminate all agent specific services such as the {{fetcher}} without terminating the tasks. This provides for a nice clean-up. This solution will still require that the agent unit be launched with the {{delegate=true}} flag such that there is no race during the transition of the pids from the agent to the separate slice. The agent will be responsible for verifying the slice is still available upon recovery, and warning the operator if it notices that the tasks it is recovering are no longer associated with this separate slice, as this can cause *silent* loss of isolation of existing tasks.",8 MESOS-3426,"process::collect and process::await do not perform discard propagation.","When aggregating futures with collect, one may discard the outer future: {code} Promise p1; Promise p2; Future collect = process::collect(p1.future(), p2.future()); collect.discard(); // collect will transition to DISCARDED // However, p{1,2}.future().hasDiscard() remains false // as there is no discard propagation! 
{code} Discard requests should propagate down into the inner futures being collected.",3 MESOS-3428,"Support running filesystem isolation with Command Executor in MesosContainerizer",NULL,4 MESOS-3430,"LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem fails on CentOS 7.1","Just ran ROOT tests on CentOS 7.1 and had the following failure (clean build, just pulled from {{master}}): {noformat} [ RUN ] LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem ../../src/tests/containerizer/filesystem_isolator_tests.cpp:498: Failure (wait).failure(): Failed to clean up an isolator when destroying container '366b6d37-b326-4ed1-8a5f-43d483dbbace' :Failed to unmount volume '/tmp/LinuxFilesystemIsolatorTest_ROOT_PersistentVolumeWithoutRootFilesystem_KXgvoH/sandbox/volume': Failed to unmount '/tmp/LinuxFilesystemIsolatorTest_ROOT_PersistentVolumeWithoutRootFilesystem_KXgvoH/sandbox/volume': Invalid argument ../../src/tests/utils.cpp:75: Failure os::rmdir(sandbox.get()): Device or resource busy [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem (1943 ms) [----------] 1 test from LinuxFilesystemIsolatorTest (1943 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (1951 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem {noformat}",2 MESOS-3432,"Unify the implementations of the image provisioners.","The current design uses separate provisioner implementation for each type of image (e.g., APPC, DOCKER). This creates a lot of code duplications. 
Since we already have a unified provisioner backend (e.g., copy, bind, overlayfs), we should be able to unify the implementations of image provisioners and hide the image-specific logic in the corresponding 'Store' implementation.",5 MESOS-3433,"Unmount irrelevant host mounts in the new container's mount namespace.","As described in this [TODO|https://github.com/apache/mesos/blob/e601e469c64594dd8339352af405cbf26a574ea8/src/slave/containerizer/isolators/filesystem/linux.cpp#L418]: {noformat:title=} // TODO(jieyu): Try to unmount work directory mounts and persistent // volume mounts for other containers to release the extra // references to those mounts. {noformat} This will be a best-effort attempt to alleviate the race condition between the provisioner's container cleanup and new containers copying the host mount table.",3 MESOS-3443,"Windows: Port protobuf_tests.hpp","We have ported `stout/protobuf.hpp`, but to make the `protobuf_tests.cpp` file work, we need to port `stout/uuid.hpp`.",2 MESOS-3449,"Expand the range of integer precision in json <-> protobuf conversions to include unsigned integers","The previous changes (MESOS-3345) to support integer precision when converting JSON <-> Protobuf did not support precision for unsigned integers between {{INT64_MAX}} and {{UINT64_MAX}}. (There's some loss, but the conversion is still as good/bad as it was with doubles.) This problem is due to a limitation in the JSON parsing library we use (PicoJSON), which parses integers as {{int64_t}}. Some possible solutions or things to investigate: * We can patch PicoJSON to parse some large values as {{uint64_t}}. * We can investigate using another parsing library. * If we want extra precision beyond 64 or 80 bits per double, one possibility is the [GMP library|https://gmplib.org/]. 
We'd still need to change the parsing library though.",5 MESOS-3455,"Higher level construct for expressing process dispatch","Since mesos code is based on the actor model and dispatching an interface asynchronously is a large part of the code base, generalizing the concept of asynchronously dispatching an interface would eliminate the need to manual programming of the dispatch boilerplate. An example usage: For a simple interface like: {code} class Interface { virtual Future writeToFile(const char* data) = 0; virtual ~Interface(); }; {code} Today the developer has to do the following: a. Write a wrapper class that implements the same interface to add the dispatching boilerplate. b. Spend precious time in reviews. c. Risk introducing bugs. None of the above steps add any value to the executable binary. The wrapper class would look like: {code} // -- hpp file class InterfaceProcess; class InterfaceImpl : public Interface { public: Try> create(const Flags& flags); virtual Future writeToFile(const char* data); ~InterfaceImpl(); private: Owned process; }; // -- cpp file Try> create(const Flags& flags) { // Code to create the InterfaceProcess class. } Future Future InterfaceImpl::writeToFile(const char* data) { process->dispatch( &InterfaceProcess::writeToFile, data); } InterfaceImpl::InterfaceImpl() { // Code to spawn the process } InterfaceImpl::~InterfaceImpl() { // Code to stop the process. } {code} At the caller/client site, the code would look like: {code} Try> in = InterfaceImpl::create(flags); Future result = in->writeToFile(data); {code} Proposal We should use C++'s rich language semnatics to express the intent and avoid the boilerplate we write manually. The basic intent of the code that leads to all the boilerplate above is: a. An interface that provides a set of functionality. b. An implementation of the interface. c. Ability to dispatch that interface asynchronously using actor. C++ has a rich set of generics that can be used to express above. 
Components ProcessDispatcher This component will ""dispatch"" an interface implementation asynchronously using the process framework. This component can be expressed as: {code} ProcessDispatcher {code} DispatchInterface Any interface that provides an implementation that can be ""dispatched"" can be expressed using this component. This component can be expressed as: {code} Dispatchable {code} Usage: Simple usage {code} Try>> dispatcher = ProcessDispatcher::create(flags); Future result = dispatcher->dispatch( Interface::writeToFile, data); {code} Collecting the interface in a container {code} vector>> dispatchCollection; Try>> dispatcher1 = ProcessDispatcher::create(flags); Try>> dispatcher2 = ProcessDispatcher::create(""test""); dispatchCollection.push_back(dispatcher1); dispatchCollection.push_back(dispatcher2); {code} The advantages of using the generic dispatcher: Saves time by avoiding writing all the boilerplate and going through review cycles. Fewer bugs. Focus on the real problem and not boilerplate. ",6 MESOS-3457,"Add flag to disable hostname lookup","In testing / building DCOS we've found that we need to set --hostname explicitly on the masters. For our uses IP and `hostname` must always be the same thing. More generally, under certain circumstances, dynamic lookup of {{hostname}}, while successful, provides undesirable results; we would also like, in those circumstances, to be able to just set the hostname to the chosen IP address (possibly set via the {{\-\- ip_discovery_command}} method). We suggest adding a {{\-\-no-hostname-lookup}} flag. Note that we can introduce this flag as {{--hostname-lookup}} with a default to 'true' (which is the current semantics) and that way someone can do {{\-\-no-hostname-lookup}} or {{\-\-hostname-lookup=false}}. ",3 MESOS-3458,"Segfault when accepting or declining inverse offers","Discovered while writing a test for filters (in regards to inverse offers). 
Fix here: https://reviews.apache.org/r/38470/",1 MESOS-3459,"Change /machine/up and /machine/down endpoints to take an array","With [MESOS-3312] committed, the {{/machine/up}} and {{/machine/down}} endpoints should also take an input as an array. It is important to change this before maintenance primitives are released: https://reviews.apache.org/r/38011/ Also, a minor change to the error message from these endpoints: https://reviews.apache.org/r/37969/",1 MESOS-3466,"Add metrics for filesystem isolation and image provisioning.","We need to know about: 1) Errors encountered while provisioning root filesystems 2) Errors encountered while cleaning up root filesystems 3) Number of containers changing root filesystem ...",2 MESOS-3467,"Provide the users with a fully writable filesystem","In the first phase of filesystem provisioning and isolation we are disallowing (or at least should, especially in the case of CopyBackend) users to write outside the sandbox without explicitly mounting specific volumes into the container. We do this even when OverlayBackend can potentially support a empty writable top layer. However in the real world use of containers (and for people coming from the VM world), users and applications often are used to being able to write to the full filesystem (restricted by plain file system permissions) with reasons ranging from applications being non-portable (filesystem-wise) to the need to do custom installs at run time to system directories (inside its container). In general, it's a good practice to restrict the application to write to confined locations and software dependencies can be managed through pre-packaged layers but these often introduce a high entry barrier for users. We should discuss a solution that gives the users the option to write to a full filesystem with a filesystem layer on top of provisioned images and optionally enable persistence of that layer through persistent volumes. 
This has implication in the management of user namespaces and resource reservations and requires a thorough design.",13 MESOS-3468,"Improve apply_reviews.sh script to apply chain of reviews","Currently the support/apply-review.sh script allows an user (typically committer) to apply a single review on top the HEAD. Since Mesos contributors typically submit a chain of reviews for a given issue it makes sense for the script to apply the whole chain recursively.",8 MESOS-3470,"UserCgroupIsolatorTest failed on CentOS 6.6","UserCgroupIsolatorTest use /sys/fs/cgroup as cgroups_hierarchy. But CentOS 6.6 cgroups_hierarchy is /cgroup. Need change to follow the way in ContainerizerTest.",1 MESOS-3472,"RegistryTokenTest.ExpiredToken test is flaky","RegistryTokenTest.ExpiredToken test is flaky. Here is the error I got on OSX after running it for several times: {noformat} [ RUN ] RegistryTokenTest.ExpiredToken ../../src/tests/containerizer/provisioner_docker_tests.cpp:167: Failure Value of: token.isError() Actual: false Expected: true libc++abi.dylib: terminating with uncaught exception of type testing::internal::GoogleTestFailureException: ../../src/tests/containerizer/provisioner_docker_tests.cpp:167: Failure Value of: token.isError() Actual: false Expected: true *** Aborted at 1442708631 (unix time) try ""date -d @1442708631"" if you are using GNU date *** PC: @ 0x7fff925fd286 __pthread_kill *** SIGABRT (@0x7fff925fd286) received by PID 7082 (TID 0x7fff7d7ad300) stack trace: *** @ 0x7fff9041af1a _sigtramp @ 0x7fff59759968 (unknown) @ 0x7fff9bb429b3 abort @ 0x7fff90ce1a21 abort_message @ 0x7fff90d099b9 default_terminate_handler() @ 0x7fff994767eb _objc_terminate() @ 0x7fff90d070a1 std::__terminate() @ 0x7fff90d06d48 __cxa_rethrow @ 0x10781bb16 testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x1077e9d30 testing::UnitTest::Run() @ 0x106d59a91 RUN_ALL_TESTS() @ 0x106d55d47 main @ 0x7fff8fc395c9 start @ 0x3 (unknown) Abort trap: 6 ~/src/mesos/build ((3ee82e3...)) $ 
{noformat}",3 MESOS-3476,"Refactor Status Update method on Agent to handle HTTP based Executors","Currently, receiving a status update sent from slave to itself , {{runTask}} , {{killTask}} and status updates from executors are handled by the {{Slave::statusUpdate}} method on Slave. The signature of the method is {{void Slave::statusUpdate(StatusUpdate update, const UPID& pid)}}. We need to create another overload of it that can also handle HTTP based executors which the previous PID based function can also call into. The signature of the new function could be: {{void Slave::statusUpdate(StatusUpdate update, Executor* executor)}} The HTTP Executor would also call into this new function via {{src/slave/http.cpp}}",8 MESOS-3480,"Refactor Executor struct in Slave to handle HTTP based executors","Currently, the {{struct Executor}} in slave only supports executors connected via message passing (driver). We should refactor it to add support for HTTP based Executors similar to what was done for the Scheduler API {{struct Framework}} in {{src/master/master.hpp}}",3 MESOS-3481,"Add const accessor to Master flags","It would make sense to have an accessor to the master's flags, especially for tests. For example, see [this test|https://github.com/apache/mesos/blob/2876b8c918814347dd56f6f87d461e414a90650a/src/tests/master_maintenance_tests.cpp#L1231-L1235].",2 MESOS-3483,"LinuxFilesystemIsolator should make the slave's work_dir a shared mount.","So that when a user task is forked, it does not hold extra references to the sandbox mount and provisioner bind backend mounts. If we don't do that, we could get the following error message when cleaning up bind backend mount points and sandbox mount points. 
{noformat} E0921 17:35:57.268159 47010 bind.cpp:182] Failed to remove rootfs mount point '/var/lib/mesos/provisioner/containers/07eb6660-25ff-4e83-8b2f-06955567e04a/backends/bind/rootfses/30f7e5e2-55d0-4d4d-a662-f8aad0d56b33': Device or resource busy E0921 17:35:57.268349 47010 provisioner.cpp:403] Failed to remove the provisioned container directory at '/var/lib/mesos/provisioner/containers/07eb6660-25ff-4e83-8b2f-06955567e04a': Device or resource busy {noformat}",3 MESOS-3485,"Make hook execution order deterministic","Currently, when using multiple hooks of the same type, the execution order is implementation-defined. This is because in src/hook/manager.cpp, the list of available hooks is stored in a {{hashmap}}. A hashmap is probably unnecessary for this task since the number of hooks should remain reasonable. A data structure preserving ordering should be used instead to allow the user to predict the execution order of the hooks. I suggest that the execution order should be the order in which hooks are specified with {{--hooks}} when starting an agent/master. This will be useful when combining multiple hooks after MESOS-3366 is done.",3 MESOS-3489,"Add support for exposing Accept/Decline responses for inverse offers","Current implementation of maintenance primitives does not support exposing Accept/Decline responses of frameworks to the cluster operators. This functionality is necessary to provide visibility to operators into whether a given framework is ready to comply with the posted maintenance schedule.",2 MESOS-3490,"Mesos UI fails to represent JSON entities","The Mesos UI is broken, it seems to fail to represent JSON from /state. This may have been introduced with https://reviews.apache.org/r/38028 ",1 MESOS-3491,"Enable ubuntu builds in ASF CI","I've disabled ubuntu:14.04 builds on ASF CI because the job randomly fails on fetching packages. 
{code} Get:406 http://archive.ubuntu.com/ubuntu/ trusty-updates/main gdisk amd64 0.8.8-1ubuntu0.1 [185 kB] Err http://archive.ubuntu.com/ubuntu/ trusty-security/main libldap-2.4-2 amd64 2.4.31-1+nmu2ubuntu8.1 404 Not Found [IP: 91.189.91.15 80] Err http://archive.ubuntu.com/ubuntu/ trusty-security/main libfreetype6 amd64 2.5.2-1ubuntu2.4 404 Not Found [IP: 91.189.91.15 80] Err http://archive.ubuntu.com/ubuntu/ trusty-security/main libicu52 amd64 52.1-3ubuntu0.3 404 Not Found [IP: 91.189.91.15 80] Fetched 213 MB in 1min 57s (1812 kB/s) E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/o/openldap/libldap-2.4-2_2.4.31-1+nmu2ubuntu8.1_amd64.deb 404 Not Found [IP: 91.189.91.15 80] E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/f/freetype/libfreetype6_2.5.2-1ubuntu2.4_amd64.deb 404 Not Found [IP: 91.189.91.15 80] E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/i/icu/libicu52_52.1-3ubuntu0.3_amd64.deb 404 Not Found [IP: 91.189.91.15 80] E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/g/gvfs/gvfs-common_1.20.3-0ubuntu1.1_all.deb 404 Not Found [IP: 91.189.91.15 80] E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/g/gvfs/gvfs-libs_1.20.3-0ubuntu1.1_amd64.deb 404 Not Found [IP: 91.189.91.15 80] E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/g/gvfs/gvfs-daemons_1.20.3-0ubuntu1.1_amd64.deb 404 Not Found [IP: 91.189.91.15 80] E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/g/gvfs/gvfs_1.20.3-0ubuntu1.1_amd64.deb 404 Not Found [IP: 91.189.91.15 80] E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing? 
The command '/bin/sh -c apt-get -y install build-essential clang git maven autoconf libtool' returned a non-zero code: 100 {code} We need to figure out what the problem is and fix it before enabling testing on ubuntu.",1 MESOS-3492,"Expose maintenance user doc via the documentation home page","The committed docs can be found here: http://mesos.apache.org/documentation/latest/maintenance/ We need to add a link to {{docs/home.md}} Also, the doc needs some minor formatting tweaks.",1 MESOS-3496,"Create interface for digest verifier","Add interface for digest verifier so that we can add implementations for digest types like sha256, sha512 etc",2 MESOS-3497,"Add implementation for sha256 based file content verification.",https://reviews.apache.org/r/38747/,3 MESOS-3499,"Add a test for os::realpath()",NULL,1 MESOS-3501,"configure cannot find libevent headers in CentOS 6","If libevent is installed via {{sudo yum install libevent-headers}}, running {{../configure --enable-libevent}} will fail to discover the libevent headers: {code} checking event2/event.h usability... no checking event2/event.h presence... no checking for event2/event.h... no configure: error: cannot find libevent headers ------------------------------------------------------------------- libevent is required for libprocess to build. ------------------------------------------------------------------- {code}",2 MESOS-3504,"Introduce MESOS_SANDBOX environment variable in Mesos containerizer.","Similar to Docker containerizer, if a container changes rootfs, we'll have two environment variables: MESOS_DIRECTORY: the path in the host filesystem MESOS_SANDBOX: the path in the container filesystem",3 MESOS-3506,"Build instructions for CentOS 6.6 should include `sudo yum update`","Neglecting to run {{sudo yum update}} on CentOS 6.6 currently causes the build to break when building {{mesos-0.25.0.jar}}. 
The build instructions for this platform on the Getting Started page should be changed accordingly.",1 MESOS-3510,"Synchronize V1 helper functions with pre-v1",NULL,5 MESOS-3512,"Don't retry close() on EINTR.","On Linux, retrying close on EINTR is dangerous because the fd is already released and we may accidentally close a newly opened fd (from another thread), see: http://ewontfix.com/4/ http://lwn.net/Articles/576478/ http://lwn.net/Articles/576591/ It appears that other OSes, like HPUX, require a retry of close on EINTR. The Austin Group recently proposed changes to POSIX to require that the EINTR case need a retry, but EINPROGRESS be used for when a retry should not occur: http://austingroupbugs.net/view.php?id=529 However, Linux does not follow this and so we need to remove our EINTR retries. Some more links for posterity: https://github.com/wahern/cqueues/issues/56#issuecomment-108656004 https://code.google.com/p/chromium/issues/detail?id=269623 https://codereview.chromium.org/23455051/ ",1 MESOS-3513,"Cgroups Test Filters aborts tests on Centos 6.6 ","Running make check on centos 6.6 causes all tests to abort due to CHECK_SOME test in CgroupsFIlter: {code} Build directory: /home/jenkins/workspace/mesos-config-centos6/build F0923 23:00:49.748896 27362 environment.cpp:132] CHECK_SOME(hierarchies_): Failed to determine canonical path of /sys/fs/cgroup/freezer: No such file or directory *** Check failure stack trace: *** @ 0x7fb786ca0c4d google::LogMessage::Fail() @ 0x7fb786ca298c google::LogMessage::SendToLog() @ 0x7fb786ca083c google::LogMessage::Flush() @ 0x7fb786ca3289 google::LogMessageFatal::~LogMessageFatal() @ 0x58e66c mesos::internal::tests::CgroupsFilter::CgroupsFilter() @ 0x58712f mesos::internal::tests::Environment::Environment() @ 0x4c882f main @ 0x7fb782767d5d __libc_start_main @ 0x4d6331 (unknown) make[3]: *** [check-local] Aborted {code}",1 MESOS-3515,"Support Subscribe Call for HTTP based Executors","We need to add a {{subscribe(...)}} method in 
{{src/slave/slave.cpp}} to introduce the ability for HTTP based executors to subscribe and then receive events on the persistent HTTP connection. Most of the functionality needed would be similar to {{Master::subscribe}} in {{src/master/master.cpp}}.",5 MESOS-3516,"Add user doc for networking support in Mesos 0.25.0",NULL,2 MESOS-3519,"Fix file descriptor leakage / double close in the code base",NULL,3 MESOS-3520,"Add an abstraction to manage the life cycle of file descriptors.","In order to avoid missing {{close()}} calls on file descriptors, or double-closing file descriptors, it would be nice to add a reference counted {{FileDescriptor}} in a similar way to what we've done for Socket. This will be closed automatically when the last reference goes away, and double closes can be prevented via internal state.",5 MESOS-3525,"Figure out how to enforce 64-bit builds on Windows.","We need to make sure people don't try to compile Mesos on 32-bit architectures. We don't want a Windows repeat of something like this: https://issues.apache.org/jira/browse/MESOS-267",3 MESOS-3539,"Validate that slave's work_dir is a shared mount in its own peer group when LinuxFilesystemIsolator is used.","To address this TODO in the code: {noformat} src/slave/containerizer/isolators/filesystem/linux.cpp +122 // TODO(jieyu): Currently, we don't check if the slave's work_dir // mount is a shared mount or not. We just assume it is. We cannot // simply mark the slave as shared again because that will create a // new peer group for the mounts. This is a temporary workaround for // now while we are thinking about fixes. {noformat}",3 MESOS-3540,"Libevent termination triggers Broken Pipe","When the libevent loop terminates and we unblock the {{SIGPIPE}} signal, the pending {{SIGPIPE}} instantly triggers and causes a broken pipe when the test binary stops running. {code} Program received signal SIGPIPE, Broken pipe. 
[Switching to Thread 0x7ffff18b4700 (LWP 16270)] pthread_sigmask (how=1, newmask=, oldmask=0x7ffff18b3d80) at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:53 53 ../sysdeps/unix/sysv/linux/pthread_sigmask.c: No such file or directory. (gdb) bt #0 pthread_sigmask (how=1, newmask=, oldmask=0x7ffff18b3d80) at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:53 #1 0x00000000006fd9a4 in unblock () at ../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/signals.hpp:90 #2 0x00000000007d7915 in run () at ../../../3rdparty/libprocess/src/libevent.cpp:125 #3 0x00000000007950cb in _M_invoke<>(void) () at /usr/include/c++/4.9/functional:1700 #4 0x0000000000795000 in operator() () at /usr/include/c++/4.9/functional:1688 #5 0x0000000000794f6e in _M_run () at /usr/include/c++/4.9/thread:115 #6 0x00007ffff668de30 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #7 0x00007ffff79a16aa in start_thread (arg=0x7ffff18b4700) at pthread_create.c:333 #8 0x00007ffff5df1eed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 {code}",2 MESOS-3550,"Create a Executor Library based on the new Executor HTTP API","Similar to the Scheduler Library {{src/scheduler/scheduler.cpp}} , we would need a Executor Library that speaks the new Executor HTTP API. ",5 MESOS-3551,"Replace use of strerror with thread-safe alternatives strerror_r / strerror_l.","{{strerror()}} is not required to be thread safe by POSIX and is listed as unsafe on Linux: http://pubs.opengroup.org/onlinepubs/9699919799/ http://man7.org/linux/man-pages/man3/strerror.3.html I don't believe we've seen any issues reported due to this. We should replace occurrences of strerror accordingly, possibly offering a wrapper in stout to simplify callsites.",3 MESOS-3552,"CHECK failure due to floating point precision on reservation request","result.cpus() == cpus() check is failing due to ( double == double ) comparison problem. Root Cause : Framework requested 0.1 cpu reservation for the first task. So far so good. 
The next Reserve operation led to double operations resulting in the following double values: results.cpus() : 23.9999999999999964472863211995 cpus() : 24 And the check ( result.cpus() == cpus() ) failed. The double arithmetic operations caused results.cpus() value to be : 23.9999999999999964472863211995 and hence ( 23.9999999999999964472863211995 == 24 ) failed. ",3 MESOS-3553,"LIBPROCESS_IP not passed when executor's environment is specified","When the executor's environment is specified explicitly via {{\-\-executor_environment_variables}}, {{LIBPROCESS_IP}} will not be passed, leading to errors in some cases - for example, when no DNS is available.",2 MESOS-3554,"Allocator changes trigger large re-compiles","Due to the templatized nature of the allocator, even small changes trigger large recompiles of the code-base. This makes iterating on changes expensive for developers.",3 MESOS-3556,"mesos.cli broken in 0.24.x","The issue was initially reported on the mailing list: http://www.mail-archive.com/user@mesos.apache.org/msg04670.html The format of the master data stored in zookeeper has changed but the mesos.cli does not reflect these changes causing tools like {{mesos-tail}} and {{mesos-ps}} to fail. 
Example error from {{mesos-tail}}: {noformat} mesos-master ~$ mesos tail -f -n 50 service Traceback (most recent call last): File ""/usr/local/bin/mesos-tail"", line 11, in sys.exit(main()) File ""/usr/local/lib/python2.7/dist-packages/mesos/cli/cli.py"", line 61, in wrapper return fn(*args, **kwargs) File ""/usr/local/lib/python2.7/dist-packages/mesos/cli/cmds/tail.py"", line 55, in main args.task, args.file, fail=(not args.follow)): File ""/usr/local/lib/python2.7/dist-packages/mesos/cli/cluster.py"", line 27, in files tlist = MASTER.tasks(fltr) File ""/usr/local/lib/python2.7/dist-packages/mesos/cli/master.py"", line 174, in tasks self._task_list(active_only)))) File ""/usr/local/lib/python2.7/dist-packages/mesos/cli/master.py"", line 153, in _task_list *[util.merge(x, *keys) for x in self.frameworks(active_only)]) File ""/usr/local/lib/python2.7/dist-packages/mesos/cli/master.py"", line 185, in frameworks return util.merge(self.state, *keys) File ""/usr/local/lib/python2.7/dist-packages/mesos/cli/util.py"", line 58, in __get__ value = self.fget(inst) File ""/usr/local/lib/python2.7/dist-packages/mesos/cli/master.py"", line 123, in state return self.fetch(""/master/state.json"").json() File ""/usr/local/lib/python2.7/dist-packages/mesos/cli/master.py"", line 64, in fetch return requests.get(urlparse.urljoin(self.host, url), **kwargs) File ""/usr/local/lib/python2.7/dist-packages/requests/api.py"", line 69, in get return request('get', url, params=params, **kwargs) File ""/usr/local/lib/python2.7/dist-packages/requests/api.py"", line 50, in request response = session.request(method=method, url=url, **kwargs) File ""/usr/local/lib/python2.7/dist-packages/requests/sessions.py"", line 451, in request prep = self.prepare_request(req) File ""/usr/local/lib/python2.7/dist-packages/requests/sessions.py"", line 382, in prepare_request hooks=merge_hooks(request.hooks, self.hooks), File ""/usr/local/lib/python2.7/dist-packages/requests/models.py"", line 304, in prepare 
self.prepare_url(url, params) File ""/usr/local/lib/python2.7/dist-packages/requests/models.py"", line 357, in prepare_url raise InvalidURL(*e.args) requests.exceptions.InvalidURL: Failed to parse: 10.100.1.100:5050"",""port"":5050,""version"":""0.24.1""} {noformat} The problem exists in https://github.com/mesosphere/mesos-cli/blob/master/mesos/cli/master.py#L107. The code should be along the lines of: {noformat} try: parsed = json.loads(val) return parsed[""address""][""ip""] + "":"" + str(parsed[""address""][""port""]) except Exception: return val.split(""@"")[-1] {noformat} This causes the master address to come back correctly.",1 MESOS-3558,"Implement HTTPCommandExecutor that uses the Executor Library ","Instead of using the {{MesosExecutorDriver}} , we should make the {{CommandExecutor}} in {{src/launcher/executor.cpp}} use the new Executor HTTP Library that we create in {{MESOS-3550}}. This would act as a good validation of the {{HTTP API}} implementation.",13 MESOS-3559,"Make the Command Scheduler use the HTTP Scheduler Library","We should make the Command Scheduler in {{src/cli/executor.cpp}} use the Scheduler Library {{src/scheduler/scheduler.cpp}} instead of the Scheduler Driver.",3 MESOS-3560,"JSON-based credential files do not work correctly","Specifying the following credentials file: {code} { “credentials”: [ { “principal”: “user”, “secret”: “password” } ] } {code} Then hitting a master endpoint with: {code} curl -i -u “user:password” ... {code} Does not work. This is contrary to the text-based credentials file which works: {code} user password {code} Currently, the password in a JSON-based credentials file needs to be base64-encoded in order for it to work: {code} { “credentials”: [ { “principal”: “user”, “secret”: “cGFzc3dvcmQ=” 
} ] } {code}",1 MESOS-3563,"Revocable task CPU shows as zero in /state.json","The slave's state.json reports revocable task resources as zero: {noformat} resources: { cpus: 0, disk: 3071, mem: 1248, ports: ""[31715-31715]"" }, {noformat} Also, there is no indication that a task uses revocable CPU. It would be great to have this type of info.",2 MESOS-3567,"Support TCP checks in Mesos health check program","In Marathon we have the ability to specify Health Checks for: - Command (Mesos supports this) - HTTP (see progress in MESOS-2533) - TCP missing See here for reference: https://mesosphere.github.io/marathon/docs/health-checks.html Since we made good experiences with those 3 options in Marathon, I see a lot of value, if Mesos would also support them. ",8 MESOS-3570,"Make Scheduler Library use HTTP Pipelining Abstraction in Libprocess","Currently, the scheduler library sends calls in order by chaining them and sending them only when it has received a response for the earlier call. This was done because there was no HTTP Pipelining abstraction in Libprocess {{process::post}}. However once {{MESOS-3332}} is resolved, we should be now able to use the new abstraction.",8 MESOS-3571,"Refactor registry_client","Refactor registry client component to: - Make methods shorter for readability - Pull out structs not related to registry client into common namespace.",5 MESOS-3573,"Mesos does not kill orphaned docker containers","After upgrade to 0.24.0 we noticed hanging containers appearing. Looks like there were changes between 0.23.0 and 0.24.0 that broke cleanup. Here's how to trigger this bug: 1. Deploy app in docker container. 2. Kill corresponding mesos-docker-executor process 3. 
Observe hanging container Here are the logs after kill: {noformat} slave_1 | I1002 12:12:59.362002 7791 docker.cpp:1576] Executor for container 'f083aaa2-d5c3-43c1-b6ba-342de8829fa8' has exited slave_1 | I1002 12:12:59.362284 7791 docker.cpp:1374] Destroying container 'f083aaa2-d5c3-43c1-b6ba-342de8829fa8' slave_1 | I1002 12:12:59.363404 7791 docker.cpp:1478] Running docker stop on container 'f083aaa2-d5c3-43c1-b6ba-342de8829fa8' slave_1 | I1002 12:12:59.363876 7791 slave.cpp:3399] Executor 'sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c' of framework 20150923-122130-2153451692-5050-1-0000 terminated with signal Terminated slave_1 | I1002 12:12:59.367570 7791 slave.cpp:2696] Handling status update TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 20150923-122130-2153451692-5050-1-0000 from @0.0.0.0:0 slave_1 | I1002 12:12:59.367842 7791 slave.cpp:5094] Terminating task sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c slave_1 | W1002 12:12:59.368484 7791 docker.cpp:986] Ignoring updating unknown container: f083aaa2-d5c3-43c1-b6ba-342de8829fa8 slave_1 | I1002 12:12:59.368671 7791 status_update_manager.cpp:322] Received status update TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 20150923-122130-2153451692-5050-1-0000 slave_1 | I1002 12:12:59.368741 7791 status_update_manager.cpp:826] Checkpointing UPDATE for status update TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 20150923-122130-2153451692-5050-1-0000 slave_1 | I1002 12:12:59.370636 7791 status_update_manager.cpp:376] Forwarding update TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 20150923-122130-2153451692-5050-1-0000 to the slave slave_1 | I1002 12:12:59.371335 7791 slave.cpp:2975] Forwarding the update TASK_FAILED (UUID: 
4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 20150923-122130-2153451692-5050-1-0000 to master@172.16.91.128:5050 slave_1 | I1002 12:12:59.371908 7791 slave.cpp:2899] Status update manager successfully handled status update TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 20150923-122130-2153451692-5050-1-0000 master_1 | I1002 12:12:59.372047 11 master.cpp:4069] Status update TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 20150923-122130-2153451692-5050-1-0000 from slave 20151002-120829-2153451692-5050-1-S0 at slave(1)@172.16.91.128:5051 (172.16.91.128) master_1 | I1002 12:12:59.372534 11 master.cpp:4108] Forwarding status update TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 20150923-122130-2153451692-5050-1-0000 master_1 | I1002 12:12:59.373018 11 master.cpp:5576] Updating the latest state of task sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 20150923-122130-2153451692-5050-1-0000 to TASK_FAILED master_1 | I1002 12:12:59.373447 11 hierarchical.hpp:814] Recovered cpus(*):0.1; mem(*):16; ports(*):[31685-31685] (total: cpus(*):4; mem(*):1001; disk(*):52869; ports(*):[31000-32000], allocated: cpus(*):8.32667e-17) on slave 20151002-120829-2153451692-5050-1-S0 from framework 20150923-122130-2153451692-5050-1-0000 {noformat} Another issue: if you restart mesos-slave on the host with orphaned docker containers, they are not getting killed. This was the case before and I hoped for this trick to kill hanging containers, but it doesn't work now. 
Marking this as critical because it hoards cluster resources and blocks scheduling.",5 MESOS-3575,"V1 API java/python protos are not generated","The java/python protos for the V1 api should be generated according to the Makefile; however, they do not show up in the generated build directory.",2 MESOS-3579,"FetcherCacheTest.LocalUncachedExtract is flaky","From ASF CI: https://builds.apache.org/job/Mesos/866/COMPILER=clang,CONFIGURATION=--verbose,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/console {code} [ RUN ] FetcherCacheTest.LocalUncachedExtract Using temporary directory '/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA' I0925 19:15:39.541198 27410 leveldb.cpp:176] Opened db in 3.43934ms I0925 19:15:39.542362 27410 leveldb.cpp:183] Compacted db in 1.136184ms I0925 19:15:39.542428 27410 leveldb.cpp:198] Created db iterator in 35866ns I0925 19:15:39.542448 27410 leveldb.cpp:204] Seeked to beginning of db in 8807ns I0925 19:15:39.542459 27410 leveldb.cpp:273] Iterated through 0 keys in the db in 6325ns I0925 19:15:39.542505 27410 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0925 19:15:39.543143 27438 recover.cpp:449] Starting replica recovery I0925 19:15:39.543393 27438 recover.cpp:475] Replica is in EMPTY status I0925 19:15:39.544373 27436 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request I0925 19:15:39.544791 27433 recover.cpp:195] Received a recover response from a replica in EMPTY status I0925 19:15:39.545284 27433 recover.cpp:566] Updating replica status to STARTING I0925 19:15:39.546155 27436 master.cpp:376] Master c8bf1c95-50f4-4832-a570-c560f0b466ae (f57fd4291168) started on 172.17.1.195:41781 I0925 19:15:39.546257 27433 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 747249ns I0925 19:15:39.546288 27433 replica.cpp:323] Persisted replica status to STARTING I0925 19:15:39.546483 27434 recover.cpp:475] Replica is in STARTING status I0925 19:15:39.546187 27436 
master.cpp:378] Flags at startup: --acls="""" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate=""true"" --authenticate_slaves=""true"" --authenticators=""crammd5"" --authorizers=""local"" --credentials=""/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/credentials"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_slave_ping_timeouts=""5"" --quiet=""false"" --recovery_slave_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""25secs"" --registry_strict=""true"" --root_submissions=""true"" --slave_ping_timeout=""15secs"" --slave_reregister_timeout=""10mins"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/mesos/mesos-0.26.0/_inst/share/mesos/webui"" --work_dir=""/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/master"" --zk_session_timeout=""10secs"" I0925 19:15:39.546567 27436 master.cpp:423] Master only allowing authenticated frameworks to register I0925 19:15:39.546617 27436 master.cpp:428] Master only allowing authenticated slaves to register I0925 19:15:39.546632 27436 credentials.hpp:37] Loading credentials for authentication from '/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/credentials' I0925 19:15:39.546931 27436 master.cpp:467] Using default 'crammd5' authenticator I0925 19:15:39.547044 27436 master.cpp:504] Authorization enabled I0925 19:15:39.547276 27441 whitelist_watcher.cpp:79] No whitelist given I0925 19:15:39.547320 27434 hierarchical.hpp:468] Initialized hierarchical allocator process I0925 19:15:39.547471 27438 replica.cpp:641] Replica in STARTING status received a broadcasted recover request I0925 19:15:39.548318 27443 recover.cpp:195] Received a recover response from a replica in STARTING status I0925 19:15:39.549067 27435 recover.cpp:566] Updating replica status to VOTING I0925 19:15:39.549115 27440 
master.cpp:1603] The newly elected leader is master@172.17.1.195:41781 with id c8bf1c95-50f4-4832-a570-c560f0b466ae I0925 19:15:39.549162 27440 master.cpp:1616] Elected as the leading master! I0925 19:15:39.549190 27440 master.cpp:1376] Recovering from registrar I0925 19:15:39.549342 27434 registrar.cpp:309] Recovering registrar I0925 19:15:39.549666 27430 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 418187ns I0925 19:15:39.549753 27430 replica.cpp:323] Persisted replica status to VOTING I0925 19:15:39.550089 27442 recover.cpp:580] Successfully joined the Paxos group I0925 19:15:39.550320 27442 recover.cpp:464] Recover process terminated I0925 19:15:39.550904 27430 log.cpp:661] Attempting to start the writer I0925 19:15:39.551955 27434 replica.cpp:477] Replica received implicit promise request with proposal 1 I0925 19:15:39.552351 27434 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 380746ns I0925 19:15:39.552372 27434 replica.cpp:345] Persisted promised to 1 I0925 19:15:39.552896 27436 coordinator.cpp:231] Coordinator attemping to fill missing position I0925 19:15:39.554003 27432 replica.cpp:378] Replica received explicit promise request for position 0 with proposal 2 I0925 19:15:39.554534 27432 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 510572ns I0925 19:15:39.554558 27432 replica.cpp:679] Persisted action at 0 I0925 19:15:39.555516 27443 replica.cpp:511] Replica received write request for position 0 I0925 19:15:39.555595 27443 leveldb.cpp:438] Reading position from leveldb took 65355ns I0925 19:15:39.556177 27443 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 542757ns I0925 19:15:39.556200 27443 replica.cpp:679] Persisted action at 0 I0925 19:15:39.556813 27429 replica.cpp:658] Replica received learned notice for position 0 I0925 19:15:39.557251 27429 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 422272ns I0925 19:15:39.557281 27429 replica.cpp:679] Persisted action at 0 I0925 
19:15:39.557315 27429 replica.cpp:664] Replica learned NOP action at position 0 I0925 19:15:39.558061 27442 log.cpp:677] Writer started with ending position 0 I0925 19:15:39.559294 27434 leveldb.cpp:438] Reading position from leveldb took 56536ns I0925 19:15:39.560333 27432 registrar.cpp:342] Successfully fetched the registry (0B) in 10.936064ms I0925 19:15:39.560469 27432 registrar.cpp:441] Applied 1 operations in 41340ns; attempting to update the 'registry' I0925 19:15:39.561244 27441 log.cpp:685] Attempting to append 176 bytes to the log I0925 19:15:39.561378 27436 coordinator.cpp:341] Coordinator attempting to write APPEND action at position 1 I0925 19:15:39.562126 27439 replica.cpp:511] Replica received write request for position 1 I0925 19:15:39.562515 27439 leveldb.cpp:343] Persisting action (195 bytes) to leveldb took 364968ns I0925 19:15:39.562539 27439 replica.cpp:679] Persisted action at 1 I0925 19:15:39.563160 27438 replica.cpp:658] Replica received learned notice for position 1 I0925 19:15:39.563699 27438 leveldb.cpp:343] Persisting action (197 bytes) to leveldb took 455933ns I0925 19:15:39.563730 27438 replica.cpp:679] Persisted action at 1 I0925 19:15:39.563753 27438 replica.cpp:664] Replica learned APPEND action at position 1 I0925 19:15:39.564749 27434 registrar.cpp:486] Successfully updated the 'registry' in 4.214016ms I0925 19:15:39.564893 27434 registrar.cpp:372] Successfully recovered registrar I0925 19:15:39.564950 27442 log.cpp:704] Attempting to truncate the log to 1 I0925 19:15:39.565039 27429 coordinator.cpp:341] Coordinator attempting to write TRUNCATE action at position 2 I0925 19:15:39.565172 27430 master.cpp:1413] Recovered 0 slaves from the Registry (137B) ; allowing 10mins for slaves to re-register I0925 19:15:39.565946 27429 replica.cpp:511] Replica received write request for position 2 I0925 19:15:39.566349 27429 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 375473ns I0925 19:15:39.566371 27429 replica.cpp:679] 
Persisted action at 2 I0925 19:15:39.566994 27431 replica.cpp:658] Replica received learned notice for position 2 I0925 19:15:39.567440 27431 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 437095ns I0925 19:15:39.567483 27431 leveldb.cpp:401] Deleting ~1 keys from leveldb took 31954ns I0925 19:15:39.567498 27431 replica.cpp:679] Persisted action at 2 I0925 19:15:39.567514 27431 replica.cpp:664] Replica learned TRUNCATE action at position 2 I0925 19:15:39.576660 27410 containerizer.cpp:143] Using isolation: posix/cpu,posix/mem,filesystem/posix W0925 19:15:39.577055 27410 backend.cpp:50] Failed to create 'bind' backend: BindBackend requires root privileges I0925 19:15:39.583020 27443 slave.cpp:190] Slave started on 46)@172.17.1.195:41781 I0925 19:15:39.583062 27443 slave.cpp:191] Flags at startup: --appc_store_dir=""/tmp/mesos/store/appc"" --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --credential=""/tmp/FetcherCacheTest_LocalUncachedExtract_LwfzK4/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_kill_orphans=""true"" --docker_local_archives_dir=""/tmp/mesos/images/docker"" --docker_puller=""local"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/FetcherCacheTest_LocalUncachedExtract_LwfzK4/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname_lookup=""true"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" 
--isolation=""posix/cpu,posix/mem"" --launcher_dir=""/mesos/mesos-0.26.0/_build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resource_monitoring_interval=""1secs"" --resources=""cpus(*):1000; mem(*):1000"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" --work_dir=""/tmp/FetcherCacheTest_LocalUncachedExtract_LwfzK4"" I0925 19:15:39.583472 27443 credentials.hpp:85] Loading credential for authentication from '/tmp/FetcherCacheTest_LocalUncachedExtract_LwfzK4/credential' I0925 19:15:39.583752 27443 slave.cpp:321] Slave using credential for: test-principal I0925 19:15:39.584249 27443 slave.cpp:354] Slave resources: cpus(*):1000; mem(*):1000; disk(*):3.70122e+06; ports(*):[31000-32000] I0925 19:15:39.584344 27443 slave.cpp:390] Slave hostname: f57fd4291168 I0925 19:15:39.584362 27443 slave.cpp:395] Slave checkpoint: true I0925 19:15:39.585180 27428 state.cpp:54] Recovering state from '/tmp/FetcherCacheTest_LocalUncachedExtract_LwfzK4/meta' I0925 19:15:39.585383 27440 status_update_manager.cpp:202] Recovering status update manager I0925 19:15:39.585636 27435 containerizer.cpp:386] Recovering containerizer I0925 19:15:39.586380 27438 slave.cpp:4110] Finished recovery I0925 19:15:39.586845 27438 slave.cpp:4267] Querying resource estimator for oversubscribable resources I0925 19:15:39.587059 27430 status_update_manager.cpp:176] Pausing sending status updates I0925 19:15:39.587064 27438 slave.cpp:705] New master detected at master@172.17.1.195:41781 I0925 19:15:39.587139 27438 slave.cpp:768] Authenticating with master master@172.17.1.195:41781 I0925 19:15:39.587163 27438 
slave.cpp:773] Using default CRAM-MD5 authenticatee I0925 19:15:39.587321 27438 slave.cpp:741] Detecting new master I0925 19:15:39.587357 27434 authenticatee.cpp:115] Creating new client SASL connection I0925 19:15:39.587574 27438 slave.cpp:4281] Received oversubscribable resources from the resource estimator I0925 19:15:39.587739 27442 master.cpp:5138] Authenticating slave(46)@172.17.1.195:41781 I0925 19:15:39.587853 27441 authenticator.cpp:407] Starting authentication session for crammd5_authenticatee(139)@172.17.1.195:41781 I0925 19:15:39.588052 27439 authenticator.cpp:92] Creating new server SASL connection I0925 19:15:39.588248 27431 authenticatee.cpp:206] Received SASL authentication mechanisms: CRAM-MD5 I0925 19:15:39.588297 27431 authenticatee.cpp:232] Attempting to authenticate with mechanism 'CRAM-MD5' I0925 19:15:39.588443 27437 authenticator.cpp:197] Received SASL authentication start I0925 19:15:39.588506 27437 authenticator.cpp:319] Authentication requires more steps I0925 19:15:39.588677 27443 authenticatee.cpp:252] Received SASL authentication step I0925 19:15:39.588814 27436 authenticator.cpp:225] Received SASL authentication step I0925 19:15:39.588855 27436 auxprop.cpp:102] Request to lookup properties for user: 'test-principal' realm: 'f57fd4291168' server FQDN: 'f57fd4291168' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0925 19:15:39.588876 27436 auxprop.cpp:174] Looking up auxiliary property '*userPassword' I0925 19:15:39.588937 27436 auxprop.cpp:174] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0925 19:15:39.588979 27436 auxprop.cpp:102] Request to lookup properties for user: 'test-principal' realm: 'f57fd4291168' server FQDN: 'f57fd4291168' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0925 19:15:39.588997 27436 auxprop.cpp:124] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0925 19:15:39.589011 
27436 auxprop.cpp:124] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0925 19:15:39.589036 27436 authenticator.cpp:311] Authentication success I0925 19:15:39.589126 27443 authenticatee.cpp:292] Authentication success I0925 19:15:39.589192 27437 master.cpp:5168] Successfully authenticated principal 'test-principal' at slave(46)@172.17.1.195:41781 I0925 19:15:39.589238 27433 authenticator.cpp:425] Authentication session cleanup for crammd5_authenticatee(139)@172.17.1.195:41781 I0925 19:15:39.589412 27440 slave.cpp:836] Successfully authenticated with master master@172.17.1.195:41781 I0925 19:15:39.589540 27440 slave.cpp:1230] Will retry registration in 13.562027ms if necessary I0925 19:15:39.589745 27436 master.cpp:3862] Registering slave at slave(46)@172.17.1.195:41781 (f57fd4291168) with id c8bf1c95-50f4-4832-a570-c560f0b466ae-S0 I0925 19:15:39.590121 27438 registrar.cpp:441] Applied 1 operations in 70627ns; attempting to update the 'registry' I0925 19:15:39.590831 27430 log.cpp:685] Attempting to append 345 bytes to the log I0925 19:15:39.590927 27439 coordinator.cpp:341] Coordinator attempting to write APPEND action at position 3 I0925 19:15:39.591809 27430 replica.cpp:511] Replica received write request for position 3 I0925 19:15:39.592072 27430 leveldb.cpp:343] Persisting action (364 bytes) to leveldb took 221734ns I0925 19:15:39.592099 27430 replica.cpp:679] Persisted action at 3 I0925 19:15:39.592643 27442 replica.cpp:658] Replica received learned notice for position 3 I0925 19:15:39.593215 27442 leveldb.cpp:343] Persisting action (366 bytes) to leveldb took 560946ns I0925 19:15:39.593237 27442 replica.cpp:679] Persisted action at 3 I0925 19:15:39.593255 27442 replica.cpp:664] Replica learned APPEND action at position 3 I0925 19:15:39.594663 27433 registrar.cpp:486] Successfully updated the 'registry' in 4.472832ms I0925 19:15:39.594874 27431 log.cpp:704] Attempting to truncate the log to 3 I0925 19:15:39.595407 27429 
slave.cpp:3138] Received ping from slave-observer(45)@172.17.1.195:41781 I0925 19:15:39.595450 27433 coordinator.cpp:341] Coordinator attempting to write TRUNCATE action at position 4 I0925 19:15:39.596017 27442 replica.cpp:511] Replica received write request for position 4 I0925 19:15:39.596029 27429 hierarchical.hpp:675] Added slave c8bf1c95-50f4-4832-a570-c560f0b466ae-S0 (f57fd4291168) with cpus(*):1000; mem(*):1000; disk(*):3.70122e+06; ports(*):[31000-32000] (allocated: ) I0925 19:15:39.595952 27441 master.cpp:3930] Registered slave c8bf1c95-50f4-4832-a570-c560f0b466ae-S0 at slave(46)@172.17.1.195:41781 (f57fd4291168) with cpus(*):1000; mem(*):1000; disk(*):3.70122e+06; ports(*):[31000-32000] I0925 19:15:39.596240 27429 hierarchical.hpp:1326] No resources available to allocate! I0925 19:15:39.596263 27439 slave.cpp:880] Registered with master master@172.17.1.195:41781; given slave ID c8bf1c95-50f4-4832-a570-c560f0b466ae-S0 I0925 19:15:39.596341 27439 fetcher.cpp:77] Clearing fetcher cache I0925 19:15:39.596345 27429 hierarchical.hpp:1421] No inverse offers to send out! 
I0925 19:15:39.596367 27429 hierarchical.hpp:1239] Performed allocation for slave c8bf1c95-50f4-4832-a570-c560f0b466ae-S0 in 299337ns I0925 19:15:39.596524 27434 status_update_manager.cpp:183] Resuming sending status updates I0925 19:15:39.596571 27442 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 575374ns I0925 19:15:39.596662 27442 replica.cpp:679] Persisted action at 4 I0925 19:15:39.596984 27439 slave.cpp:903] Checkpointing SlaveInfo to '/tmp/FetcherCacheTest_LocalUncachedExtract_LwfzK4/meta/slaves/c8bf1c95-50f4-4832-a570-c560f0b466ae-S0/slave.info' I0925 19:15:39.597522 27434 replica.cpp:658] Replica received learned notice for position 4 I0925 19:15:39.597553 27410 sched.cpp:164] Version: 0.26.0 I0925 19:15:39.597746 27439 slave.cpp:939] Forwarding total oversubscribed resources I0925 19:15:39.598021 27429 master.cpp:4272] Received update of slave c8bf1c95-50f4-4832-a570-c560f0b466ae-S0 at slave(46)@172.17.1.195:41781 (f57fd4291168) with total oversubscribed resources I0925 19:15:39.598070 27434 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 531503ns I0925 19:15:39.598162 27434 leveldb.cpp:401] Deleting ~2 keys from leveldb took 79081ns I0925 19:15:39.598170 27428 sched.cpp:262] New master detected at master@172.17.1.195:41781 I0925 19:15:39.598206 27434 replica.cpp:679] Persisted action at 4 I0925 19:15:39.598238 27434 replica.cpp:664] Replica learned TRUNCATE action at position 4 I0925 19:15:39.598276 27428 sched.cpp:318] Authenticating with master master@172.17.1.195:41781 I0925 19:15:39.598296 27428 sched.cpp:325] Using default CRAM-MD5 authenticatee I0925 19:15:39.598950 27430 hierarchical.hpp:735] Slave c8bf1c95-50f4-4832-a570-c560f0b466ae-S0 (f57fd4291168) updated with oversubscribed resources (total: cpus(*):1000; mem(*):1000; disk(*):3.70122e+06; ports(*):[31000-32000], allocated: ) I0925 19:15:39.599242 27430 hierarchical.hpp:1326] No resources available to allocate! 
I0925 19:15:39.599282 27430 hierarchical.hpp:1421] No inverse offers to send out! I0925 19:15:39.599341 27430 hierarchical.hpp:1239] Performed allocation for slave c8bf1c95-50f4-4832-a570-c560f0b466ae-S0 in 327742ns I0925 19:15:39.599632 27437 authenticatee.cpp:115] Creating new client SASL connection I0925 19:15:39.600005 27428 master.cpp:5138] Authenticating scheduler-dda30e8e-47b7-4b1d-9a96-32364754be63@172.17.1.195:41781 I0925 19:15:39.600170 27435 authenticator.cpp:407] Starting authentication session for crammd5_authenticatee(140)@172.17.1.195:41781 I0925 19:15:39.600518 27433 authenticator.cpp:92] Creating new server SASL connection I0925 19:15:39.600788 27436 authenticatee.cpp:206] Received SASL authentication mechanisms: CRAM-MD5 I0925 19:15:39.600831 27436 authenticatee.cpp:232] Attempting to authenticate with mechanism 'CRAM-MD5' I0925 19:15:39.600944 27433 authenticator.cpp:197] Received SASL authentication start I0925 19:15:39.601019 27433 authenticator.cpp:319] Authentication requires more steps I0925 19:15:39.601150 27436 authenticatee.cpp:252] Received SASL authentication step I0925 19:15:39.601284 27436 authenticator.cpp:225] Received SASL authentication step I0925 19:15:39.601326 27436 auxprop.cpp:102] Request to lookup properties for user: 'test-principal' realm: 'f57fd4291168' server FQDN: 'f...",2 MESOS-3581,"License headers show up all over doxygen documentation.","Currently license headers are commented in something resembling Javadoc style, {code} /** * Licensed ... {code} Since we use Javadoc-style comment blocks for doxygen documentation all license headers appear in the generated documentation, potentially and likely hiding the actual documentation. 
Using {{/*}} to start the comment blocks would be enough to hide them from doxygen, but would likely also result in a largish (though mostly uninteresting) patch.",2 MESOS-3583,"Introduce stream IDs in HTTP Scheduler API","Currently, the HTTP Scheduler API has no concept of Sessions aka {{SessionID}} or a {{TokenID}}. Such an identifier is useful in some failure scenarios. As of now, if a framework fails over and then subscribes again with the same {{FrameworkID}} with the {{force}} option set, the Mesos master would subscribe it. If the previous instance of the framework/scheduler tries to send a Call, e.g. {{Call::KILL}}, with the same previous {{FrameworkID}} set, it would still be accepted by the master, leading to a task being killed erroneously. This is possible because we currently have no way of distinguishing connections. It used to work in the previous driver implementation because the master also performed a {{UPID}} check to verify that they matched and only then allowed the call. Following the design process, we will implement ""stream IDs"" for Mesos HTTP schedulers; each ID will be associated with a single subscription connection, and the scheduler must include it as a header in all non-subscribe calls sent to the master.",5 MESOS-3584,"rename libprocess tests to ""libprocess-tests""","Stout tests are in a binary named {{stout-tests}}, Mesos tests are in {{mesos-tests}}, but libprocess tests are just {{tests}}. It would be helpful to name them {{libprocess-tests}} ",1 MESOS-3585,"Add a test module for ip-per-container support","With the addition of {{NetworkInfo}} to allow frameworks to request IP-per-container for their tasks, we should add a simple module that mimics the behavior of a real network-isolation module for testing purposes. We can then add this module in {{src/examples}} and write some tests against it. 
This module can also serve as a template module for third-party network isolation providers for building their own network isolator modules.",3 MESOS-3586,"MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and CGROUPS_ROOT_SlaveRecovery are flaky","I am installing Mesos 0.24.0 on 4 servers which have very similar hardware and software configurations. After performing {{../configure}}, {{make}}, and {{make check}}, some servers completed successfully and others failed on test {{[ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics}}. Is there something I should check in this test? {code} PERFORMED MAKE CHECK NODE-001 [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0 I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave 20151005-143735-2393768202-35106-27900-S0 Registered executor on svdidac038.techlabs.accenture.com Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0 Forked command at 38510 sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' PERFORMED MAKE CHECK NODE-002 [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0 I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave 20151005-143857-2360213770-50427-26325-S0 Registered executor on svdidac039.techlabs.accenture.com Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28 sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' Forked command at 37028 ../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure Expected: (usage.get().mem_medium_pressure_counter()) >= (usage.get().mem_critical_pressure_counter()), actual: 5 vs 6 2015-10-05 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server refused to accept the client [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms) {code}",1 MESOS-3587,"Framework 
failover when framework is 'active' does not trigger allocation.","FWICT, this is just a consequence of some technical debt in the master code. When an active framework fails over, we do not go through the deactivation->activation code paths, and so: (1) The framework's filters in the allocator remain after the failover. (2) The failed-over framework does not receive an immediate allocation (it has to wait for the next allocation interval). If the framework had disconnected first, then the failover goes through the deactivation->activation code paths. This also means that some tests take longer to run than necessary.",5 MESOS-3593,"Propagate Isolator::prepare() failures to the framework","Currently, if {{Isolator::prepare}} fails for some isolator(s), we simply return a generic message about the container being destroyed during launch. It would be especially helpful if third-party isolator modules could report the error back to the framework.",2 MESOS-3595,"Framework process hangs after master failover when number frameworks > libprocess thread pool size","When running multiple framework instances per process, if the number of frameworks created exceeds the number of libprocess threads, then during master failover the zookeeper updates can cause deadlock. E.g., on a machine with 24 cpus, if the framework instance count exceeds 24 (per process), then when the master fails over all the libprocess threads block updating the cache (GroupProcess), leading to deadlock. 
Below is the stack trace of one the libprocess thread : {code} Thread 101 (Thread 0x7f42821f1700 (LWP 5974)): #0 0x000000314100b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f42870d1637 in Gate::arrive(long) () from /Users/mchadha/venv/lib/python2.7/site-packages/mesos.native-0.22.1003-py2.7-linux-x86_64.egg/mesos/native/_mesos.so #2 0x00007f42870be87c in process::ProcessManager::wait(process::UPID const&) () from /Users/mchadha/venv/lib/python2.7/site-packages/mesos.native-0.22.1003-py2.7-linux-x86_64.eg g/mesos/native/_mesos.so #3 0x00007f42870c25f7 in process::wait(process::UPID const&, Duration const&) () from /Users/mchadha/venv/lib/python2.7/site-packages/mesos.native-0.22.1003-py2.7-linux-x86_64.e gg/mesos/native/_mesos.so #4 0x00007f428708e294 in process::Latch::await(Duration const&) () from /Users/mchadha/venv/lib/python2.7/site-packages/mesos.native-0.22.1003-py2.7-linux-x86_64.egg/mesos/nativ e/_mesos.so #5 0x00007f4286b67dea in process::Future::await(Duration const&) const () from /Users/mchadha/venv/lib/python2.7/site-packages/mesos.native-0.22.1003-py2.7-linux-x86_64.egg /mesos/native/_mesos.so #6 0x00007f4286b5a0df in process::Future::get() const () from /Users/mchadha/venv/lib/python2.7/site-packages/mesos.native-0.22.1003-py2.7-linux-x86_64.egg/mesos/native/_me sos.so #7 0x00007f4286ff0508 in ZooKeeper::getChildren(std::basic_string, std::allocator > const&, bool, std::vector, std::allocator >, std::allocator, std::allocator > > >*) () from /Users/mchadha/venv/lib/python2.7/site -packages/mesos.native-0.22.1003-py2.7-linux-x86_64.egg/mesos/native/_mesos.so #8 0x00007f4286cb394e in zookeeper::GroupProcess::cache() () from /Users/mchadha/venv/lib/python2.7/site-packages/mesos.native-0.22.1003-py2.7-linux-x86_64.egg/mesos/native/_mes os.so #9 0x00007f4286cb1e63 in zookeeper::GroupProcess::updated(long, std::basic_string, std::allocator > const&) () from /Users/mchadha/venv/lib/py 
thon2.7/site-packages/mesos.native-0.22.1003-py2.7-linux-x86_64.egg/mesos/native/_mesos.so #10 0x00007f4286ce027a in std::tr1::_Mem_fn, std::allocator > const&)>::operator()(zo okeeper::GroupProcess*, long, std::basic_string, std::allocator > const&) const () from /Users/mchadha/venv/lib/python2.7/site-packages/mesos.n ative-0.22.1003-py2.7-linux-x86_64.egg/mesos/native/_mesos.so #11 0x00007f4286ce0067 in std::tr1::result_of, std::allocator > con st&)> ()(std::tr1::result_of, false, true> ()(std::tr1::_Placeholder<1>, std::tr1::tuple)>::type, std::tr1::res ult_of ()(long, std::tr1::_Mu, false, true> ()(std::tr1::_Placeholder<1>, std::tr1::tuple)) >::type, std::tr1::result_of, std::allocator >, false, false> ()(std::basic_string , std::allocator >, std::tr1::_Mu, false, true> ()(std::tr1::_Placeholder<1>, std::tr1::tuple))>::type)>::type std::tr1 ::_Bind, std::allocator > const&)> ()(std::tr1::_Placeholder<1>, lo ng, std::basic_string, std::allocator >)>::__call(std::tr1::_Mu, false, true> ( c onst&)(std::tr1::_Placeholder<1>, std::tr1::tuple), std::tr1::_Index_tuple<0, 1, 2>) () from /Users/mchadha/venv/lib/python2.7/site-packages/mesos.nati ve-0.22.1003-py2.7-linux-x86_64.egg/mesos/native/_mesos.so #12 0x00007f4286cdfd16 in std::tr1::result_of, std::allocator > con st&)> ()(std::tr1::result_of, false, true> ()(std::tr1::_Placeholder<1>, std::tr1::tuple)>::type, std::tr1::resu lt_of ()(long, std::tr1::_Mu, false, true> ()(std::tr1::_Placeholder<1>, std::tr1::tuple))>: :type, std::tr1::result_of, std::allocator >, false, false> ()(std::basic_string, std::allocator >, std::tr1::_Mu, false, true> ()(std::tr1::_Placeholder<1>, std::tr1::tuple))>::type)>::type std::tr1::_ Bind, std::allocator > const&)> ()(std::tr1::_Placeholder<1>, long, std::basic_string, std::allocator >)>::operator()(zookeeper::GroupProcess*&) () from /Users/mchadha/venv/lib/python2 .7/site-packages/mesos.native-0.22.1003-py2.7-linux-x86_64.egg/mesos/native/_mesos.so #13 0x00007f4286cdf8be in 
std::tr1::_Function_handler, std::allocator > const&)> ()(std::tr1::_Placeholder<1>, long, std::basic_string, std::allocator >)> >::_ M_invoke(std::tr1::_Any_data const&, zookeeper::GroupProcess*) () from /Users/mchadha/venv/lib/python2.7/site-packages/mesos.native-0.22.1003-py2.7-linux-x86_64.egg/mesos/native/ _mesos.so #14 0x00007f4286cc2394 in std::tr1::function::operator()(zookeeper::GroupProcess*) const () from /Users/mchadha/venv/lib/python2.7/site-package s/mesos.native-0.22.1003-py2.7-linux-x86_64.egg/mesos/native/_mesos.so #15 0x00007f4286cbc3a2 in void process::internal::vdispatcher(process::ProcessBase*, std::tr1::shared_ptr >) () from /Users/mchadha/venv/lib/python2.7/site-packages/mesos.native-0.22.1003-py2.7-linux-x86_64.egg/mesos/native/_mesos.so #16 0x00007f4286ccdca5 in std::tr1::result_of, false, true> ()(std::tr1::_Placeholder<1>, std::tr1::tuple)>::type, std::tr1::result_of >, false, false> ()(std::tr1::shared_p tr >, std::tr1::_Mu, false, true> ()(std::tr1::_Placeholder<1>, std::tr1::tuple))>::type))(process::ProcessBase*, std::tr1::shared_ptr >)>::type std::tr1::_Bind, std::tr1::shared_ptr >))(process::ProcessBase*, std::tr1::shared_ptr > )>::__call(std::tr1::_Mu, false, true> ( const&)(std::tr1::_Placeholder<1>, std::tr1::tuple), std: :tr1::_Index_tuple<0, 1>) () from /Users/mchadha/venv/lib/python2.7/site-packages/mesos.native-0.22.1003-py2.7-linux-x86_64.egg/mesos/native/_mesos.so #17 0x00007f4286cc7a5a in std::tr1::result_of, false, true> ()(std::tr1::_Placeholder<1>, std::tr1::tuple)>::type, std::tr1::result_of >, false, false> ()(std::tr1::shared_pt r >, std::tr1::_Mu, false, true> ()(std::tr1::_Placeholder<1>, std::tr1::tuple))>::type))(process::ProcessBase*, std::tr1::shared_ptr >)>::type std::tr1::_Bind, st d::tr1::shared_ptr >))(process::ProcessBase*, std::tr1::shared_ptr >)> ::operator()(process::ProcessBase*&) () from /Users/mchadha/venv/lib/python2.7/site-packages/mesos.native-0.22.1003-py2.7-linux-x86_64.egg/mesos/native/_me 
sos.so #18 0x00007f4286cc2480 in std::tr1::_Function_handler, std::tr1::shared_ptr >))(process::ProcessBase*, std::tr1::shared_ptr >)> >::_M_invoke(std::tr1::_Any_data con st&, process::ProcessBase*) () from /Users/mchadha/venv/lib/python2.7/site-packages/mesos.native-0.22.1003-py2.7-linux-x86_64.egg/mesos/native/_mesos.so #19 0x00007f42870db546 in std::tr1::function::operator()(process::ProcessBase*) const () from /Users/mchadha/venv/lib/python2.7/site-packages/meso s.native-0.22.1003-py2.7-linux-x86_64.egg/mesos/native/_mesos.so #20 0x00007f42870c1013 in process::ProcessBase::visit(process::DispatchEvent const&) () from /Users/mchadha/venv/lib/python2.7/site-packages/mesos.native-0.22.1003-py2.7-linux-x8 6_64.egg/mesos/native/_mesos.so #21 0x00007f42870c5582 in process::DispatchEvent::visit(process::EventVisitor*) const () from /Users/mchadha/venv/lib/python2.7/site-packages/mesos.native-0.22.1003-py2.7-linux-x 86_64.egg/mesos/native/_mesos.so #22 0x00007f428666680e in process::ProcessBase::serve(process::Event const&) () from /Users/mchadha/venv/lib/python2.7/site-packages/mesos.native-0.22.1003-py2.7-linux-x86_64.egg /mesos/native/_mesos.so #23 0x00007f42870bd88f in process::ProcessManager::resume(process::ProcessBase*) () from /Users/mchadha/venv/lib/python2.7/site-packages/mesos.native-0.22.1003-py2.7-linux-x86_64 .egg/mesos/native/_mesos.so #24 0x00007f42870b1cb9 in process::schedule(void*) () from /Users/mchadha/venv/lib/python2.7/site-packages/mesos.native-0.22.1003-py2.7-linux-x86_64.egg/mesos/native/_mesos.so #25 0x00000031410079d1 in start_thread () from /lib64/libpthread.so.0 #26 0x00000031408e88fd in clone () from /lib64/libc.so.6 {code} Solution: Create master detector per url instead of per framework. Will send the review request. 
",3 MESOS-3603,"Test build failure due to comparison between signed and unsigned integers","Compilation fails on OpenSUSE Tumbleweed (Linux 4.1.6, gcc 5.1.1, glibc 2.22) with the following errors: {code} In file included from ../../src/tests/values_tests.cpp:22:0: ../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include/gtest/gtest.h: In instantiatio n of ‘testing::AssertionResult testing::internal::CmpHelperEQ(const char*, const char*, const T1&, const T2&) [with T1 = int; T2 = long unsigned int]’: ../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include/gtest/gtest.h:1484:23: requi red from ‘static testing::AssertionResult testing::internal::EqHelper::Compare(const char*, const char*, const T1&, const T2&) [with T1 = int; T2 = long un signed int; bool lhs_is_null_literal = false]’ ../../src/tests/values_tests.cpp:287:3: required from here ../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include/gtest/gtest.h:1448:16: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare] if (expected == actual) { ^ CXX tests/containerizer/mesos_tests-provisioner_docker_tests.o ^CMakefile:6779: recipe for target 'tests/mesos_tests-values_tests.o' failed make[3]: *** [tests/mesos_tests-values_tests.o] Interrupt {code}",1 MESOS-3604,"ExamplesTest.PersistentVolumeFramework does not work in OS X El Capitan","The example persistent volume framework test does not pass in OS X El Capitan. It seems to be executing the {{/src/.libs/mesos-executor}} directly while it should be executing the wrapper script at {{/src/mesos-executor}} instead. The no-executor framework passes however, which seem to have a very similar configuration with the persistent volume framework. 
The following is the output that shows the {{dyld}} load error: {noformat} I1008 01:22:52.280140 4284416 launcher.cpp:132] Forked child with pid '1706' for contain er 'b6d3bd96-2ebd-47b1-a16a-a22ffba992aa' I1008 01:22:52.280300 4284416 containerizer.cpp:873] Checkpointing executor's forked pid 1706 to '/var/folders/p6/nfxknpz52dzfc6zqnz23tq180000gn/T/mesos-XXXXXX.5OZ3locB/0/meta/ slaves/34d6329e-69cb-4a72-aee4-fe892bf1c70b-S2/frameworks/34d6329e-69cb-4a72-aee4-fe892b f1c70b-0000/executors/dec188d4-d2dc-40c5-ac4d-881adc3d81c0/runs/b6d3bd96-2ebd-47b1-a16a- a22ffba992aa/pids/forked.pid' dyld: Library not loaded: /usr/local/lib/libmesos-0.26.0.dylib Referenced from: /Users/mpark/Projects/mesos/build/src/.libs/mesos-executor Reason: image not found dyld: Library not loaded: /usr/local/lib/libmesos-0.26.0.dylib Referenced from: /Users/mpark/Projects/mesos/build/src/.libs/mesos-executor Reason: image not found dyld: Library not loaded: /usr/local/lib/libmesos-0.26.0.dylib Referenced from: /Users/mpark/Projects/mesos/build/src/.libs/mesos-executor Reason: image not found I1008 01:22:52.365397 3211264 containerizer.cpp:1284] Executor for container '06b649be-88c8-4047-8fb5-e89bdd096b66' has exited I1008 01:22:52.365433 3211264 containerizer.cpp:1097] Destroying container '06b649be-88c8-4047-8fb5-e89bdd096b66' {noformat}",3 MESOS-3613,"Port slave/paths.cpp to Windows","Important subset of dependency tree of changes necessary: slave/paths.cpp: os, path",1 MESOS-3615,"Port slave/state.cpp","Important subset of changes this depends on: slave/state.cpp: pid, os, path, protobuf, paths, state pid.hpp: address.hpp, ip.hpp address.hpp: ip.hpp, net.hpp net.hpp: ip, networking stuff state: type_utils, pid, os, path, protobuf, uuid type_utils.hpp: uuid.hpp",3 MESOS-3618,"Port slave/containerizer/fetcher.cpp","Important subset of the dependency tree follows: slave/containerizer/fetcher.cpp: slave, fetcher, collect, dispatch, net collect: future, defer, process fetcher: type_utils, 
future, process, subprocess dispatch.hpp: process.hpp net.hpp: ip, networking stuff future.hpp: pid.hpp defer.hpp: deferred.hpp, dispatch.hpp deferred.hpp: dispatch.hpp, pid.hpp type_utils.hpp: uuid.hpp subprocess: os, future",3 MESOS-3619,"Port slave/containerizer/isolator.cpp to Windows","Important subset of the dependency tree follows: isolator.hpp: dispatch.hpp, path.hpp isolator: process dispatch.hpp: process.hpp ",3 MESOS-3620,"Create slave/containerizer/isolators/filesystem/windows.cpp","Should look a lot like the posix.cpp flavor. Important subset of the dependency tree follows for the posix flavor: slave/containerizer/isolators/filesystem/posix.cpp: filesystem/posix, fs, os, path filesystem/posix: flags, isolator",3 MESOS-3623,"Port slave/containerizer/mesos/containerizer.cpp to Windows","Important subset of the dependency tree follows: slave/containerizer/mesos/containerizer.cpp: isolator, collect, defer, io, metrics, reap, subprocess, fs, os, path, protobuf_utils, paths, slave, containerizer, fetcher, launcher, posix, disk, containerizer, launch, provisioner",3 MESOS-3624,"Port slave/containerizer/mesos/launch.cpp to Windows","Important subset of the dependency tree follows: slave/containerizer/mesos/launch.cpp: os, protobuf, launch launch: subcommand subcommand: flags flags.hpp: os.hpp, path.hpp, fetch.hpp",3 MESOS-3625,"Add support for github and variable base URLs to apply-reviews.py","From Adam's email on dev@ list: I have used the '-g' feature for github PRs in the past, and we should continue to support that model, so that new Mesos contributors don't have to create new RB accounts and learn a new process just for quick documentation changes, etc. As a side note, now that the Myriad incubator project has migrated to Apache git and we can no longer merge PRs directly, we were hoping to take advantage of a tool like apply-reviews to apply our PR patches. It looks like apply-reviews.sh only specifies 'mesos' in the GITHUB_URL/API_URL. 
Would apply-reviews.py be just as easy to reuse for another project (i.e. Myriad)?",3 MESOS-3639,"Implement stout/os/windows/killtree.hpp","killtree() is implemented using Windows Job Objects. The processes created by the executor are associated with a job object using `create_job`. killtree() simply terminates the job object. Helper functions: `create_job` function creates a job object whose name is derived from the `pid` and associates the `pid` process with the job object. Every process started by the process which is part of the job object becomes part of the job object. The job name should match the name used in `kill_job`. The jobs should be created with JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE and allow the caller to decide how to handle the returned handle. `kill_job` function assumes the process identified by `pid` is associated with a job object whose name is derived from it. Every process started by the process which is part of the job object becomes part of the job object. Destroying the task will close all such processes.",5 MESOS-3640,"Implement stout/os/windows/ls.hpp",NULL,3 MESOS-3641,"Implement stout/os/windows/read.hpp and write.hpp",NULL,2 MESOS-3645,"Implement stout/os/windows/stat.hpp",NULL,8 MESOS-3683,"Port slave/containerizer/isolator.hpp to Windows",NULL,3 MESOS-3688,"Get Container Name information when launching a container task","We want to get the Docker Name (or Docker ID, or both) when launching a container task with mesos. The container name is generated by mesos itself (i.e. mesos-77e5fde6-83e7-4618-a2dd-d5b10f2b4d25, obtained with ""docker ps"") and it would be nice to expose this information to frameworks so that this information can be used, for example by Marathon to give this information to users via a REST API.
To go a bit more in depth on our use case: we have files created by the fluentd logdriver that are named with the Docker Name or Docker ID (full or short), and we need a mapping for the users of the REST API, so the first step is to make this information available from mesos. ",3 MESOS-3692,"Clarify error message 'could not chown work directory'","When deploying a framework I encountered the error message 'could not chown work directory'. It took me a while to figure out that this happened because my framework was registered as a user on my host machine which did not exist on the Docker container and the agent was running as root. I suggest clarifying this message by advising the user either to set {{--switch-user}} to {{false}} or to run the framework as the same user as the agent.",1 MESOS-3694,"Enable building mesos.apache.org locally in a Docker container.","We should make it easy for everyone to modify the website and be able to generate it locally before pushing to upstream. ",3 MESOS-3698,"JSON parsing allows non-whitespace trailing characters","Picojson supports a streaming mode in which a stream containing a series of JSON values can be repeatedly parsed. For this reason, it does not return an error when passed a string containing a valid JSON value followed by non-whitespace trailing characters. However, in addition to the four-argument {{picojson::parse()}} that we're using, picojson contains a two-argument {{parse()}} function (https://github.com/kazuho/picojson/blob/master/picojson.h#L938-L942) which accepts a {{std::string}} and should probably validate its input to ensure it doesn't contain trailing characters. A pull request has been filed for this change at https://github.com/kazuho/picojson/pull/70 and if it's merged, we can switch to the two-argument function call.
In the meantime, we should provide such input validation ourselves in {{JSON::parse()}}.",1 MESOS-3700,"Deprecate resource_monitoring_interval flag","This parameter should be deprecated after 0.23.0 release as it has no use now. ",1 MESOS-3704,"Allow easier detection when hook signature changes","Currently, if the signature of a hook function changes, we don't get any compile time errors if the hook implementation is not updated. This results in a hook that is never called.",2 MESOS-3705,"HTTP Pipelining doesn't keep order of requests","[HTTP 1.1 Pipelining|https://en.wikipedia.org/wiki/HTTP_pipelining] describes a mechanism by which multiple HTTP requests can be performed over a single socket. The requirement here is that responses should be sent in the same order as the requests were made. Libprocess has some mechanisms built in to deal with pipelining when multiple HTTP requests are made; it is still possible, however, to create a situation in which responses are scrambled with respect to the order in which the requests arrived. Consider the situation in which there are two libprocess processes, {{processA}} and {{processB}}, each running in a different thread, {{thread2}} and {{thread3}} respectively. The [{{ProcessManager}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L374] runs in {{thread1}}. {{processA}} is of type {{ProcessA}} which looks roughly as follows: {code} class ProcessA : public ProcessBase { public: ProcessA() {} Future<http::Response> foo(const http::Request&) { // … Do something … return http::OK(); } protected: virtual void initialize() { route(""/foo"", None(), &ProcessA::foo); } }; {code} {{processB}} is of type {{ProcessB}} which is just like {{ProcessA}} but routes {{""bar""}} instead of {{""foo""}}. The situation in which the bug arises is the following: # Two requests, one for {{""http://server_uri/(1)/foo""}} and one for {{""http://server_uri/(2)//bar""}} are made over the same socket.
# The first request arrives to [{{ProcessManager::handle}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L2202] which is still running in {{thread1}}. This one creates an {{HttpEvent}} and delivers to the handler, in this case {{processA}}. # [{{ProcessManager::deliver}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L2361] enqueues the HTTP event in to the {{processA}} queue. This happens in {{thread1}}. # The second request arrives to [{{ProcessManager::handle}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L2202] which is still running in {{thread1}}. Another {{HttpEvent}} is created and delivered to the handler, in this case {{processB}}. # [{{ProcessManager::deliver}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L2361] enqueues the HTTP event in to the {{processB}} queue. This happens in {{thread1}}. # {{Thread2}} is blocked, so {{processA}} cannot handle the first request, it is stuck in the queue. # {{Thread3}} is idle, so it picks up the request to {{processB}} immediately. # [{{ProcessBase::visit(HttpEvent)}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L3073] is called in {{thread3}}, this one in turn [dispatches|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L3106] the response's future to the {{HttpProxy}} associated with the socket where the request came. At the last point, the bug is evident, the request to {{processB}} will be send before the request to {{processA}} even if the handler takes a long time and the {{processA::bar()}} actually finishes before. The responses are not send in the order the requests are done. h1. 
Reproducer The following is a test which successfully reproduces the issue: {code:title=3rdparty/libprocess/src/tests/http_tests.cpp} #include get1, get2, get3; Latch latch; EXPECT_CALL(*server1.process, get(_)) .WillOnce(DoAll(FutureArg<0>(&get1), InvokeWithoutArgs([&latch]() { latch.await(); }), Return(http::OK(""1"")))) .WillOnce(DoAll(FutureArg<0>(&get2), Return(http::OK(""2"")))); EXPECT_CALL(*server2.process, get(_)) .WillOnce(DoAll(FutureArg<0>(&get3), Return(http::OK(""3"")))); auto url1 = http::URL( ""http"", server1.process->self().address.ip, server1.process->self().address.port, server1.process->self().id + ""/get""); auto url2 = http::URL( ""http"", server1.process->self().address.ip, server1.process->self().address.port, server2.process->self().id + ""/get""); // Create a connection to the server for HTTP pipelining. Future connect = http::connect(url1); AWAIT_READY(connect); http::Connection connection = connect.get(); http::Request request1; request1.method = ""GET""; request1.url = url1; request1.keepAlive = true; request1.body = ""1""; Future response1 = connection.send(request1); http::Request request2 = request1; request2.body = ""2""; Future response2 = connection.send(request2); http::Request request3; request3.method = ""GET""; request3.url = url2; request3.keepAlive = true; request3.body = ""3""; Future response3 = connection.send(request3); // Verify that request1 arrived at server1 and it is the right request. // Now server1 is blocked processing request1 and cannot pick up more events // in the queue. AWAIT_READY(get1); EXPECT_EQ(request1.body, get1->body); // Verify that request3 arrived at server2 and it is the right request. AWAIT_READY(get3); EXPECT_EQ(request3.body, get3->body); // Request2 hasn't been picked up since server1 is still blocked serving // request1. EXPECT_TRUE(get2.isPending()); // Free server1 so it can serve request2. latch.trigger(); // Verify that request2 arrived at server1 and it is the right request. 
AWAIT_READY(get2); EXPECT_EQ(request2.body, get2->body); // Wait for all responses. AWAIT_READY(response1); AWAIT_READY(response2); AWAIT_READY(response3); // If pipelining works as expected, even though server2 finished processing // its request before server1 even began with request2, the responses should // arrive in the order they were made. EXPECT_EQ(request1.body, response1->body); EXPECT_EQ(request2.body, response2->body); EXPECT_EQ(request3.body, response3->body); AWAIT_READY(connection.disconnect()); AWAIT_READY(connection.disconnected()); } {code}",3 MESOS-3716,"Update Allocator interface to support quota","An allocator should be notified when a quota is being set/updated or removed. Also to support master failover in presence of quota, allocator should be notified about the reregistering agents and allocations towards quota.",3 MESOS-3717,"Master recovery in presence of quota","Quota complicates master failover in several ways. The new master should determine if it is possible to satisfy the total quota and notify an operator in case it's not (imagine simultaneous failovers of multiple agents). The new master should hint the allocator how many agents might reconnect in the future to help it decide how to satisfy quota before the majority of agents reconnect.",5 MESOS-3718,"Implement Quota support in allocator","The built-in Hierarchical DRF allocator should support Quota. This includes (but not limited to): adding, updating, removing and satisfying quota; avoiding both overcomitting resources and handing them to non-quota'ed roles in presence of master failover. A [design doc for Quota support in Allocator|https://issues.apache.org/jira/browse/MESOS-2937] provides an overview of a feature set required to be implemented.",5 MESOS-3720,"Tests for Quota support in master","Allocator-agnostic tests for quota support in the master. 
They can be divided into several groups: * Heuristic check; * Master failover; * Functionality and quota guarantees.",5 MESOS-3722,"Prototype quota request authentication","Quota requests need to be authenticated. This ticket will authenticate quota requests using credentials provided by the `Authorization` field of the HTTP request. This is similar to how authentication is implemented in `Master::Http`.",5 MESOS-3723,"Prototype quota request authorization","When quotas are requested they should authorize their roles. This ticket will authorize quota requests with ACLs. The existing authorization support that has been implemented in MESOS-1342 will be extended to add a `request_quotas` ACL.",5 MESOS-3732,"Speed up FaultToleranceTest.FrameworkReregister test","FaultToleranceTest.FrameworkReregister test takes more than one second to complete: {code} [ RUN ] FaultToleranceTest.FrameworkReregister [ OK ] FaultToleranceTest.FrameworkReregister (1056 ms) {code} There must be a {{1s}} timeout somewhere which we should mitigate via {{Clock::advance()}}.",1 MESOS-3734,"Incorrect sed syntax for Mac OSX","The build currently fails on OSX: {noformat} ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/protoc -I../../mesos/include/mesos/containerizer \ -I../../mesos/include -I../../mesos/src \ --python_out=python/interface/src/mesos/interface ../../mesos/include/mesos/containerizer/containerizer.proto ../../mesos/install-sh -c -d python/interface/src/mesos/v1/interface sed -i 's/mesos\.mesos_pb2/mesos_pb2/' python/interface/src/mesos/interface/containerizer_pb2.py sed: 1: ""python/interface/src/me ..."": extra characters at the end of p command make[1]: *** [python/interface/src/mesos/interface/containerizer_pb2.py] Error 1 {noformat} This is because the sed command uses the wrong syntax for OSX: you need {code}sed -i """"{code} to instruct sed to not use a backup file.",2 MESOS-3736,"Support docker local store pull same image simultaneously ","The current local store 
implements get() using the local puller. When several requests to pull the same Docker image arrive at the same time, the local puller untars the image tarball once per request and copies every result to the same directory, which wastes time and computation. The local store/puller should do this work only for the first request; simultaneous requests for the same image should wait on the promised future and receive the result once the first pull finishes. ",3 MESOS-3739,"Mesos does not set Content-Type for 400 Bad Request","While integrating with the HTTP Scheduler API I encountered the following scenario. The message below was serialized to protobuf and sent as the POST body {code:title=message} call { type: ACKNOWLEDGE, acknowledge: { uuid: , agentID: { value: ""20151012-182734-16777343-5050-8978-S2"" }, taskID: { value: ""task-1"" } } } {code} {code:title=Request Headers} POST /api/v1/scheduler HTTP/1.1 Content-Type: application/x-protobuf Accept: application/x-protobuf Content-Length: 73 Host: localhost:5050 User-Agent: RxNetty Client {code} I received the following response {code:title=Response Headers} HTTP/1.1 400 Bad Request Date: Wed, 14 Oct 2015 23:21:36 GMT Content-Length: 74 Failed to validate Scheduler::Call: Expecting 'framework_id' to be present {code} Even though my Accept header made no mention of {{text/plain}}, the message body returned to me is {{text/plain}}. Additionally, there is no {{Content-Type}} header set on the response, so I can't do anything intelligent in my response handler.",2 MESOS-3740,"LIBPROCESS_IP not passed to Docker containers","Docker containers aren't currently passed all the same environment variables that Mesos Containerizer tasks are. See: https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.cpp#L254 for all the environment variables explicitly set for mesos containers.
While some of them don't necessarily make sense for Docker containers, when the Docker container has a libprocess process inside it (e.g. a Mesos framework scheduler) and is using {{--net=host}}, the task needs to have LIBPROCESS_IP set; otherwise the same sort of problems described in MESOS-3553 can occur (libprocess will try to guess the machine's IP address, with likely bad results in a number of operating environments).",3 MESOS-3743,"Provide diagnostic output in agent log when fetching fails","When fetching fails, the fetcher has written log output to stderr in the task sandbox, but it is not easy to get to. It may even be impossible to get to if one only has the agent log available and no more access to the sandbox. This is for instance the case when looking at output from a CI run. The fetcher actor in the agent detects if the external fetcher program claims to have succeeded or not. When it exits with an error code, we could grab the fetcher log from the stderr file in the sandbox and append it to the agent log. This is similar to this patch: https://reviews.apache.org/r/37813/ The difference is that the output of the latter is triggered by test failures outside the fetcher, whereas what is proposed here is triggering upon failures inside the fetcher.",2 MESOS-3748,"HTTP scheduler library does not gracefully parse invalid resource identifiers","If you pass a nonsense string for ""master"" into a framework using the C++ HTTP scheduler library, the framework segfaults.
For example, using the example frameworks: {code:title=Scheduler Driver} build/src/test-framework --master=""asdf://127.0.0.1:5050"" {code} Results in: {code} Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to parse 'asdf://127.0.0.1:5050' {code} {code:title=HTTP Scheduler Library} export DEFAULT_PRINCIPAL=root build/src/event-call-framework --master=""asdf://127.0.0.1:5050"" {code} Results in {code} I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0 Segmentation fault: 11 {code} {code:title=Stack Trace} * thread #2: tid = 0x28b6bb, 0x0000000100ad03ca libmesos-0.26.0.dylib`mesos::v1::scheduler::MesosProcess::initialize(this=0x00000001076031a0) + 42 at scheduler.cpp:213, stop reason = EXC_BAD_ACCESS (code=1, address=0x0) * frame #0: 0x0000000100ad03ca libmesos-0.26.0.dylib`mesos::v1::scheduler::MesosProcess::initialize(this=0x00000001076031a0) + 42 at scheduler.cpp:213 frame #1: 0x0000000100ad05f2 libmesos-0.26.0.dylib`virtual thunk to mesos::v1::scheduler::MesosProcess::initialize(this=0x00000001076031a0) + 34 at scheduler.cpp:210 frame #2: 0x00000001022b60f3 libmesos-0.26.0.dylib`::resume() + 931 at process.cpp:2449 frame #3: 0x00000001022c131c libmesos-0.26.0.dylib`::operator()() + 268 at process.cpp:2174 frame #4: 0x00000001022c0fa2 libmesos-0.26.0.dylib`::__thread_proxy > > > >() [inlined] __invoke<(lambda at ../../../3rdparty/libprocess/src/process.cpp:2158:35) &, const std::__1::atomic &> + 27 at __functional_base:415 frame #5: 0x00000001022c0f87 libmesos-0.26.0.dylib`::__thread_proxy > > > >() [inlined] __apply_functor<(lambda at ../../../3rdparty/libprocess/src/process.cpp:2158:35), std::__1::tuple > >, 0, std::__1::tuple<> > + 55 at functional:2060 frame #6: 0x00000001022c0f50 libmesos-0.26.0.dylib`::__thread_proxy > > > >() [inlined] operator()<> + 41 at functional:2123 frame #7: 0x00000001022c0f27 libmesos-0.26.0.dylib`::__thread_proxy > > > >() [inlined] __invoke > >> + 14 at __functional_base:415 frame #8: 
0x00000001022c0f19 libmesos-0.26.0.dylib`::__thread_proxy > > > >() [inlined] __thread_execute > >> + 25 at thread:337 frame #9: 0x00000001022c0f00 libmesos-0.26.0.dylib`::__thread_proxy > > > >() + 368 at thread:347 frame #10: 0x00007fff964c705a libsystem_pthread.dylib`_pthread_body + 131 frame #11: 0x00007fff964c6fd7 libsystem_pthread.dylib`_pthread_start + 176 frame #12: 0x00007fff964c43ed libsystem_pthread.dylib`thread_start + 13 {code}",1 MESOS-3749,"Configuration docs are missing --enable-libevent and --enable-ssl","The {{\-\-enable-libevent}} and {{\-\-enable-ssl}} config flags are currently not documented in the ""Configuration"" docs with the rest of the flags. They should be added.",1 MESOS-3751,"MESOS_NATIVE_JAVA_LIBRARY not set on MesosContainerize tasks with --executor_environmnent_variables","When using --executor_environment_variables, and having MESOS_NATIVE_JAVA_LIBRARY in the environment of mesos-slave, the mesos containerizer does not set MESOS_NATIVE_JAVA_LIBRARY itself. 
Relevant code: https://github.com/apache/mesos/blob/14f7967ef307f3d98e3a4b93d92d6b3a56399b20/src/slave/containerizer/containerizer.cpp#L281 It sees that the variable is in the mesos-slave's environment (os::getenv), rather than checking if it is set in the environment variable set.",2 MESOS-3752,"CentOS 6 dependency install fails at Maven","It seems the Apache Maven dependencies have changed such that following the Getting Started docs for CentOS 6.6 will fail at Maven installation: {code} ---> Package apache-maven.noarch 0:3.3.3-2.el6 will be installed --> Processing Dependency: java-devel >= 1:1.7.0 for package: apache-maven-3.3.3-2.el6.noarch --> Finished Dependency Resolution Error: Package: apache-maven-3.3.3-2.el6.noarch (epel-apache-maven) Requires: java-devel >= 1:1.7.0 Available: java-1.5.0-gcj-devel-1.5.0.0-29.1.el6.x86_64 (base) java-devel = 1.5.0 Available: 1:java-1.6.0-openjdk-devel-1.6.0.35-1.13.7.1.el6_6.x86_64 (base) java-devel = 1:1.6.0 Available: 1:java-1.6.0-openjdk-devel-1.6.0.36-1.13.8.1.el6_7.x86_64 (updates) java-devel = 1:1.6.0 You could try using --skip-broken to work around the problem You could try running: rpm -Va --nofiles --nodigest {code}",1 MESOS-3753,"Test the HTTP Scheduler library with SSL enabled","Currently, the HTTP Scheduler library does not support SSL-enabled Mesos. (You can manually test this by spinning up an SSL-enabled master and attempt to run the event-call framework example against it.) We need to add tests that check the HTTP Scheduler library against SSL-enabled Mesos: * with downgrade support, * with required framework/client-side certifications, * with/without verification of certificates (master-side), * with/without verification of certificates (framework-side), * with a custom certificate authority (CA) These options should be controlled by the same environment variables found on the [SSL user doc|http://mesos.apache.org/documentation/latest/ssl/]. 
Note: This issue will be broken down into smaller sub-issues as bugs/problems are discovered.",13 MESOS-3756,"Generalized HTTP Authentication Modules","Libprocess is going to factor out an authentication interface: MESOS-3231 Here we propose that Mesos can provide implementations for this interface as Mesos modules.",13 MESOS-3758,"0.26.0 Release","Manage the release of Apache Mesos version 0.26.0. The Mesos 0.26.0 release will aim at being timely and at improving robustness. It will not be gated by new features. However, there may be blockers when it comes to bugs or incompleteness of existing features. Once these blockers are resolved, we will start deferring unresolved issues by Priority and Status until we are ready to make the first cut. Here is how you can stay informed and help out. h3. Users - Note the ""is blocked by"" links in this ticket for major targeted features. - Check out the 0.26.0 [dashboard|https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12327111] for status indicators. - See the in-progress [Release Notes|https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12333528&styleName=Html&projectId=12311242&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED%7C675829c365428965ebec16702c62d3637db57d84%7Clin] to see what's committed so far. - Add comments to issues describing your problems or use cases. h3. Issue Reporters - Set Target Version to 0.26.0, if appropriate. - Set Priority for fixing in 0.26.0. - Ask around on IRC or dev@ for a Shepherd! h3. Developers - Newbies: Check out [Accepted, Unassigned, 'newbie'] issues. - Looking for something meatier to work on? [Accepted, Unassigned for 0.26] - For Shepherdless issues, find a Shepherd before diving too deep!! - Update Target Version and Priority, as needed. - Discuss your intended design on the JIRA, perhaps sharing a design doc. - Update the Status to ""In Progress"" and ""Reviewable"" as you go. - Assign yourself if you are working on it. 
Un-assign yourself in case you stop before finishing. h3. Committers - Accept and Shepherd all relevant [Shepherdless issues]. - Update Target Version and Priority, as needed. - Add 'newbie' label to any easy ones. h3. Important JIRA fields - Target Version: Set to 0.26.0 if you want the issue to be addressed in 0.26. - Priority: Indicates how important it is for the issue to be fixed in the next release (0.26.0 in this case). If you want to update a Priority, please add a comment explaining your reason, and only change the Priority up/down one level. - Blocked-by Links: major features and critical tickets can be linked as blockers to this ticket to give a high-level overview of what we plan to land in 0.26. Non-critical issues should just set the ""Target Version"". ",5 MESOS-3759,"Document messages.proto","The messages we pass between Mesos components are largely undocumented. See this [TODO|https://github.com/apache/mesos/blob/19f14d06bac269b635657960d8ea8b2928b7830c/src/messages/messages.proto#L23].",3 MESOS-3762,"Refactor SSLTest fixture such that MesosTest can use the same helpers.","In order to write tests that exercise SSL with other components of Mesos, such as the HTTP scheduler library, we need to use the setup/teardown logic found in the {{SSLTest}} fixture. Currently, the test fixtures have separate inheritance structures like this: {code} SSLTest <- ::testing::Test MesosTest <- TemporaryDirectoryTest <- ::testing::Test {code} where {{::testing::Test}} is a gtest class. The plan is the following: # Change {{SSLTest}} to inherit from {{TemporaryDirectoryTest}}. This will require moving the setup (generation of keys and certs) from {{SetUpTestCase}} to {{SetUp}}. At the same time, *some* of the cleanup logic in the SSLTest will not be needed. # Move the logic of generating keys/certs into helpers, so that individual tests can call them when needed, much like {{MesosTest}}. 
# Write a child class of {{SSLTest}} which has the same functionality as the existing {{SSLTest}}, for use by the existing tests that rely on {{SSLTest}} or the {{RegistryClientTest}}. # Have {{MesosTest}} inherit from {{SSLTest}} (which might be renamed during the refactor). If Mesos is not compiled with {{--enable-ssl}}, then {{SSLTest}} could be {{#ifdef}}'d into any empty class. The resulting structure should be like: {code} MesosTest <- SSLTest <- TemporaryDirectoryTest <- ::testing::Test ChildOfSSLTest / {code}",3 MESOS-3763,"Need for http::put request method","As we decided to create a more restful api for managing Quota request. Therefore we also want to use the HTTP put request and hence need to enable the libprocess/http to send put request besides get and post requests.",1 MESOS-3771,"Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling","Spark encodes some binary data into the ExecutorInfo.data field. This field is sent as a ""bytes"" Protobuf value, which can have arbitrary non-UTF8 data. 
If you have such a field, it seems that it is splatted out into JSON without any regards to proper character encoding: {code} 0006b0b0 2e 73 70 61 72 6b 2e 65 78 65 63 75 74 6f 72 2e |.spark.executor.| 0006b0c0 4d 65 73 6f 73 45 78 65 63 75 74 6f 72 42 61 63 |MesosExecutorBac| 0006b0d0 6b 65 6e 64 22 7d 2c 22 64 61 74 61 22 3a 22 ac |kend""},""data"":"".| 0006b0e0 ed 5c 75 30 30 30 30 5c 75 30 30 30 35 75 72 5c |.\u0000\u0005ur\| 0006b0f0 75 30 30 30 30 5c 75 30 30 30 66 5b 4c 73 63 61 |u0000\u000f[Lsca| 0006b100 6c 61 2e 54 75 70 6c 65 32 3b 2e cc 5c 75 30 30 |la.Tuple2;..\u00| {code} I suspect this is because the HTTP api emits the executorInfo.data directly: {code} JSON::Object model(const ExecutorInfo& executorInfo) { JSON::Object object; object.values[""executor_id""] = executorInfo.executor_id().value(); object.values[""name""] = executorInfo.name(); object.values[""data""] = executorInfo.data(); object.values[""framework_id""] = executorInfo.framework_id().value(); object.values[""command""] = model(executorInfo.command()); object.values[""resources""] = model(executorInfo.resources()); return object; } {code} I think this may be because the custom JSON processing library in stout seems to not have any idea of what a byte array is. I'm guessing that some implicit conversion makes it get written as a String instead, but: {code} inline std::ostream& operator<<(std::ostream& out, const String& string) { // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII. // See RFC4627 for the JSON string specificiation. return out << picojson::value(string.value).serialize(); } {code} Thank you for any assistance here. 
Our cluster is currently entirely down -- the frameworks cannot handle parsing the invalid JSON produced (it is not even valid utf-8) ",2 MESOS-3772,"Consistency of quoted strings in error messages","Example log output: {quote} I1020 18:56:02.933956 1790 slave.cpp:1270] Got assigned task 13 for framework 496620b9-4368-4a71-b741-68216f3d909f-0000 I1020 18:56:02.934185 1790 slave.cpp:1386] Launching task 13 for framework 496620b9-4368-4a71-b741-68216f3d909f-0000 I1020 18:56:02.934408 1790 slave.cpp:1618] Queuing task '13' for executor default of framework '496620b9-4368-4a71-b741-68216f3d909f-0000 I1020 18:56:02.935417 1790 slave.cpp:1760] Sending queued task '13' to executor 'default' of framework 496620b9-4368-4a71-b741-68216f3d909f-0000 {quote} Aside from the typo (unmatched quote) in the third line, these log messages using quoting inconsistently: sometimes task, executor, and framework IDs are quoted, other times they are not. We should probably adopt a general rule, a la http://www.postgresql.org/docs/9.4/static/error-style-guide.html . My proposal: when interpolating a variable, only use quotes if it is possible that the value might contain whitespace or punctuation (in the latter case, the punctuation should probably be escaped).",3 MESOS-3773,"RegistryClientTest.SimpleGetBlob is flaky","{{RegistryClientTest.SimpleGetBlob}} fails about 1/5 times. This was encountered on OSX. 
{code:title=Repro} bin/mesos-tests.sh --gtest_filter=""*RegistryClientTest.SimpleGetBlob*"" --gtest_repeat=10 --gtest_break_on_failure {code} {code:title=Example Failure} [ RUN ] RegistryClientTest.SimpleGetBlob ../../src/tests/containerizer/provisioner_docker_tests.cpp:946: Failure Value of: blobResponse Actual: ""2015-10-20 20:58:59.579393024+00:00"" Expected: blob.get() Which is: ""\x15\x3\x3\00(P~\xCA&\xC6<\x4\x16\xE\xB2\xFF\b1a\xB9Z{\xE0\x80\xDA`\xBCt\x5R\x81x6\xF8 \x8B{\xA8\xA9\x4\xAB\xB6"" ""E\xE6\xDE\xCF\xD9*\xCC!\xC2\x15"" ""2015-10-20 20:58:59.579393024+00:00"" *** Aborted at 1445374739 (unix time) try ""date -d @1445374739"" if you are using GNU date *** PC: @ 0x103144ddc testing::UnitTest::AddTestPartResult() *** SIGSEGV (@0x0) received by PID 49008 (TID 0x7fff73ca3300) stack trace: *** @ 0x7fff8c58af1a _sigtramp @ 0x7fff8386e187 malloc @ 0x1031445b7 testing::internal::AssertHelper::operator=() @ 0x1030d32e0 mesos::internal::tests::RegistryClientTest_SimpleGetBlob_Test::TestBody() @ 0x1030d3562 mesos::internal::tests::RegistryClientTest_SimpleGetBlob_Test::TestBody() @ 0x1031ac8f3 testing::internal::HandleSehExceptionsInMethodIfSupported<>() @ 0x103192f87 testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x1031533f5 testing::Test::Run() @ 0x10315493b testing::TestInfo::Run() @ 0x1031555f7 testing::TestCase::Run() @ 0x103163df3 testing::internal::UnitTestImpl::RunAllTests() @ 0x1031af8c3 testing::internal::HandleSehExceptionsInMethodIfSupported<>() @ 0x103195397 testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x1031639f2 testing::UnitTest::Run() @ 0x1025abd41 RUN_ALL_TESTS() @ 0x1025a8089 main @ 0x7fff86b155c9 start {code} {code:title=Less common failure} [ RUN ] RegistryClientTest.SimpleGetBlob ../../src/tests/containerizer/provisioner_docker_tests.cpp:926: Failure (socket).failure(): Failed accept: connection error: error:00000000:lib(0):func(0):reason(0) {code}",4 MESOS-3775,"MasterAllocatorTest.SlaveLost is slow.","The 
{{MasterAllocatorTest.SlaveLost}} takes more than {{5s}} to complete. A brief look into the code hints that the stopped agent does not quit immediately (and hence its resources are not released by the allocator) because [it waits for the executor to terminate|https://github.com/apache/mesos/blob/master/src/tests/master_allocator_tests.cpp#L717]. The {{5s}} timeout comes from the {{EXECUTOR_SHUTDOWN_GRACE_PERIOD}} agent constant. Possible solutions: * Do not wait until the stopped agent quits (can be flaky, needs deeper analysis). * Decrease the agent's {{executor_shutdown_grace_period}} flag. * Terminate the executor faster (this may require some refactoring since the executor driver is created in the {{TestContainerizer}} and we do not have direct access to it). ",1 MESOS-3781,"Replace Master/Slave Terminology Phase I - Rename flag names and deprecate old ones",NULL,3 MESOS-3785,"Use URI content modification time to trigger fetcher cache updates.","Instead of using checksums to trigger fetcher cache updates, we can for starters use the content modification time (mtime), which is available for a number of download protocols, e.g. HTTP and HDFS. Proposal: Instead of just fetching the content size, we fetch both size and mtime together. As before, if there is no size, then caching fails and we fall back on direct downloading to the sandbox. Assuming a size is given, we compare the mtime from the fetch URI with the mtime known to the cache. If it differs, we update the cache. (As a defensive measure, a difference in size should also trigger an update.) Not having an mtime available at the fetch URI is simply treated as a unique valid mtime value that differs from all others. This means that when initially there is no mtime, cache content remains valid until there is one. Thereafter, a new lack of an mtime invalidates the cache once. In other words: any change from no mtime to having one or back is the same as encountering a new mtime. 
Note that this scheme does not require any new protobuf fields. ",5 MESOS-3786,"Backticks are not mentioned in Mesos C++ Style Guide","As far as I can tell, current practice is to quote code excerpts and object names with backticks when writing comments. For example: {code} // You know, `sadPanda` seems extra sad lately. std::string sadPanda; sadPanda = "" :'( ""; {code} However, I don't see this documented in our C++ style guide at all. It should be added.",1 MESOS-3793,"Cannot start mesos local on a Debian GNU/Linux 8 docker machine","We updated the mesos version to 0.25.0 in our Marathon docker image, that runs our integration tests. We use mesos local for those tests. This fails with this message: {noformat} root@a06e4b4eb776:/marathon# mesos local I1022 18:42:26.852485 136 leveldb.cpp:176] Opened db in 6.103258ms I1022 18:42:26.853302 136 leveldb.cpp:183] Compacted db in 765740ns I1022 18:42:26.853343 136 leveldb.cpp:198] Created db iterator in 9001ns I1022 18:42:26.853355 136 leveldb.cpp:204] Seeked to beginning of db in 1287ns I1022 18:42:26.853366 136 leveldb.cpp:273] Iterated through 0 keys in the db in 1111ns I1022 18:42:26.853406 136 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I1022 18:42:26.853775 141 recover.cpp:449] Starting replica recovery I1022 18:42:26.853862 141 recover.cpp:475] Replica is in EMPTY status I1022 18:42:26.854751 138 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request I1022 18:42:26.854856 140 recover.cpp:195] Received a recover response from a replica in EMPTY status I1022 18:42:26.855002 140 recover.cpp:566] Updating replica status to STARTING I1022 18:42:26.855655 138 master.cpp:376] Master a3f39818-1bda-4710-b96b-2a60ed4d12b8 (a06e4b4eb776) started on 172.17.0.14:5050 I1022 18:42:26.855680 138 master.cpp:378] Flags at startup: --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate=""false"" --authenticate_slaves=""false"" 
--authenticators=""crammd5"" --authorizers=""local"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_slave_ping_timeouts=""5"" --quiet=""false"" --recovery_slave_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""5secs"" --registry_strict=""false"" --root_submissions=""true"" --slave_ping_timeout=""15secs"" --slave_reregister_timeout=""10mins"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/usr/share/mesos/webui"" --work_dir=""/tmp/mesos/local/AK0XpG"" --zk_session_timeout=""10secs"" I1022 18:42:26.855790 138 master.cpp:425] Master allowing unauthenticated frameworks to register I1022 18:42:26.855803 138 master.cpp:430] Master allowing unauthenticated slaves to register I1022 18:42:26.855815 138 master.cpp:467] Using default 'crammd5' authenticator W1022 18:42:26.855829 138 authenticator.cpp:505] No credentials provided, authentication requests will be refused I1022 18:42:26.855840 138 authenticator.cpp:512] Initializing server SASL I1022 18:42:26.856442 136 containerizer.cpp:143] Using isolation: posix/cpu,posix/mem,filesystem/posix I1022 18:42:26.856943 140 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 1.888185ms I1022 18:42:26.856987 140 replica.cpp:323] Persisted replica status to STARTING I1022 18:42:26.857115 140 recover.cpp:475] Replica is in STARTING status I1022 18:42:26.857270 140 replica.cpp:641] Replica in STARTING status received a broadcasted recover request I1022 18:42:26.857312 140 recover.cpp:195] Received a recover response from a replica in STARTING status I1022 18:42:26.857368 140 recover.cpp:566] Updating replica status to VOTING I1022 18:42:26.857781 140 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 371121ns I1022 18:42:26.857841 140 replica.cpp:323] Persisted replica status to VOTING I1022 
18:42:26.857895 140 recover.cpp:580] Successfully joined the Paxos group I1022 18:42:26.857928 140 recover.cpp:464] Recover process terminated I1022 18:42:26.862455 137 master.cpp:1603] The newly elected leader is master@172.17.0.14:5050 with id a3f39818-1bda-4710-b96b-2a60ed4d12b8 I1022 18:42:26.862498 137 master.cpp:1616] Elected as the leading master! I1022 18:42:26.862511 137 master.cpp:1376] Recovering from registrar I1022 18:42:26.862560 137 registrar.cpp:309] Recovering registrar Failed to create a containerizer: Could not create MesosContainerizer: Failed to create launcher: Failed to create Linux launcher: Failed to mount cgroups hierarchy at '/sys/fs/cgroup/freezer': 'freezer' is already attached to another hierarchy {noformat} The setup worked with mesos 0.24.0. The Dockerfile is here: https://github.com/mesosphere/marathon/blob/mv/mesos_0.25/Dockerfile {noformat} root@a06e4b4eb776:/marathon# ls /sys/fs/cgroup/ root@a06e4b4eb776:/marathon# {noformat} {noformat} root@a06e4b4eb776:/marathon# cat /proc/mounts none / aufs rw,relatime,si=6e7ac87f36042e03,dio,dirperm1 0 0 proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0 tmpfs /dev tmpfs rw,nosuid,mode=755 0 0 devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666 0 0 shm /dev/shm tmpfs rw,nosuid,nodev,noexec,relatime,size=65536k 0 0 mqueue /dev/mqueue mqueue rw,nosuid,nodev,noexec,relatime 0 0 sysfs /sys sysfs ro,nosuid,nodev,noexec,relatime 0 0 /dev/sda1 /etc/resolv.conf ext4 rw,relatime,data=ordered 0 0 /dev/sda1 /etc/hostname ext4 rw,relatime,data=ordered 0 0 /dev/sda1 /etc/hosts ext4 rw,relatime,data=ordered 0 0 devpts /dev/console devpts rw,relatime,mode=600,ptmxmode=000 0 0 proc /proc/bus proc ro,nosuid,nodev,noexec,relatime 0 0 proc /proc/fs proc ro,nosuid,nodev,noexec,relatime 0 0 proc /proc/irq proc ro,nosuid,nodev,noexec,relatime 0 0 proc /proc/sys proc ro,nosuid,nodev,noexec,relatime 0 0 proc /proc/sysrq-trigger proc ro,nosuid,nodev,noexec,relatime 0 0 tmpfs /proc/kcore 
tmpfs rw,nosuid,mode=755 0 0 tmpfs /proc/timer_stats tmpfs rw,nosuid,mode=755 0 0 {noformat} [~bernd-mesos] Can you please assign to the correct person?",3 MESOS-3794,"Master should not store arbitrarily sized data in ExecutorInfo","From a comment in [MESOS-3771]: Master should not be storing the {{data}} fields from {{ExecutorInfo}}. We currently [store the entire object|https://github.com/apache/mesos/blob/master/src/master/master.hpp#L262-L271], which means master would be at high risk of OOM-ing if a bunch of executors were started with big {{data}} blobs. * Master should scrub out unneeded bloat from {{ExecutorInfo}} before storing it. * We can use an alternate internal object, like we do for {{TaskInfo}} vs {{Task}}; see [this|https://github.com/apache/mesos/blob/master/src/messages/messages.proto#L39-L41].",3 MESOS-3800,"Containerizer attempts to create Linux launcher by default ","Mesos containerizer attempts to create a Linux launcher by default without verifying whether the necessary prerequisites (such as availability of cgroups) are met.",3 MESOS-3814,"Add checks to make sure isolators and the launcher are compatible.","There's a recent change regarding the picking of which launcher (Linux or Posix) to use https://reviews.apache.org/r/39604 In our environment, cgroups are not auto-mounted after reboot. We rely on Mesos itself to mount all relevant cgroups hierarchies. After the reboot, the above patch detects that the 'freezer' hierarchy is not mounted and therefore decides to use the Posix launcher (if --launcher is not specified explicitly). The port mapping isolator requires a network namespace to be created for each container (and thus requires the Linux launcher). But we don't have a check to verify that the launcher and isolators are compatible. The slave thus starts fine, but tasks fail with a weird error like: {noformat} Collect failed: Failed to create the ingress qdisc on mesos61099: Link 'mesos61099' is not found. 
{noformat} It took us quite some time to figure out the root cause.",2 MESOS-3819,"Add documentation explaining ""roles""","Docs currently talk about resources, static/dynamic reservations, but don't explain what a ""role"" concept is to begin with.",2 MESOS-3820,"Test-only libprocess reinitialization","*Background* Libprocess initialization includes the spawning of a variety of global processes and the creation of the server socket which listens for incoming requests. Some properties of the server socket are configured via environment variables, such as the IP and port or the SSL configuration. In the case of tests, libprocess is initialized once per test binary. This means that testing different configurations (SSL in particular) is cumbersome as a separate process would be needed for every test case. *Proposal* # Add some optional code between some tests like: {code} // Cleanup all of libprocess's state, as if we're starting anew. process::finalize(); // For tests that need to test SSL connections with the Master: openssl::reinitialize(); process::initialize(); {code} See [MESOS-3863] for more on {{process::finalize}}.",3 MESOS-3825,"Enable mesos-reviewbot project on jenkins to use SSL","Currently mesos-reviewbot project does not support parameterized configuration. This limits the project from building using --enable-ssl (and others) configuration arguments for building mesos. ",3 MESOS-3831,"Document operator HTTP endpoints","These are not exhaustively documented; they probably should be. Some endpoints have docs: e.g., {{/reserve}} and {{/unreserve}} are described in the reservation doc page. But it would be good to have a single page that lists all the endpoints and their semantics.",3 MESOS-3833,"/help endpoints do not work for nested paths","Mesos displays the list of all supported endpoints starting at a given path prefix using the {{/help}} suffix, e.g. {{master:5050/help}}. 
It seems that the {{help}} functionality is broken for URL's having nested paths e.g. {{master:5050/help/master/machine/down}}. The response returned is: {quote} Malformed URL, expecting '/help/id/name/' {quote}",2 MESOS-3837,"Rootfs in provisioner test doesn't handle symlink directories properly","Currently Rootfs doesn't fully copy the directory structure over, and also doesn't create the symlinks in the new rootfs and will cause shell and other binaries that rely on the symlinks to no longer function.",4 MESOS-3839,"Update documentation for FetcherCache mtime-related changes",NULL,1 MESOS-3847,"Root tests for LinuxFilesystemIsolatorTest are broken","The refactor in [MESOS-3762] ended up exposing some differences in the {{TemporaryDirectoryTest}} classes (one in Stout, one in Mesos-proper). The tests that broke (during tear down): {code} LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithRootFilesystem LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem LinuxFilesystemIsolatorTest.ROOT_MultipleContainers {code} As per an offline discussion between [~jvanremoortere] and [~jieyu], the solution is to merge the two {{TemporaryDirectoryTest}} classes and to fix the tear down of {{LinuxFilesystemIsolatorTest}}.",2 MESOS-3848,"Refactor Environment::mkdtemp into TemporaryDirectoryTest.","As part of [MESOS-3762], many tests were changed from one {{TemporaryDirectoryTest}} to another {{TemporaryDirectoryTest}}. One subtle difference is that the name of the temporary directory no longer contains the name of the test. In [MESOS-3847], the duplicate {{TemporaryDirectoryTest}} was removed. The original {{TemporaryDirectoryTest}} called [{{environment->mkdtemp}}|https://github.com/apache/mesos/blob/master/src/tests/environment.cpp#L494]. We would like the naming, which is valuable for debugging, to be available for a majority of tests. (A majority of tests inherit from {{TemporaryDirectoryTest}} in some way.) 
Note: * Any additional directories created via {{environment->mkdtemp}} are cleaned up after the test. * We don't want mesos-specific logic in Stout, like the {{umount}} shell command in {{Environment::TearDown}}. *Proposed change:* Move the temporary directory logic from {{Environment::mkdtemp}} to {{TemporaryDirectoryTest}}. *Tests that need to change* | {{log_tests.cpp}} | {{LogZooKeeperTest}} | We can change {{ZooKeeperTest}} to inherit from {{TemporaryDirectoryTest}} to get rid of code duplication | | {{tests/mesos.cpp}} | {{MesosTest::CreateSlaveFlags}} | {{MesosTest}} already inherits from {{TemporaryDirectoryTest}}. | | {{tests/script.hpp}} | {{TEST_SCRIPT}} | This is used for the {{ExampleTests}}. We can define a test class that inherits appropriately. | | {{docker_tests.cpp}} | {{*}} | Already inherits from {{MesosTest}}. |",3 MESOS-3849,"Corrected style in Makefiles","Order of files in Makefiles is not strictly alphabetic",1 MESOS-3851,"Investigate recent crashes in Command Executor","Post https://reviews.apache.org/r/38900 i.e. updating CommandExecutor to support rootfs. There seem to be some tests showing frequent crashes due to assert violations. 
{{FetcherCacheTest.SimpleEviction}} failed due to the following log: {code} I1107 19:36:46.360908 30657 slave.cpp:1793] Sending queued task '3' to executor ''3' of framework 7d94c7fb-8950-4bcf-80c1-46112292dcd6-0000 at executor(1)@172.17.5.200:33871' I1107 19:36:46.363682 1236 exec.cpp:297] I1107 19:36:46.373569 1245 exec.cpp:210] Executor registered on slave 7d94c7fb-8950-4bcf-80c1-46112292dcd6-S0 @ 0x7f9f5a7db3fa google::LogMessage::Fail() I1107 19:36:46.394081 1245 exec.cpp:222] Executor::registered took 395411ns @ 0x7f9f5a7db359 google::LogMessage::SendToLog() @ 0x7f9f5a7dad6a google::LogMessage::Flush() @ 0x7f9f5a7dda9e google::LogMessageFatal::~LogMessageFatal() @ 0x48d00a _CheckFatal::~_CheckFatal() @ 0x49c99d mesos::internal::CommandExecutorProcess::launchTask() @ 0x4b3dd7 _ZZN7process8dispatchIN5mesos8internal22CommandExecutorProcessEPNS1_14ExecutorDriverERKNS1_8TaskInfoES5_S6_EEvRKNS_3PIDIT_EEMSA_FvT0_T1_ET2_T3_ENKUlPNS_11ProcessBaseEE_clESL_ @ 0x4c470c _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal22CommandExecutorProcessEPNS5_14ExecutorDriverERKNS5_8TaskInfoES9_SA_EEvRKNS0_3PIDIT_EEMSE_FvT0_T1_ET2_T3_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ @ 0x7f9f5a761b1b std::function<>::operator()() @ 0x7f9f5a749935 process::ProcessBase::visit() @ 0x7f9f5a74d700 process::DispatchEvent::visit() @ 0x48e004 process::ProcessBase::serve() @ 0x7f9f5a745d21 process::ProcessManager::resume() @ 0x7f9f5a742f52 _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_ @ 0x7f9f5a74cf2c _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE @ 0x7f9f5a74cedc _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_ @ 0x7f9f5a74ce6e 
_ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE @ 0x7f9f5a74cdc5 _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv @ 0x7f9f5a74cd5e _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv @ 0x7f9f5624f1e0 (unknown) @ 0x7f9f564a8df5 start_thread @ 0x7f9f559b71ad __clone I1107 19:36:46.551370 30656 containerizer.cpp:1257] Executor for container '6553a617-6b4a-418d-9759-5681f45ff854' has exited I1107 19:36:46.551429 30656 containerizer.cpp:1074] Destroying container '6553a617-6b4a-418d-9759-5681f45ff854' I1107 19:36:46.553869 30656 containerizer.cpp:1257] Executor for container 'd2c1f924-c92a-453e-82b1-c294d09c4873' has exited {code} The reason seems to be a race between the executor receiving a {{RunTaskMessage}} before {{ExecutorRegisteredMessage}} leading to the {{CHECK_SOME(executorInfo)}} failure. 
Link to complete log: https://issues.apache.org/jira/browse/MESOS-2831?focusedCommentId=14995535&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14995535 Another related failure from {{ExamplesTest.PersistentVolumeFramework}} {code} @ 0x7f4f71529cbd google::LogMessage::SendToLog() I1107 13:15:09.949987 31573 slave.cpp:2337] Status update manager successfully handled status update acknowledgement (UUID: 721c7316-5580-4636-a83a-098e3bd4ed1f) for task ad90531f-d3d8-43f6-96f2-c81c4548a12d of framework ac4ea54a-7d19-4e41-9ee3-1a761f8e5b0f-0000 @ 0x7f4f715296ce google::LogMessage::Flush() @ 0x7f4f7152c402 google::LogMessageFatal::~LogMessageFatal() @ 0x48d00a _CheckFatal::~_CheckFatal() @ 0x49c99d mesos::internal::CommandExecutorProcess::launchTask() @ 0x4b3dd7 _ZZN7process8dispatchIN5mesos8internal22CommandExecutorProcessEPNS1_14ExecutorDriverERKNS1_8TaskInfoES5_S6_EEvRKNS_3PIDIT_EEMSA_FvT0_T1_ET2_T3_ENKUlPNS_11ProcessBaseEE_clESL_ @ 0x4c470c _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal22CommandExecutorProcessEPNS5_14ExecutorDriverERKNS5_8TaskInfoES9_SA_EEvRKNS0_3PIDIT_EEMSE_FvT0_T1_ET2_T3_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ @ 0x7f4f714b047f std::function<>::operator()() @ 0x7f4f71498299 process::ProcessBase::visit() @ 0x7f4f7149c064 process::DispatchEvent::visit() @ 0x48e004 process::ProcessBase::serve() @ 0x7f4f71494685 process::ProcessManager::resume() {code} Full logs at: https://builds.apache.org/job/Mesos/1191/COMPILER=gcc,CONFIGURATION=--verbose,OS=centos:7,label_exp=docker%7C%7CHadoop/consoleFull",2 MESOS-3854,"Finalize design for generalized Authorizer interface","Finalize the structure of ACLs and achieve consensus on the design doc proposed in MESOS-2949.",2 MESOS-3856,"Add mtime-related fetcher tests",NULL,2 MESOS-3857,"Draft Design Doc for first Step External Volume MVP","As part of the overall design doc for global resources we would like to introduce improvements for Docker Volume 
Driver isolator module (https://github.com/emccode/mesos-module-dvdi). Currently the isolator module is controlled by setting environment variables as follows: {code} ""env"": { ""DVDI_VOLUME_NAME"": ""testing"", ""DVDI_VOLUME_DRIVER"": ""platform1"", ""DVDI_VOLUME_OPTS"": ""size=5,iops=150,volumetype=io1,newfstype=ext4,overwritefs=false"", ""DVDI_VOLUME_NAME1"": ""testing2"", ""DVDI_VOLUME_DRIVER1"": ""platform2"", ""DVDI_VOLUME_OPTS1"": ""size=6,volumetype=gp2,newfstype=xfs,overwritefs=true"" } {code} We should develop a more structured way for passing these settings to the isolator module which is in line with the overall goal of global resources.",3 MESOS-3858,"Draft quota limits design document","In the design documents for Quota (https://docs.google.com/document/d/16iRNmziasEjVOblYp5bbkeBZ7pnjNlaIzPQqMTHQ-9I/edit#) the proposed MVP does not include quota limits. Quota limits represent an upper bound of resources that a role is allowed to use. The task of this ticket is to outline a design document on how to implement quota limits when the quota MVP is implemented.",5 MESOS-3859,"Add github support to apply-reviews.py.",NULL,3 MESOS-3861,"Authenticate quota requests","Quota requests need to be authenticated. This ticket will authenticate quota requests using credentials provided by the {{Authorization}} field of the HTTP request. This is similar to how authentication is implemented in {{Master::Http}}.",3 MESOS-3862,"Authorize set quota requests.","When quotas are requested they should authorize their roles. This ticket will authorize quota requests with ACLs. The existing authorization support that has been implemented in MESOS-1342 will be extended to add a `request_quotas` ACL.",5 MESOS-3863,"Investigate the requirements of programmatically re-initializing libprocess","This issue is for investigating what needs to be added/changed in {{process::finalize}} such that {{process::initialize}} will start on a clean slate. 
Additional issues will be created once done. Also see [the parent issue|MESOS-3820]. {{process::finalize}} should cover the following components: * {{__s__}} (the server socket) ** {{delete}} should be sufficient. This closes the socket and thereby prevents any further interaction from it. * {{process_manager}} ** Related prior work: [MESOS-3158] ** Cleans up the garbage collector, help, logging, profiler, statistics, route processes (including [this one|https://github.com/apache/mesos/blob/3bda55da1d0b580a1b7de43babfdc0d30fbc87ea/3rdparty/libprocess/src/process.cpp#L963], which currently leaks a pointer). ** Cleans up any other {{spawn}} 'd process. ** Manages the {{EventLoop}}. * {{Clock}} ** The goal here is to clear any timers so that nothing can dereference {{process_manager}} while we're finalizing/finalized. It's probably not important to execute any remaining timers, since we're ""shutting down"" libprocess. This means: *** The clock should be {{paused}} and {{settled}} before the clean up of {{process_manager}}. *** Processes, which might interact with the {{Clock}}, should be cleaned up next. *** A new {{Clock::finalize}} method would then clear timers, process-specific clocks, and {{tick}} s; and then {{resume}} the clock. * {{__address__}} (the advertised IP and port) ** Needs to be cleared after {{process_manager}} has been cleaned up. Processes use this to communicate events. If cleared prematurely, {{TerminateEvents}} will not be sent correctly, leading to infinite waits. * {{socket_manager}} ** The idea here is to close all sockets and deallocate any existing {{HttpProxy}} or {{Encoder}} objects. ** All sockets are created via {{__s__}}, so cleaning up the server socket prior will prevent any new activity. * {{mime}} ** This is effectively a static map. ** It should be possible to statically initialize it. * Synchronization atomics {{initialized}} & {{initializing}}. ** Once cleanup is done, these should be reset. 
*Summary*: * Implement {{Clock::finalize}}. [MESOS-3882] * Implement {{~SocketManager}}. [MESOS-3910] * Make sure the {{MetricsProcess}} and {{ReaperProcess}} are reinitialized. [MESOS-3934] * (Optional) Clean up {{mime}}. * Wrap everything up in {{process::finalize}}.",2 MESOS-3864,"Simplify and/or document the libprocess initialization synchronization logic","Tracks this [TODO|https://github.com/apache/mesos/blob/3bda55da1d0b580a1b7de43babfdc0d30fbc87ea/3rdparty/libprocess/src/process.cpp#L749]. The [synchronization logic of libprocess|https://github.com/apache/mesos/commit/cd757cf75637c92c438bf4cd22f21ba1b5be702f#diff-128d3b56fc8c9ec0176fdbadcfd11fc2] [predates abstractions|https://github.com/apache/mesos/commit/6c3b107e4e02d5ba0673eb3145d71ec9d256a639#diff-0eebc8689450916990abe080d86c2acb] like {{process::Once}}, which is used in almost all other one-time initialization blocks. The logic should be documented. It can also be simplified (see the [review description|https://reviews.apache.org/r/39949/]). Or it can be replaced with {{process::Once}}.",1 MESOS-3867,"Make `Resource.DiskInfo.Persistence.principal` a required field","A `principal` field is being added to the `Resource.DiskInfo.Persistence` message to facilitate authorization of persistent volume creation/deletion. In the long-run it should be a required field, but it's being initially introduced as optional to avoid breaking existing frameworks. The field should be changed to required at the end of a deprecation cycle.",1 MESOS-3868,"Make apply-review.sh use apply-reviews.py",NULL,1 MESOS-3873,"Enhance allocator interface with the recovery() method","There are some scenarios (e.g. quota is set for some roles) when it makes sense to notify an allocator about the recovery. 
Introduce a method into the allocator interface that allows for this.",3 MESOS-3874,"Investigate recovery for the Hierarchical allocator","The built-in Hierarchical allocator should implement the recovery (in the presence of quota).",3 MESOS-3875,"Account dynamic reservations towards quota","Dynamic reservations—whether allocated or not—should be accounted towards a role's quota. This requires updates in at least two places: * The built-in allocator, which actually satisfies quota; * The sanity check in the master.",3 MESOS-3877,"Draft operator documentation for quota","Draft an operator guide for quota which describes basic usage of the endpoints and a few basic and advanced use cases.",5 MESOS-3879,"Incorrect and inconsistent include order for and .","We currently have an inconsistent (and mostly incorrect) include order for and (see below). Some files include them (incorrectly) between the C and C++ standard headers, while others correctly include them afterwards. According to the [Google Styleguide| https://google.github.io/styleguide/cppguide.html#Names_and_Order_of_Includes] the second include order is correct. {code:title=external_containerizer_test.cpp} #include #include #include {code} {code:title=launcher.hpp} #include #include {code}",1 MESOS-3880,"Propose a guideline for log messages","We are rather inconsistent in the way we write log messages. It would be helpful to come up with a style and document various aspects of logs, including but not limited to: * Usage of backticks and/or single quotes to quote interpolated variables; * Usage of backticks and/or single quotes to quote types and other names; * Usage of tenses and other grammatical forms; * Proper way of nesting [error] messages;",5 MESOS-3881,"Implement `stout/os/pstree.hpp` on Windows",NULL,2 MESOS-3882,"Libprocess: Implement process::Clock::finalize","Tracks this [TODO|https://github.com/apache/mesos/blob/aa0cd7ed4edf1184cbc592b5caa2429a8373e813/3rdparty/libprocess/src/process.cpp#L974-L975]. 
The {{Clock}} is initialized with a callback that, among other things, will dereference the global {{process_manager}} object. When libprocess is shutting down, the {{process_manager}} is cleaned up. Between cleanup and termination of libprocess, there is some chance that a {{Timer}} will time out and result in dereferencing {{process_manager}}. *Proposal* * Implement {{Clock::finalize}}. This would clear: ** existing timers ** process-specific clocks ** ticks * Change {{process::finalize}}. *# Resume the clock. (The clock is only paused during some tests.) When the clock is not paused, the callback does not dereference {{process_manager}}. *# Clean up {{process_manager}}. This terminates all the processes that would potentially interact with {{Clock}}. *# Call {{Clock::finalize}}.",3 MESOS-3883,"Add support to apply-reviews.py to update SVN when necessary. ","{quote} That said, this can be automated as a step in apply-reviews script. For example, the script can check if something in site/ (or docs/ ?) is being committed and if yes, also do an svn update. @artem do you want to take this on as you revamp the apply-reviews script? On Tue, Nov 10, 2015 at 1:23 AM, Adam Bordelon wrote: > Since it's still a manual process, the website is usually only updated a) > when we have a new release to announce, or b) when some other blog-worthy > content arises (e.g. MesosCon). {quote} https://mail-archives.apache.org/mod_mbox/mesos-dev/201511.mbox/%3CCAAkWvAzqJQ9kmdpcAQ_F%2Bh1bNnzBrRkNQZXkwjWzTRiHUf66fg%40mail.gmail.com%3E",3 MESOS-3884,"Corrected style in hierarchical allocator","The built-in allocator code has some style issues (namespaces in the .cpp file, unfortunate formatting) which should be corrected for readability.",1 MESOS-3887,"Add a flag to master to enable optimistic offers. 
",NULL,3 MESOS-3888,"Support distinguishing revocable resources in the Resource protobuf.","Add enum type into RevocableInfo: * Framework need to assign RevocableInfo when launching task; if it’s not assign, use reserved resources. Framework need to identify which resources it’s using * Oversubscription resources need to assign the type by Agent (MESOS-3930) * Update Oversubscription document that OO has over-subscribe the Allocation Slack and recommend QoS to handle the usage slack only. (MESOS-3889) {code} message Resource { ... message RevocableInfo { enum Type { // Under-utilized, allocated resources. Controlled by // oversubscription (QoSController & ResourceEstimator). USAGE_SLACK = 1; // Unallocated, reserved resources. // Controlled by optimistic offers (Allocator). ALLOCATION_SLACK = 2; } optional Type type = 1; } ... optional RevocableInfo revocable = 9; } {code} ",2 MESOS-3889,"Modify Oversubscription documentation to explicitly forbid the QoS Controller from killing executors running on optimistically offered resources.",NULL,2 MESOS-3890,"Add notion of evictable task to RunTaskMessage","{code} message RunTaskMessage { ... // This list can be non-empty when a task is launched on reserved // resources. If the reserved resources are in use (as revocable // resources), this list contains the executors that can be evicted // to make room to run this task. repeated ExecutorID evictable_executors = 5; ... } {code}",2 MESOS-3891,"Add a helper function to the Agent to check available resources before launching a task. ","Launching a task using revocable resources should be funnelled through an accounting system: * If a task is launched using revocable resources, the resources must not be in use when launching the task. If they are in use, then the task should fail to start. * If a task is launched using reserved resources, the resources must be made available. This means potentially evicting tasks which are using revocable resources. 
Both cases could be implemented by adding a check in Slave::runTask, like a new helper method: {noformat} class Slave { ... // Checks if the given resources are available (i.e. not utilized) // for starting a task. If not, the task should either fail to // start or result in the eviction of revocable resources. virtual process::Future checkAvailableResources( const Resources& resources); ... } {noformat}",5 MESOS-3892,"Add a helper function to the Agent to retrieve the list of executors that are using optimistically offered, revocable resources.","In the agent, add a helper function to get the list of executors using ALLOCATION_SLACK. This is a short-term solution that differs from the design document, because the master does not have an executor for the command line executor. Sending evictable executors from master to slave will be addressed post-MVP, after MESOS-1718. {noformat} class Slave { ... // If the executor used revocable resources, add it into `evictableExecutors` // list. void addEvictableExecutor(Executor* executor); // If the executor used revocable resources, remove it from // `evictableExecutors` list. void removeEvictableExecutor(Executor* executor); // Get evictable executor ID list by `request resources`. The return value is `Result>`: // - if `isError()`, there are not enough resources to launch tasks // - if `isNone()`, no evictable executors need to be terminated // - if !`isNone()`, the list of executors that need to be evicted for resources Result> getEvictableExecutors(const Resources& request); ... // The map of evictable executor list. If there are not enough resources, // the evictable executor will be terminated by slave to release resources. hashmap> evictableExecutors; ... 
} {noformat} ",5 MESOS-3893,"Implement tests for verifying allocator resource math.","Write a test to ensure that the allocator performs the reservation slack calculations correctly.",8 MESOS-3894,"Rebuild reservation slack allocator state during master failover.",NULL,13 MESOS-3895,"Update reservation slack allocator state during agent failover.",NULL,13 MESOS-3896,"Add accounting for reservation slack in the allocator.","MESOS-XXX: Optimistic accounting {code} class HierarchicalAllocatorProcess { struct Slave { ... struct Optimistic { Resources total; // The total allocation slack resources Resources allocated; // The allocated allocation slack resources }; Optimistic optimistic; }; } {code} MESOS-4146: flatten & allocationSlack for Optimistic Offer {code} class Resources { // Returns a Resources object with the same amount of each resource // type as these Resources, but with all Resource objects marked as // the specified `RevocableInfo::Type`; other attributes are not // affected. Resources flatten(Resource::RevocableInfo::Type type); // Return a Resources object that: // - if role is given, the resources do not include the role's reserved // resources. // - the resources' revocable type is `ALLOCATION_SLACK` // - the role of resources is set to ""*"" Resources allocationSlack(Option role = None()); } {code} MESOS-XXX: Allocate the allocation_slack resources to framework {code} void HierarchicalAllocatorProcess::allocate( const hashset& slaveIds_) { foreach slave; foreach role; foreach framework { Resources optimistic; if (framework.revocable) { Resources total = slaves[slaveId].optimistic.total.allocationSlack(role); optimistic = total - slaves[slaveId].optimistic.allocated; } ... offerable[frameworkId][slaveId] += resources + optimistic; ... slaves[slaveId].optimistic.allocated += optimistic; } } {code} Here are some considerations about `ALLOCATION_SLACK`: 1. 'Old' resources (available/total) did not include ALLOCATION_SLACK 2. 
After `Quota`, `remainingClusterResources.contains` should not check ALLOCATION_SLACK; if there are not enough resources, the master can still offer ALLOCATION_SLACK resources. 3. The sorter will not include ALLOCATION_SLACK, as those resources are borrowed from other roles/frameworks 4. If either normal resources or ALLOCATION_SLACK resources are allocable/!filtered, they can be offered to the framework 5. Currently, the allocator will assign all ALLOCATION_SLACK resources in a slave to one framework MESOS-XXX: Update ALLOCATION_SLACK for dynamic reservation (updateAllocation) {code} void HierarchicalAllocatorProcess::updateAllocation( const FrameworkID& frameworkId, const SlaveID& slaveId, const vector& operations) { ... Try updatedOptimistic = slaves[slaveId].optimistic.total.apply(operations); CHECK_SOME(updatedOptimistic); slaves[slaveId].optimistic.total = updatedOptimistic.get().stateless().reserved().flatten(ALLOCATION_SLACK); ... } {code} MESOS-XXX: Add ALLOCATION_SLACK when a slave registers/re-registers (addSlave) {code} void HierarchicalAllocatorProcess::addSlave( const SlaveID& slaveId, const SlaveInfo& slaveInfo, const Option& unavailability, const Resources& total, const hashmap& used) { ... slaves[slaveId].optimistic.total = total.stateless().reserved().flatten(ALLOCATION_SLACK); ... } {code} No need to handle `removeSlave`; it'll remove all related info from `slaves`, including `optimistic`. MESOS-XXX: return resources to allocator (recoverResources) {code} void HierarchicalAllocatorProcess::recoverResources( const FrameworkID& frameworkId, const SlaveID& slaveId, const Resources& resources, const Option& filters) { if (slaves.contains(slaveId)) { ... slaves[slaveId].optimistic.allocated -= resources.allocationSlack(); ... 
} } {code}",13 MESOS-3897,"Identify and implement test cases for verifying eviction logic in the agent",NULL,13 MESOS-3898,"Identify and implement test cases for handling a race between optimistic lender and tenant offers.","An example is when a lender launches a task on an agent, followed by a borrower launching a task on the same agent before the optimistic offer is rescinded. ",13 MESOS-3899,"Wrong syntax and inconsistent formatting of JSON examples in flag documentation","The JSON examples in the documentation of the commandline flags ({{mesos-master.sh --help}} and {{mesos-slave.sh --help}}) don't have consistent formatting. Furthermore, some examples aren't even compliant JSON because they have trailing commas where they shouldn't.",1 MESOS-3900,"Enable mesos-reviewbot project on jenkins to use docker","As a first step to adding capability for building multiple configurations on reviewbot, we need to change the build scripts to use docker. ",3 MESOS-3903,"Add authorization for '/create-volume' and '/destroy-volume' HTTP endpoints","This is the fourth in a series of tickets that adds authorization support for persistent volumes. We need to add ACL authorization for the '/create-volume' and '/destroy-volume' HTTP endpoints. In other complementary work, authorization for frameworks performing {{CREATE}} and {{DESTROY}} operations is being added by MESOS-3065. This will consist of adding authorization calls into the HTTP endpoint code in {{src/master/http.cpp}}, as well as tests for both failed & successful calls to '/create-volumes' and '/destroy-volumes' with authorization. 
We also must ensure that the {{principal}} field of {{Resource.DiskInfo.Persistence}} is being populated correctly.",2 MESOS-3905,"Five new docker-related slave flags are not covered by the configuration documentation.","These flags were added to ""slave/flags.cpp"", but are not mentioned in ""docs/configuration.md"": add(&Flags::docker_auth_server, ""docker_auth_server"", ""Docker authentication server"", ""auth.docker.io""); add(&Flags::docker_auth_server_port, ""docker_auth_server_port"", ""Docker authentication server port"", ""443""); add(&Flags::docker_puller_timeout_secs, ""docker_puller_timeout"", ""Timeout value in seconds for pulling images from Docker registry"", ""60""); add(&Flags::docker_registry, ""docker_registry"", ""Default Docker image registry server host"", ""registry-1.docker.io""); add(&Flags::docker_registry_port, ""docker_registry_port"", ""Default Docker registry server port"", ""443""); ",1 MESOS-3909,"isolator module headers depend on picojson headers","When trying to build an isolator module, stout headers end up depending on {{picojson.hpp}} which is not installed. {code} In file included from /opt/mesos/include/mesos/module/isolator.hpp:25: In file included from /opt/mesos/include/mesos/slave/isolator.hpp:30: In file included from /opt/mesos/include/process/dispatch.hpp:22: In file included from /opt/mesos/include/process/process.hpp:26: In file included from /opt/mesos/include/process/event.hpp:21: In file included from /opt/mesos/include/process/http.hpp:39: /opt/mesos/include/stout/json.hpp:23:10: fatal error: 'picojson.h' file not found #include ^ 8 warnings and 1 error generated. {code}",3 MESOS-3910,"Libprocess: Implement cleanup of the SocketManager in process::finalize","The {{socket_manager}} and {{process_manager}} are intricately tied together. Currently, only the {{process_manager}} is cleaned up by {{process::finalize}}. 
To clean up the {{socket_manager}}, we must close all sockets and deallocate any existing {{HttpProxy}} or {{Encoder}} objects. And we should prevent further objects from being created/tracked by the {{socket_manager}}. *Proposal* # Clean up all processes other than {{gc}}. This will clear all links and delete all {{HttpProxy}} s while {{socket_manager}} still exists. # Close all sockets via {{SocketManager::close}}. All of {{socket_manager}} 's state is cleaned up via {{SocketManager::close}}, including termination of {{HttpProxy}} (termination is idempotent, meaning that killing {{HttpProxy}} s via {{process_manager}} is safe). # At this point, {{socket_manager}} should be empty and only the {{gc}} process should be running. (Since we're finalizing, assume there are no threads trying to spawn processes.) {{socket_manager}} can be deleted. # {{gc}} can be deleted. This is currently a leaked pointer, so we'll also need to track and delete that. # {{process_manager}} should be devoid of processes, so we can proceed with cleanup (join threads, stop the {{EventLoop}}, etc).",5 MESOS-3911,"Add a `--force` flag to disable sanity check in quota","There are use cases when an operator may want to disable the sanity check for quota endpoints (MESOS-3074), even if this renders the cluster under quota. For example, an operator sets quota before adding more agents in order to make sure that no non-quota allocations from new agents are made. ",1 MESOS-3912,"Rescind offers in order to satisfy quota","When a quota request comes in, we may need to rescind a certain amount of outstanding offers in order to satisfy it. Because resources are allocated in the allocator, there can be a race between rescinding and allocating. This race makes it hard to determine the exact amount of offers that should be rescinded in the master.",3 MESOS-3913,"Disallow empty string roles","Having an empty role (empty string) looks like a terrible idea, but we do not prohibit it. 
I think we should add corresponding checks and update the docs to officially disallow empty roles.",3 MESOS-3916,"MasterMaintenanceTest.InverseOffersFilters is flaky","Verbose Logs: {code} [ RUN ] MasterMaintenanceTest.InverseOffersFilters I1113 16:43:58.486469 8728 leveldb.cpp:176] Opened db in 2.360405ms I1113 16:43:58.486935 8728 leveldb.cpp:183] Compacted db in 407105ns I1113 16:43:58.486995 8728 leveldb.cpp:198] Created db iterator in 16221ns I1113 16:43:58.487030 8728 leveldb.cpp:204] Seeked to beginning of db in 10935ns I1113 16:43:58.487046 8728 leveldb.cpp:273] Iterated through 0 keys in the db in 999ns I1113 16:43:58.487090 8728 replica.cpp:780] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I1113 16:43:58.487735 8747 recover.cpp:449] Starting replica recovery I1113 16:43:58.488047 8747 recover.cpp:475] Replica is in EMPTY status I1113 16:43:58.488977 8745 replica.cpp:676] Replica in EMPTY status received a broadcasted recover request from (58)@10.0.2.15:45384 I1113 16:43:58.489452 8746 recover.cpp:195] Received a recover response from a replica in EMPTY status I1113 16:43:58.489712 8747 recover.cpp:566] Updating replica status to STARTING I1113 16:43:58.490706 8742 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 745443ns I1113 16:43:58.490739 8742 replica.cpp:323] Persisted replica status to STARTING I1113 16:43:58.490859 8742 recover.cpp:475] Replica is in STARTING status I1113 16:43:58.491786 8747 replica.cpp:676] Replica in STARTING status received a broadcasted recover request from (59)@10.0.2.15:45384 I1113 16:43:58.492542 8749 recover.cpp:195] Received a recover response from a replica in STARTING status I1113 16:43:58.493221 8743 recover.cpp:566] Updating replica status to VOTING I1113 16:43:58.493710 8743 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 331874ns I1113 16:43:58.493767 8743 replica.cpp:323] Persisted replica status to VOTING I1113 16:43:58.493868 8743 recover.cpp:580] 
Successfully joined the Paxos group I1113 16:43:58.494119 8743 recover.cpp:464] Recover process terminated I1113 16:43:58.504369 8749 master.cpp:367] Master d59449fc-5462-43c5-b935-e05563fdd4b6 (vagrant-ubuntu-wily-64) started on 10.0.2.15:45384 I1113 16:43:58.504438 8749 master.cpp:369] Flags at startup: --acls="""" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate=""false"" --authenticate_slaves=""true"" --authenticators=""crammd5"" --authorizers=""local"" --credentials=""/tmp/ZB7csS/credentials"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_slave_ping_timeouts=""5"" --quiet=""false"" --recovery_slave_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""25secs"" --registry_strict=""true"" --root_submissions=""true"" --slave_ping_timeout=""15secs"" --slave_reregister_timeout=""10mins"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/usr/local/share/mesos/webui"" --work_dir=""/tmp/ZB7csS/master"" --zk_session_timeout=""10secs"" I1113 16:43:58.504717 8749 master.cpp:416] Master allowing unauthenticated frameworks to register I1113 16:43:58.504889 8749 master.cpp:419] Master only allowing authenticated slaves to register I1113 16:43:58.504922 8749 credentials.hpp:37] Loading credentials for authentication from '/tmp/ZB7csS/credentials' I1113 16:43:58.505497 8749 master.cpp:458] Using default 'crammd5' authenticator I1113 16:43:58.505759 8749 master.cpp:495] Authorization enabled I1113 16:43:58.507638 8746 master.cpp:1606] The newly elected leader is master@10.0.2.15:45384 with id d59449fc-5462-43c5-b935-e05563fdd4b6 I1113 16:43:58.507693 8746 master.cpp:1619] Elected as the leading master! 
I1113 16:43:58.507720 8746 master.cpp:1379] Recovering from registrar I1113 16:43:58.507946 8749 registrar.cpp:309] Recovering registrar I1113 16:43:58.508561 8749 log.cpp:661] Attempting to start the writer I1113 16:43:58.510282 8747 replica.cpp:496] Replica received implicit promise request from (60)@10.0.2.15:45384 with proposal 1 I1113 16:43:58.510867 8747 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 475696ns I1113 16:43:58.510946 8747 replica.cpp:345] Persisted promised to 1 I1113 16:43:58.511912 8745 coordinator.cpp:240] Coordinator attempting to fill missing positions I1113 16:43:58.513030 8749 replica.cpp:391] Replica received explicit promise request from (61)@10.0.2.15:45384 for position 0 with proposal 2 I1113 16:43:58.513819 8749 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 739171ns I1113 16:43:58.513867 8749 replica.cpp:715] Persisted action at 0 I1113 16:43:58.522002 8745 replica.cpp:540] Replica received write request for position 0 from (62)@10.0.2.15:45384 I1113 16:43:58.522114 8745 leveldb.cpp:438] Reading position from leveldb took 33549ns I1113 16:43:58.522599 8745 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 435729ns I1113 16:43:58.522652 8745 replica.cpp:715] Persisted action at 0 I1113 16:43:58.523291 8746 replica.cpp:694] Replica received learned notice for position 0 from @0.0.0.0:0 I1113 16:43:58.523901 8746 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 538894ns I1113 16:43:58.523983 8746 replica.cpp:715] Persisted action at 0 I1113 16:43:58.524060 8746 replica.cpp:700] Replica learned NOP action at position 0 I1113 16:43:58.524775 8747 log.cpp:677] Writer started with ending position 0 I1113 16:43:58.525902 8745 leveldb.cpp:438] Reading position from leveldb took 39685ns I1113 16:43:58.526852 8745 registrar.cpp:342] Successfully fetched the registry (0B) in 18.832896ms I1113 16:43:58.527084 8745 registrar.cpp:441] Applied 1 operations in 24930ns; attempting to update the 
'registry' I1113 16:43:58.528020 8745 log.cpp:685] Attempting to append 189 bytes to the log I1113 16:43:58.528323 8748 coordinator.cpp:350] Coordinator attempting to write APPEND action at position 1 I1113 16:43:58.529465 8744 replica.cpp:540] Replica received write request for position 1 from (63)@10.0.2.15:45384 I1113 16:43:58.530081 8744 leveldb.cpp:343] Persisting action (208 bytes) to leveldb took 552812ns I1113 16:43:58.530128 8744 replica.cpp:715] Persisted action at 1 I1113 16:43:58.530781 8745 replica.cpp:694] Replica received learned notice for position 1 from @0.0.0.0:0 I1113 16:43:58.531121 8745 leveldb.cpp:343] Persisting action (210 bytes) to leveldb took 271774ns I1113 16:43:58.531162 8745 replica.cpp:715] Persisted action at 1 I1113 16:43:58.531188 8745 replica.cpp:700] Replica learned APPEND action at position 1 I1113 16:43:58.532064 8743 registrar.cpp:486] Successfully updated the 'registry' in 4.9152ms I1113 16:43:58.532402 8743 registrar.cpp:372] Successfully recovered registrar I1113 16:43:58.532768 8742 log.cpp:704] Attempting to truncate the log to 1 I1113 16:43:58.532891 8743 master.cpp:1416] Recovered 0 slaves from the Registry (150B) ; allowing 10mins for slaves to re-register I1113 16:43:58.532968 8742 coordinator.cpp:350] Coordinator attempting to write TRUNCATE action at position 2 I1113 16:43:58.534010 8742 replica.cpp:540] Replica received write request for position 2 from (64)@10.0.2.15:45384 I1113 16:43:58.534488 8742 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 420186ns I1113 16:43:58.534533 8742 replica.cpp:715] Persisted action at 2 I1113 16:43:58.535081 8748 replica.cpp:694] Replica received learned notice for position 2 from @0.0.0.0:0 I1113 16:43:58.535482 8748 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 360618ns I1113 16:43:58.535550 8748 leveldb.cpp:401] Deleting ~1 keys from leveldb took 23693ns I1113 16:43:58.535575 8748 replica.cpp:715] Persisted action at 2 I1113 16:43:58.535611 8748 
replica.cpp:700] Replica learned TRUNCATE action at position 2 I1113 16:43:58.550834 8746 slave.cpp:191] Slave started on 5)@10.0.2.15:45384 I1113 16:43:58.550834 8746 slave.cpp:192] Flags at startup: --appc_store_dir=""/tmp/mesos/store/appc"" --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --credential=""/tmp/MasterMaintenanceTest_InverseOffersFilters_2zc09g/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_auth_server=""auth.docker.io"" --docker_auth_server_port=""443"" --docker_kill_orphans=""true"" --docker_local_archives_dir=""/tmp/mesos/images/docker"" --docker_puller=""local"" --docker_puller_timeout=""60"" --docker_registry=""registry-1.docker.io"" --docker_registry_port=""443"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/MasterMaintenanceTest_InverseOffersFilters_2zc09g/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname=""maintenance-host"" --hostname_lookup=""true"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" --launcher_dir=""/home/vagrant/build-mesos/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" 
--resources=""cpus:2;mem:1024;disk:1024;ports:[31000-32000]"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" --work_dir=""/tmp/MasterMaintenanceTest_InverseOffersFilters_2zc09g"" I1113 16:43:58.551501 8746 credentials.hpp:85] Loading credential for authentication from '/tmp/MasterMaintenanceTest_InverseOffersFilters_2zc09g/credential' I1113 16:43:58.551703 8746 slave.cpp:322] Slave using credential for: test-principal I1113 16:43:58.552422 8746 slave.cpp:392] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I1113 16:43:58.552510 8746 slave.cpp:400] Slave attributes: [ ] I1113 16:43:58.552532 8746 slave.cpp:405] Slave hostname: maintenance-host I1113 16:43:58.552547 8746 slave.cpp:410] Slave checkpoint: true I1113 16:43:58.553520 8746 state.cpp:54] Recovering state from '/tmp/MasterMaintenanceTest_InverseOffersFilters_2zc09g/meta' I1113 16:43:58.553938 8746 status_update_manager.cpp:202] Recovering status update manager I1113 16:43:58.554251 8746 slave.cpp:4230] Finished recovery I1113 16:43:58.555016 8746 slave.cpp:729] New master detected at master@10.0.2.15:45384 I1113 16:43:58.555166 8746 slave.cpp:792] Authenticating with master master@10.0.2.15:45384 I1113 16:43:58.555207 8746 slave.cpp:797] Using default CRAM-MD5 authenticatee I1113 16:43:58.555589 8746 slave.cpp:765] Detecting new master I1113 16:43:58.555076 8749 status_update_manager.cpp:176] Pausing sending status updates I1113 16:43:58.555719 8742 authenticatee.cpp:123] Creating new client SASL connection I1113 16:43:58.560645 8744 master.cpp:5150] Authenticating slave(5)@10.0.2.15:45384 I1113 16:43:58.561305 8744 authenticator.cpp:100] Creating new server SASL connection I1113 16:43:58.566682 8744 authenticatee.cpp:214] Received SASL authentication mechanisms: CRAM-MD5 I1113 16:43:58.566779 8744 authenticatee.cpp:240] Attempting 
to authenticate with mechanism 'CRAM-MD5' I1113 16:43:58.566872 8744 authenticator.cpp:205] Received SASL authentication start I1113 16:43:58.566936 8744 authenticator.cpp:327] Authentication requires more steps I1113 16:43:58.567602 8744 authenticatee.cpp:260] Received SASL authentication step I1113 16:43:58.567775 8744 authenticator.cpp:233] Received SASL authentication step I1113 16:43:58.568128 8744 authenticator.cpp:319] Authentication success I1113 16:43:58.568282 8742 authenticatee.cpp:300] Authentication success I1113 16:43:58.568320 8749 master.cpp:5180] Successfully authenticated principal 'test-principal' at slave(5)@10.0.2.15:45384 I1113 16:43:58.568701 8742 slave.cpp:860] Successfully authenticated with master master@10.0.2.15:45384 I1113 16:43:58.569272 8747 master.cpp:3859] Registering slave at slave(5)@10.0.2.15:45384 (maintenance-host) with id d59449fc-5462-43c5-b935-e05563fdd4b6-S0 I1113 16:43:58.570096 8747 registrar.cpp:441] Applied 1 operations in 59195ns; attempting to update the 'registry' I1113 16:43:58.570772 8748 log.cpp:685] Attempting to append 362 bytes to the log I1113 16:43:58.570772 8749 coordinator.cpp:350] Coordinator attempting to write APPEND action at position 3 I1113 16:43:58.572155 8745 replica.cpp:540] Replica received write request for position 3 from (69)@10.0.2.15:45384 I1113 16:43:58.572801 8745 leveldb.cpp:343] Persisting action (381 bytes) to leveldb took 563073ns I1113 16:43:58.572854 8745 replica.cpp:715] Persisted action at 3 I1113 16:43:58.573707 8745 replica.cpp:694] Replica received learned notice for position 3 from @0.0.0.0:0 I1113 16:43:58.574255 8745 leveldb.cpp:343] Persisting action (383 bytes) to leveldb took 485234ns I1113 16:43:58.574311 8745 replica.cpp:715] Persisted action at 3 I1113 16:43:58.574342 8745 replica.cpp:700] Replica learned APPEND action at position 3 I1113 16:43:58.575857 8747 master.cpp:3847] Ignoring register slave message from slave(5)@10.0.2.15:45384 (maintenance-host) as admission is 
already in progress I1113 16:43:58.576217 8744 log.cpp:704] Attempting to truncate the log to 3 I1113 16:43:58.575887 8748 registrar.cpp:486] Successfully updated the 'registry' in 5.682176ms I1113 16:43:58.576400 8744 coordinator.cpp:350] Coordinator attempting to write TRUNCATE action at position 4 I1113 16:43:58.577169 8746 master.cpp:3927] Registered slave d59449fc-5462-43c5-b935-e05563fdd4b6-S0 at slave(5)@10.0.2.15:45384 (maintenance-host) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I1113 16:43:58.577287 8745 hierarchical.cpp:344] Added slave d59449fc-5462-43c5-b935-e05563fdd4b6-S0 (maintenance-host) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (allocated: ) I1113 16:43:58.577472 8744 slave.cpp:904] Registered with master master@10.0.2.15:45384; given slave ID d59449fc-5462-43c5-b935-e05563fdd4b6-S0 I1113 16:43:58.577999 8745 status_update_manager.cpp:183] Resuming sending status updates I1113 16:43:58.578279 8748 replica.cpp:540] Replica received write request for position 4 from (70)@10.0.2.15:45384 I1113 16:43:58.578346 8744 slave.cpp:963] Forwarding total oversubscribed resources I1113 16:43:58.578734 8744 master.cpp:4269] Received update of slave d59449fc-5462-43c5-b935-e05563fdd4b6-S0 at slave(5)@10.0.2.15:45384 (maintenance-host) with total oversubscribed resources I1113 16:43:58.578846 8748 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 304993ns I1113 16:43:58.578889 8748 replica.cpp:715] Persisted action at 4 I1113 16:43:58.578897 8744 hierarchical.cpp:400] Slave d59449fc-5462-43c5-b935-e05563fdd4b6-S0 (maintenance-host) updated with oversubscribed resources (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: ) I1113 16:43:58.579463 8744 replica.cpp:694] Replica received learned notice for position 4 from @0.0.0.0:0 I1113 16:43:58.579888 8744 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 384596ns I1113 16:43:58.579952 8744 leveldb.cpp:401] Deleting 
~2 keys from leveldb took 27011ns I1113 16:43:58.579977 8744 replica.cpp:715] Persisted action at 4 I1113 16:43:58.580001 8744 replica.cpp:700] Replica learned TRUNCATE action at position 4 I1113 16:43:58.584300 8743 slave.cpp:191] Slave started on 6)@10.0.2.15:45384 I1113 16:43:58.584398 8743 slave.cpp:192] Flags at startup: --appc_store_dir=""/tmp/mesos/store/appc"" --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --credential=""/tmp/MasterMaintenanceTest_InverseOffersFilters_CDFgvt/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_auth_server=""auth.docker.io"" --docker_auth_server_port=""443"" --docker_kill_orphans=""true"" --docker_local_archives_dir=""/tmp/mesos/images/docker"" --docker_puller=""local"" --docker_puller_timeout=""60"" --docker_registry=""registry-1.docker.io"" --docker_registry_port=""443"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/MasterMaintenanceTest_InverseOffersFilters_CDFgvt/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname=""maintenance-host-2"" --hostname_lookup=""true"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" --launcher_dir=""/home/vagrant/build-mesos/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" 
--quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resources=""cpus:2;mem:1024;disk:1024;ports:[31000-32000]"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" --work_dir=""/tmp/MasterMaintenanceTest_InverseOffersFilters_CDFgvt"" I1113 16:43:58.584731 8743 credentials.hpp:85] Loading credential for authentication from '/tmp/MasterMaintenanceTest_InverseOffersFilters_CDFgvt/credential' I1113 16:43:58.584915 8743 slave.cpp:322] Slave using credential for: test-principal I1113 16:43:58.585309 8743 slave.cpp:392] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I1113 16:43:58.585482 8743 slave.cpp:400] Slave attributes: [ ] I1113 16:43:58.585566 8743 slave.cpp:405] Slave hostname: maintenance-host-2 I1113 16:43:58.585619 8743 slave.cpp:410] Slave checkpoint: true I1113 16:43:58.586431 8743 state.cpp:54] Recovering state from '/tmp/MasterMaintenanceTest_InverseOffersFilters_CDFgvt/meta' I1113 16:43:58.586890 8745 status_update_manager.cpp:202] Recovering status update manager I1113 16:43:58.587136 8745 slave.cpp:4230] Finished recovery I1113 16:43:58.587817 8745 slave.cpp:729] New master detected at master@10.0.2.15:45384 I1113 16:43:58.587836 8747 status_update_manager.cpp:176] Pausing sending status updates I1113 16:43:58.587908 8745 slave.cpp:792] Authenticating with master master@10.0.2.15:45384 I1113 16:43:58.587934 8745 slave.cpp:797] Using default CRAM-MD5 authenticatee I1113 16:43:58.588043 8745 slave.cpp:765] Detecting new master I1113 16:43:58.588170 8745 authenticatee.cpp:123] Creating new client SASL connection I1113 16:43:58.592891 8745 master.cpp:5150] Authenticating slave(6)@10.0.2.15:45384 I1113 16:43:58.594146 8745 authenticator.cpp:100] Creating new server SASL connection I1113 16:43:58.599606 8749 authenticatee.cpp:214] 
Received SASL authentication mechanisms: CRAM-MD5 I1113 16:43:58.599684 8749 authenticatee.cpp:240] Attempting to authenticate with mechanism 'CRAM-MD5' I1113 16:43:58.599774 8749 authenticator.cpp:205] Received SASL authentication start I1113 16:43:58.599830 8749 authenticator.cpp:327] Authentication requires more steps I1113 ...",3 MESOS-3923,"Implement AuthN handling in Master for the Scheduler endpoint","If authentication(AuthN) is enabled on a master, frameworks attempting to use the HTTP Scheduler API can't register. {code} $ cat /tmp/subscribe-943257503176798091.bin | http --print=HhBb --stream --pretty=colors --auth verification:password1 POST :5050/api/v1/scheduler Accept:application/x-protobuf Content-Type:application/x-protobuf POST /api/v1/scheduler HTTP/1.1 Connection: keep-alive Content-Type: application/x-protobuf Accept-Encoding: gzip, deflate Accept: application/x-protobuf Content-Length: 126 User-Agent: HTTPie/0.9.0 Host: localhost:5050 Authorization: Basic dmVyaWZpY2F0aW9uOnBhc3N3b3JkMQ== +-----------------------------------------+ | NOTE: binary data not shown in terminal | +-----------------------------------------+ HTTP/1.1 401 Unauthorized Date: Fri, 13 Nov 2015 20:00:45 GMT WWW-authenticate: Basic realm=""Mesos master"" Content-Length: 65 HTTP schedulers are not supported when authentication is required {code} Authorization(AuthZ) is already supported for HTTP based frameworks.",5 MESOS-3925,"Add HDFS based URI fetcher plugin.","This plugin uses HDFS client to fetch artifacts. It can support schemes like hdfs/hftp/s3/s3n It'll shell out the hadoop command to do the actual fetching.",3 MESOS-3926,"Modularize URI fetcher plugin interface. ","So that we can add custom URI fetcher plugins using modules.",3 MESOS-3928,"ROOT tests fail on Mesos 0.26 on Ubuntu/CentOS","Running {{0.26.0-rc1}} on both CentOS 7.1 and Ubuntu 14.04 with {{sudo}} privileges, causes segfaults when running Docker tests. 
Logs attached.",2 MESOS-3929,"Automate the process of landing commits for committers","This script should do the following things 1) Apply a chain of reviews to a local branch 2) Push the commits upstream 3) Mark the reviews as submitted 4) Optionally close any attached JIRA tickets ",3 MESOS-3934,"Libprocess: Unify the initialization of the MetricsProcess and ReaperProcess","Related to this [TODO|https://github.com/apache/mesos/blob/aa0cd7ed4edf1184cbc592b5caa2429a8373e813/3rdparty/libprocess/src/process.cpp#L949-L950]. The {{MetricsProcess}} and {{ReaperProcess}} are global processes (singletons) which are initialized upon first use. The two processes could be initialized alongside the {{gc}}, {{help}}, {{logging}}, {{profiler}}, and {{system}} (statistics) processes inside {{process::initialize}}. This is also necessary for libprocess re-initialization.",3 MESOS-3936,"Document possible task state transitions for framework authors","We should document the possible ways in which the state of a task can evolve over time; what happens when an agent is partitioned from the master; and more generally, how we recommend that framework authors develop fault-tolerant schedulers and do task state reconciliation.",5 MESOS-3937,"Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.","{noformat} ../configure make check sudo ./bin/mesos-tests.sh --gtest_filter=""DockerContainerizerTest.ROOT_DOCKER_Launch_Executor"" --verbose {noformat} {noformat} [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. 
[----------] 1 test from DockerContainerizerTest I1117 15:08:09.265943 26380 leveldb.cpp:176] Opened db in 3.199666ms I1117 15:08:09.267761 26380 leveldb.cpp:183] Compacted db in 1.684873ms I1117 15:08:09.267902 26380 leveldb.cpp:198] Created db iterator in 58313ns I1117 15:08:09.267966 26380 leveldb.cpp:204] Seeked to beginning of db in 4927ns I1117 15:08:09.267997 26380 leveldb.cpp:273] Iterated through 0 keys in the db in 1605ns I1117 15:08:09.268156 26380 replica.cpp:780] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I1117 15:08:09.270148 26396 recover.cpp:449] Starting replica recovery I1117 15:08:09.272105 26396 recover.cpp:475] Replica is in EMPTY status I1117 15:08:09.275640 26396 replica.cpp:676] Replica in EMPTY status received a broadcasted recover request from (4)@10.0.2.15:50088 I1117 15:08:09.276578 26399 recover.cpp:195] Received a recover response from a replica in EMPTY status I1117 15:08:09.277600 26397 recover.cpp:566] Updating replica status to STARTING I1117 15:08:09.279613 26396 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 1.016098ms I1117 15:08:09.279731 26396 replica.cpp:323] Persisted replica status to STARTING I1117 15:08:09.280306 26399 recover.cpp:475] Replica is in STARTING status I1117 15:08:09.282181 26400 replica.cpp:676] Replica in STARTING status received a broadcasted recover request from (5)@10.0.2.15:50088 I1117 15:08:09.282552 26400 master.cpp:367] Master 59c600f1-92ff-4926-9c84-073d9b81f68a (vagrant-ubuntu-trusty-64) started on 10.0.2.15:50088 I1117 15:08:09.283021 26400 master.cpp:369] Flags at startup: --acls="""" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate=""true"" --authenticate_slaves=""true"" --authenticators=""crammd5"" --authorizers=""local"" --credentials=""/tmp/40AlT8/credentials"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" 
--logging_level=""INFO"" --max_slave_ping_timeouts=""5"" --quiet=""false"" --recovery_slave_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""25secs"" --registry_strict=""true"" --root_submissions=""true"" --slave_ping_timeout=""15secs"" --slave_reregister_timeout=""10mins"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/usr/local/share/mesos/webui"" --work_dir=""/tmp/40AlT8/master"" --zk_session_timeout=""10secs"" I1117 15:08:09.283920 26400 master.cpp:414] Master only allowing authenticated frameworks to register I1117 15:08:09.283972 26400 master.cpp:419] Master only allowing authenticated slaves to register I1117 15:08:09.284032 26400 credentials.hpp:37] Loading credentials for authentication from '/tmp/40AlT8/credentials' I1117 15:08:09.282944 26401 recover.cpp:195] Received a recover response from a replica in STARTING status I1117 15:08:09.284639 26401 recover.cpp:566] Updating replica status to VOTING I1117 15:08:09.285539 26400 master.cpp:458] Using default 'crammd5' authenticator I1117 15:08:09.285995 26401 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 1.075466ms I1117 15:08:09.286062 26401 replica.cpp:323] Persisted replica status to VOTING I1117 15:08:09.286200 26401 recover.cpp:580] Successfully joined the Paxos group I1117 15:08:09.286471 26401 recover.cpp:464] Recover process terminated I1117 15:08:09.287303 26400 authenticator.cpp:520] Initializing server SASL I1117 15:08:09.289371 26400 master.cpp:495] Authorization enabled I1117 15:08:09.296018 26399 master.cpp:1606] The newly elected leader is master@10.0.2.15:50088 with id 59c600f1-92ff-4926-9c84-073d9b81f68a I1117 15:08:09.296115 26399 master.cpp:1619] Elected as the leading master! 
I1117 15:08:09.296187 26399 master.cpp:1379] Recovering from registrar I1117 15:08:09.296717 26397 registrar.cpp:309] Recovering registrar I1117 15:08:09.298842 26396 log.cpp:661] Attempting to start the writer I1117 15:08:09.301563 26394 replica.cpp:496] Replica received implicit promise request from (6)@10.0.2.15:50088 with proposal 1 I1117 15:08:09.302561 26394 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 922719ns I1117 15:08:09.302635 26394 replica.cpp:345] Persisted promised to 1 I1117 15:08:09.303755 26394 coordinator.cpp:240] Coordinator attempting to fill missing positions I1117 15:08:09.306161 26394 replica.cpp:391] Replica received explicit promise request from (7)@10.0.2.15:50088 for position 0 with proposal 2 I1117 15:08:09.306972 26394 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 711749ns I1117 15:08:09.307034 26394 replica.cpp:715] Persisted action at 0 I1117 15:08:09.308732 26401 replica.cpp:540] Replica received write request for position 0 from (8)@10.0.2.15:50088 I1117 15:08:09.308830 26401 leveldb.cpp:438] Reading position from leveldb took 46444ns I1117 15:08:09.309710 26401 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 779098ns I1117 15:08:09.309754 26401 replica.cpp:715] Persisted action at 0 I1117 15:08:09.311007 26397 replica.cpp:694] Replica received learned notice for position 0 from @0.0.0.0:0 I1117 15:08:09.311652 26397 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 567289ns I1117 15:08:09.311731 26397 replica.cpp:715] Persisted action at 0 I1117 15:08:09.311771 26397 replica.cpp:700] Replica learned NOP action at position 0 I1117 15:08:09.313212 26397 log.cpp:677] Writer started with ending position 0 I1117 15:08:09.315682 26399 leveldb.cpp:438] Reading position from leveldb took 27974ns I1117 15:08:09.318694 26395 registrar.cpp:342] Successfully fetched the registry (0B) in 21.862144ms I1117 15:08:09.319007 26395 registrar.cpp:441] Applied 1 operations in 91867ns; 
attempting to update the 'registry' I1117 15:08:09.321730 26395 log.cpp:685] Attempting to append 193 bytes to the log I1117 15:08:09.321935 26397 coordinator.cpp:350] Coordinator attempting to write APPEND action at position 1 I1117 15:08:09.323103 26399 replica.cpp:540] Replica received write request for position 1 from (9)@10.0.2.15:50088 I1117 15:08:09.323917 26399 leveldb.cpp:343] Persisting action (212 bytes) to leveldb took 735223ns I1117 15:08:09.323983 26399 replica.cpp:715] Persisted action at 1 I1117 15:08:09.324975 26398 replica.cpp:694] Replica received learned notice for position 1 from @0.0.0.0:0 I1117 15:08:09.325695 26398 leveldb.cpp:343] Persisting action (214 bytes) to leveldb took 668268ns I1117 15:08:09.325741 26398 replica.cpp:715] Persisted action at 1 I1117 15:08:09.325778 26398 replica.cpp:700] Replica learned APPEND action at position 1 I1117 15:08:09.327258 26396 registrar.cpp:486] Successfully updated the 'registry' in 8.090112ms I1117 15:08:09.327525 26396 registrar.cpp:372] Successfully recovered registrar I1117 15:08:09.328083 26400 log.cpp:704] Attempting to truncate the log to 1 I1117 15:08:09.328251 26394 master.cpp:1416] Recovered 0 slaves from the Registry (154B) ; allowing 10mins for slaves to re-register I1117 15:08:09.328814 26396 coordinator.cpp:350] Coordinator attempting to write TRUNCATE action at position 2 I1117 15:08:09.330158 26401 replica.cpp:540] Replica received write request for position 2 from (10)@10.0.2.15:50088 I1117 15:08:09.330994 26401 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 760471ns I1117 15:08:09.331055 26401 replica.cpp:715] Persisted action at 2 I1117 15:08:09.331583 26401 replica.cpp:694] Replica received learned notice for position 2 from @0.0.0.0:0 I1117 15:08:09.332172 26401 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 497457ns I1117 15:08:09.332500 26401 leveldb.cpp:401] Deleting ~1 keys from leveldb took 49327ns I1117 15:08:09.332715 26401 replica.cpp:715] 
Persisted action at 2 I1117 15:08:09.332964 26401 replica.cpp:700] Replica learned TRUNCATE action at position 2 I1117 15:08:09.354073 26401 slave.cpp:191] Slave started on 1)@10.0.2.15:50088 I1117 15:08:09.354316 26401 slave.cpp:192] Flags at startup: --appc_store_dir=""/tmp/mesos/store/appc"" --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --credential=""/tmp/DockerContainerizerTest_ROOT_DOCKER_Launch_Executor_HaKhAQ/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_auth_server=""auth.docker.io"" --docker_auth_server_port=""443"" --docker_kill_orphans=""true"" --docker_local_archives_dir=""/tmp/mesos/images/docker"" --docker_puller=""local"" --docker_puller_timeout=""60"" --docker_registry=""registry-1.docker.io"" --docker_registry_port=""443"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/DockerContainerizerTest_ROOT_DOCKER_Launch_Executor_HaKhAQ/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname_lookup=""true"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" --launcher_dir=""/home/vagrant/mesos/build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" 
--registration_backoff_factor=""10ms"" --resources=""cpus:2;mem:1024;disk:1024;ports:[31000-32000]"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" --work_dir=""/tmp/DockerContainerizerTest_ROOT_DOCKER_Launch_Executor_HaKhAQ"" I1117 15:08:09.355077 26401 credentials.hpp:85] Loading credential for authentication from '/tmp/DockerContainerizerTest_ROOT_DOCKER_Launch_Executor_HaKhAQ/credential' I1117 15:08:09.355587 26401 slave.cpp:322] Slave using credential for: test-principal I1117 15:08:09.357144 26401 slave.cpp:392] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I1117 15:08:09.357477 26401 slave.cpp:400] Slave attributes: [ ] I1117 15:08:09.357719 26401 slave.cpp:405] Slave hostname: vagrant-ubuntu-trusty-64 I1117 15:08:09.357936 26380 sched.cpp:166] Version: 0.26.0 I1117 15:08:09.358010 26401 slave.cpp:410] Slave checkpoint: true I1117 15:08:09.359058 26400 sched.cpp:264] New master detected at master@10.0.2.15:50088 I1117 15:08:09.359216 26400 sched.cpp:320] Authenticating with master master@10.0.2.15:50088 I1117 15:08:09.359277 26400 sched.cpp:327] Using default CRAM-MD5 authenticatee I1117 15:08:09.359856 26400 authenticatee.cpp:99] Initializing client SASL I1117 15:08:09.360539 26400 authenticatee.cpp:123] Creating new client SASL connection I1117 15:08:09.361399 26398 state.cpp:54] Recovering state from '/tmp/DockerContainerizerTest_ROOT_DOCKER_Launch_Executor_HaKhAQ/meta' I1117 15:08:09.361994 26398 status_update_manager.cpp:202] Recovering status update manager I1117 15:08:09.362191 26395 master.cpp:5150] Authenticating scheduler-38aa807a-672a-4e1e-b823-71f119980e86@10.0.2.15:50088 I1117 15:08:09.362565 26401 docker.cpp:536] Recovering Docker containers I1117 15:08:09.362908 26395 authenticator.cpp:100] Creating new server SASL connection I1117 15:08:09.363533 26401 
slave.cpp:4230] Finished recovery I1117 15:08:09.363675 26394 authenticatee.cpp:214] Received SASL authentication mechanisms: CRAM-MD5 I1117 15:08:09.363950 26394 authenticatee.cpp:240] Attempting to authenticate with mechanism 'CRAM-MD5' I1117 15:08:09.364137 26394 authenticator.cpp:205] Received SASL authentication start I1117 15:08:09.364241 26394 authenticator.cpp:327] Authentication requires more steps I1117 15:08:09.364481 26394 authenticatee.cpp:260] Received SASL authentication step I1117 15:08:09.364667 26394 authenticator.cpp:233] Received SASL authentication step I1117 15:08:09.364828 26394 authenticator.cpp:319] Authentication success I1117 15:08:09.365039 26398 authenticatee.cpp:300] Authentication success I1117 15:08:09.365170 26398 master.cpp:5180] Successfully authenticated principal 'test-principal' at scheduler-38aa807a-672a-4e1e-b823-71f119980e86@10.0.2.15:50088 I1117 15:08:09.365656 26398 sched.cpp:409] Successfully authenticated with master master@10.0.2.15:50088 I1117 15:08:09.366044 26401 slave.cpp:729] New master detected at master@10.0.2.15:50088 I1117 15:08:09.366283 26398 master.cpp:2176] Received SUBSCRIBE call for framework 'default' at scheduler-38aa807a-672a-4e1e-b823-71f119980e86@10.0.2.15:50088 I1117 15:08:09.366317 26401 slave.cpp:792] Authenticating with master master@10.0.2.15:50088 I1117 15:08:09.366688 26401 slave.cpp:797] Using default CRAM-MD5 authenticatee I1117 15:08:09.366525 26395 status_update_manager.cpp:176] Pausing sending status updates I1117 15:08:09.366442 26398 master.cpp:1645] Authorizing framework principal 'test-principal' to receive offers for role '*' I1117 15:08:09.367207 26401 slave.cpp:765] Detecting new master I1117 15:08:09.367496 26395 master.cpp:2247] Subscribing framework default with checkpointing disabled and capabilities [ ] I1117 15:08:09.368417 26396 hierarchical.cpp:195] Added framework 59c600f1-92ff-4926-9c84-073d9b81f68a-0000 I1117 15:08:09.367250 26398 authenticatee.cpp:123] Creating new 
client SASL connection I1117 15:08:09.368506 26395 sched.cpp:643] Framework registered with 59c600f1-92ff-4926-9c84-073d9b81f68a-0000 I1117 15:08:09.369287 26398 master.cpp:5150] Authenticating slave(1)@10.0.2.15:50088 I1117 15:08:09.370213 26401 authenticator.cpp:100] Creating new server SASL connection I1117 15:08:09.370846 26396 authenticatee.cpp:214] Received SASL authentication mechanisms: CRAM-MD5 I1117 15:08:09.370964 26396 authenticatee.cpp:240] Attempting to authenticate with mechanism 'CRAM-MD5' I1117 15:08:09.371233 26396 authenticator.cpp:205] Received SASL authentication start I1117 15:08:09.371387 26396 authenticator.cpp:327] Authentication requires more steps I1117 15:08:09.371707 26398 authenticatee.cpp:260] Received SASL authentication step I1117 15:08:09.371835 26398 authenticator.cpp:233] Received SASL authentication step I1117 15:08:09.371944 26398 authenticator.cpp:319] Authentication success I1117 15:08:09.372195 26396 authenticatee.cpp:300] Authentication success I1117 15:08:09.372248 26398 master.cpp:5180] Successfully authenticated principal 'test-principal' at slave(1)@10.0.2.15:50088 I1117 15:08:09.373002 26396 slave.cpp:860] Successfully authenticated with master master@10.0.2.15:50088 I1117 15:08:09.373566 26398 master.cpp:3859] Registering slave at slave(1)@10.0.2.15:50088 (vagrant-ubuntu-trusty-64) with id 59c600f1-92ff-4926-9c84-073d9b81f68a-S0 I1117 15:08:09.374301 26401 registrar.cpp:441] Applied 1 operations in 65094ns; attempting to update the 'registry' I1117 15:08:09.376809 26400 log.cpp:685] Attempting to append 374 bytes to the log I1117 15:08:09.376994 26399 coordinator.cpp:350] Coordinator attempting to write APPEND action at position 3 I1117 15:08:09.377960 26397 replica.cpp:540] Replica received write request for position 3 from (16)@10.0.2.15:50088 I1117 15:08:09.378844 26397 leveldb.cpp:343] Persisting action (393 bytes) to leveldb took 805302ns I1117 15:08:09.378904 26397 replica.cpp:715] Persisted action at 3 I1117 
15:08:09.379823 26400 replica.cpp:694] Replica received learned notice for position 3 from @0.0.0.0:0 I1117 15:08:09.380592 26400 leveldb.cpp:343] Persisting action (395 bytes) to leveldb took 691729ns I1117 15:08:09.380666 26400 replica.cpp:715] Persisted action at 3 I1117 15:08:09.380702 26400 replica.cpp:700] Replica learned APPEND action at position 3 I1117 15:08:09.382014 26398 registrar.cpp:486] Successfully updated the 'registry' in 7.384064ms I1117 15:08:09.382184 26400 log.cpp:704] Attempting to truncate the log to 3 I1117 15:08:09.382380 26398 coordinator.cpp:350] Coordinator attempting to write TRUNCATE action at position 4 I1117 15:08:09.383361 26399 master.cpp:3927] Registered slave 59c600f1-92ff-4926-9c84-073d9b81f68a-S0 at slave(1)@10.0.2.15:50088 (vagrant-ubuntu-trusty-64) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I1117 15:08:09.383437 26396 slave.cpp:904] Registered with master master@10.0.2.15:50088; given slave ID 59c600f1-92ff-4926-9c84-073d9b81f68a-S0 I1117 15:08:09.383741 26400 status_update_manager.cpp:183] Resuming sending status updates I1117 15:08:09.384004 26401 hierarchical.cpp:344] Added slave 59c600f1-92ff-4926-9c84-073d9b81f68a-S0 (vagrant-ubuntu-trusty-64) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (allocated: ) I1117 15:08:09.384101 26396 slave.cpp:963] Forwarding total oversubscribed resources I1117 15:08:09.384831 26396 master.cpp:4269] Received update of slave 59c600f1-92ff-4926-9c84-073d9b81f68a-S0 at slave(1)@10.0.2.15:50088 (vagrant-ubuntu-trusty-64) with total oversubscribed resources I1117 15:08:09.384466 26398 replica.cpp:540] Replica received write request for position 4 from (17)@10.0.2.15:50088 I1117 15:08:09.385957 26397 master.cpp:4979] Sending 1 offers to framework 59c600f1-92ff-4926-9c84-073d9b81f68a-0000 (default) at scheduler-38aa807a-672a-4e1e-b823-71f119980e86@10.0.2.15:50088 I1117 15:08:09.386066 26401 hierarchical.cpp:400] Slave 
59c600f1-92ff-4926-9c84-073d9b81f68a-S0 (vagrant-ubuntu-trusty-64) updated with oversubscribed resources (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]) I1117 15:08:09.386219 26398 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 605641ns I1117 15:08:09.386445 26398 replica.cpp:715] Persisted action at 4 I1117 15:08:09.388450 26397 replica.cpp:694] Replica received learned notice for position 4 from @0.0.0.0:0 I1117 15:08:09.389235 26397 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 715846ns I1117 15:08:09.389345 26397 leveldb.cpp:401] Deleting ~2 keys from leveldb took 40455ns I1117 15:08:09.389402 26397 replica.cpp:715] Persisted action at 4 I1117 15:08:09.389464 26397 replica.cpp:700] Replica learned TRUNCATE action at position 4 I1117 15:08:09.390585 26394 master.cpp:2915] Processing ACCEPT call for offers: [ 59c600f1-92ff-4926-9c84-073d9b81f68a-O0 ] on slave 59c600f1-92ff-4926-9c84-073d9b81f68a-S0 at slave(1)@10.0.2.15:50088 (vagrant-ubuntu-trusty-64) for framework 59c600f1-92ff-4926-9c84-073d9b81f68a-0000 (default) at scheduler-38aa807a-672a-4e1e-b823-71f119980e86@10.0.2.15:50088 I1117 15:08:09.390805 26394 master.cpp:2711] Authorizing framework principal 'test-principal' to launch task 1 as user 'root' W1117 15:08:09.393517 26396 validation.cpp:422] Executor e1 for task 1 uses less CPUs (None) than the minimum required (0.01). Please update your executor, as this will be mandatory in future releases. W1117 15:08:09.393632 26396 validation.cpp:434] Executor e1 for task 1 uses less memory (None) than the minimum required (32MB). Please update your executor, as this will be mandatory in future releases. 
I1117 15:08:09.394270 26396 master.hpp:176] Adding task 1 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 59c600f1-92ff-4926-9c84-073d9b81f68a-S0 (vagrant-ubuntu-trusty-64) I1117 15:08:09.394580 26396 master.cpp:3245] Launching task 1 of framework 59c600f1-92ff-4926-9c84-0...",2 MESOS-3938,"Allow setting quotas for the default '*' role","Investigate use cases and implications of the possibility to set quota for the '*' role. For example, having quota for '*' set can effectively reduce the scope of the quota capacity heuristic.",3 MESOS-3939,"ubsan error in net::IP::create(sockaddr const&): misaligned address","Running ubsan from GCC 5.2 on the current Mesos unit tests yields this, among other problems: {noformat} /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/ip.hpp:230:56: runtime error: reference binding to misaligned address 0x00000199629c for type 'const struct sockaddr_storage', which requires 8 byte alignment 0x00000199629c: note: pointer points here 00 00 00 00 02 00 00 00 ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ^ #0 0x5950cb in net::IP::create(sockaddr const&) (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x5950cb) #1 0x5970cd in net::IPNetwork::fromLinkDevice(std::__cxx11::basic_string, std::allocator > const&, int) (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x5970cd) #2 0x58e006 in NetTest_LinkDevice_Test::TestBody() (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x58e006) #3 0x85abd5 in void testing::internal::HandleSehExceptionsInMethodIfSupported(testing::Test*, void (testing::Test::*)(), char const*) (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x85abd5) #4 0x848abc in void testing::internal::HandleExceptionsInMethodIfSupported(testing::Test*, void (testing::Test::*)(), char const*) 
(/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x848abc) #5 0x7e2755 in testing::Test::Run() (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x7e2755) #6 0x7e44a0 in testing::TestInfo::Run() (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x7e44a0) #7 0x7e5ffa in testing::TestCase::Run() (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x7e5ffa) #8 0x7ffe21 in testing::internal::UnitTestImpl::RunAllTests() (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x7ffe21) #9 0x85d7a5 in bool testing::internal::HandleSehExceptionsInMethodIfSupported(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x85d7a5) #10 0x84b37a in bool testing::internal::HandleExceptionsInMethodIfSupported(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x84b37a) #11 0x7f8a4a in testing::UnitTest::Run() (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x7f8a4a) #12 0x608a96 in RUN_ALL_TESTS() (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x608a96) #13 0x60896b in main (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x60896b) #14 0x7fd0f0c7fa3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x20a3f) #15 0x4145c8 in _start (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x4145c8) {noformat}",2 MESOS-3940,"/reserve and /unreserve should be permissive under a master without authentication.","Currently, the {{/reserve}} and {{/unreserve}} endpoints do not work without authentication enabled on the master. 
When authentication is disabled on the master, these endpoints should just be permissive.",1 MESOS-3943,"Support dynamic weight in allocator","This JIRA will focus on updating the allocator API to support weight update of a role.",5 MESOS-3945,"Add operator documentation for /weight endpoint","This JIRA ticket will update the related doc to apply to dynamic weights, and add a new operator guide for dynamic weights which describes basic usage of the /weights endpoint.",2 MESOS-3949,"User CGroup Isolation tests fail on Centos 6.","UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup and UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup fail on CentOS 6.6 with similar output when libevent and SSL are enabled. {noformat} sudo ./bin/mesos-tests.sh --gtest_filter=""UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup"" --verbose {noformat} {noformat} [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from UserCgroupIsolatorTest/0, where TypeParam = mesos::internal::slave::CgroupsMemIsolatorProcess userdel: user 'mesos.test.unprivileged.user' does not exist [ RUN ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup I1118 16:53:35.273717 30249 mem.cpp:605] Started listening for OOM events for container 867a829e-4a26-43f5-86e0-938bf1f47688 I1118 16:53:35.274538 30249 mem.cpp:725] Started listening on low memory pressure events for container 867a829e-4a26-43f5-86e0-938bf1f47688 I1118 16:53:35.275164 30249 mem.cpp:725] Started listening on medium memory pressure events for container 867a829e-4a26-43f5-86e0-938bf1f47688 I1118 16:53:35.275784 30249 mem.cpp:725] Started listening on critical memory pressure events for container 867a829e-4a26-43f5-86e0-938bf1f47688 I1118 16:53:35.276448 30249 mem.cpp:356] Updated 'memory.soft_limit_in_bytes' to 1GB for container 867a829e-4a26-43f5-86e0-938bf1f47688 I1118 16:53:35.277331 30249 mem.cpp:391] Updated 'memory.limit_in_bytes' to 1GB for container 867a829e-4a26-43f5-86e0-938bf1f47688
-bash: /sys/fs/cgroup/memory/mesos/867a829e-4a26-43f5-86e0-938bf1f47688/cgroup.procs: No such file or directory mkdir: cannot create directory `/sys/fs/cgroup/memory/mesos/867a829e-4a26-43f5-86e0-938bf1f47688/user': No such file or directory ../../src/tests/containerizer/isolator_tests.cpp:1307: Failure Value of: os::system( ""su - "" + UNPRIVILEGED_USERNAME + "" -c 'mkdir "" + path::join(flags.cgroups_hierarchy, userCgroup) + ""'"") Actual: 256 Expected: 0 -bash: /sys/fs/cgroup/memory/mesos/867a829e-4a26-43f5-86e0-938bf1f47688/user/cgroup.procs: No such file or directory ../../src/tests/containerizer/isolator_tests.cpp:1316: Failure Value of: os::system( ""su - "" + UNPRIVILEGED_USERNAME + "" -c 'echo $$ >"" + path::join(flags.cgroups_hierarchy, userCgroup, ""cgroup.procs"") + ""'"") Actual: 256 Expected: 0 [ FAILED ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsMemIsolatorProcess (149 ms) {noformat} {noformat} sudo ./bin/mesos-tests.sh --gtest_filter=""UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup"" --verbose {noformat} {noformat} [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. 
[----------] 1 test from UserCgroupIsolatorTest/1, where TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess userdel: user 'mesos.test.unprivileged.user' does not exist [ RUN ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup I1118 17:01:00.550706 30357 cpushare.cpp:392] Updated 'cpu.shares' to 1024 (cpus 1) for container e57f4343-1a97-4b44-b347-803be47ace80 -bash: /sys/fs/cgroup/cpuacct/mesos/e57f4343-1a97-4b44-b347-803be47ace80/cgroup.procs: No such file or directory mkdir: cannot create directory `/sys/fs/cgroup/cpuacct/mesos/e57f4343-1a97-4b44-b347-803be47ace80/user': No such file or directory ../../src/tests/containerizer/isolator_tests.cpp:1307: Failure Value of: os::system( ""su - "" + UNPRIVILEGED_USERNAME + "" -c 'mkdir "" + path::join(flags.cgroups_hierarchy, userCgroup) + ""'"") Actual: 256 Expected: 0 -bash: /sys/fs/cgroup/cpuacct/mesos/e57f4343-1a97-4b44-b347-803be47ace80/user/cgroup.procs: No such file or directory ../../src/tests/containerizer/isolator_tests.cpp:1316: Failure Value of: os::system( ""su - "" + UNPRIVILEGED_USERNAME + "" -c 'echo $$ >"" + path::join(flags.cgroups_hierarchy, userCgroup, ""cgroup.procs"") + ""'"") Actual: 256 Expected: 0 -bash: /sys/fs/cgroup/cpu/mesos/e57f4343-1a97-4b44-b347-803be47ace80/cgroup.procs: No such file or directory mkdir: cannot create directory `/sys/fs/cgroup/cpu/mesos/e57f4343-1a97-4b44-b347-803be47ace80/user': No such file or directory ../../src/tests/containerizer/isolator_tests.cpp:1307: Failure Value of: os::system( ""su - "" + UNPRIVILEGED_USERNAME + "" -c 'mkdir "" + path::join(flags.cgroups_hierarchy, userCgroup) + ""'"") Actual: 256 Expected: 0 -bash: /sys/fs/cgroup/cpu/mesos/e57f4343-1a97-4b44-b347-803be47ace80/user/cgroup.procs: No such file or directory ../../src/tests/containerizer/isolator_tests.cpp:1316: Failure Value of: os::system( ""su - "" + UNPRIVILEGED_USERNAME + "" -c 'echo $$ >"" + path::join(flags.cgroups_hierarchy, userCgroup, ""cgroup.procs"") + ""'"") Actual: 256 
Expected: 0 [ FAILED ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess (116 ms) {noformat}",3 MESOS-3951,"Make HDFS tool wrappers asynchronous.","The existing HDFS tool wrappers (src/hdfs/hdfs.hpp) are synchronous. They use os::shell to shell out the 'hadoop' commands. This makes them very hard to reuse at other locations in the code base. The URI fetcher HDFS plugin will try to re-use the existing HDFS tool wrappers. In order to do that, we need to make them asynchronous first.",5 MESOS-3960,"Standardize quota endpoints","To be consistent with other operator endpoints, require a single JSON object in the request as opposed to key-value pairs encoded in a string.",3 MESOS-3964,"LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs and LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs_Big_Quota fail on Debian 8.","sudo ./bin/mesos-test.sh --gtest_filter=""LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs"" {noformat} ... F1119 14:34:52.514742 30706 isolator_tests.cpp:455] CHECK_SOME(isolator): Failed to find 'cpu.cfs_quota_us'. Your kernel might be too old to use the CFS cgroups feature. {noformat} ",2 MESOS-3965,"Ensure resources in `QuotaInfo` protobuf do not contain `role`","{{QuotaInfo}} protobuf currently stores per-role quotas, including {{Resource}} objects. These resources are neither statically nor dynamically reserved, hence they may not contain a {{role}} field. We should ensure this field is unset, as well as update the validation routine for {{QuotaInfo}}",3 MESOS-3967,"Add integration tests for quota","These tests should verify whether quota implements declared functionality. This will require the whole pipeline: master harness code and an allocator implementation (in contrast to isolated master and allocator tests).",8 MESOS-3969,"Failing 'make distcheck' on Debian 8, somehow SSL-related.","As non-root: make distcheck.
{noformat} /bin/mkdir -p '/home/vagrant/mesos/build/mesos-0.26.0/_inst/bin' /bin/bash ../libtool --mode=install /usr/bin/install -c mesos-local mesos-log mesos mesos-execute mesos-resolve '/home/vagrant/mesos/build/mesos-0.26.0/_inst/bin' libtool: install: /usr/bin/install -c .libs/mesos-local /home/vagrant/mesos/build/mesos-0.26.0/_inst/bin/mesos-local libtool: install: /usr/bin/install -c .libs/mesos-log /home/vagrant/mesos/build/mesos-0.26.0/_inst/bin/mesos-log libtool: install: /usr/bin/install -c .libs/mesos /home/vagrant/mesos/build/mesos-0.26.0/_inst/bin/mesos libtool: install: /usr/bin/install -c .libs/mesos-execute /home/vagrant/mesos/build/mesos-0.26.0/_inst/bin/mesos-execute libtool: install: /usr/bin/install -c .libs/mesos-resolve /home/vagrant/mesos/build/mesos-0.26.0/_inst/bin/mesos-resolve Traceback (most recent call last): File """", line 1, in File ""/home/vagrant/mesos/build/mesos-0.26.0/build/3rdparty/pip-1.5.6/pip/__init_.py"", line 11, in from pip.vcs import git, mercurial, subversion, bazaar # noqa File ""/home/vagrant/mesos/build/mesos-0.26.0/_build/3rdparty/pip-1.5.6/pip/vcs/mercurial.py"", line 9, in from pip.download import path_to_url File ""/home/vagrant/mesos/build/mesos-0.26.0/_build/3rdparty/pip-1.5.6/pip/download.py"", line 22, in from pip._vendor import requests, six File ""/home/vagrant/mesos/build/mesos-0.26.0/build/3rdparty/pip-1.5.6/pip/_vendor/requests/__init_.py"", line 53, in from .packages.urllib3.contrib import pyopenssl File ""/home/vagrant/mesos/build/mesos-0.26.0/_build/3rdparty/pip-1.5.6/pip/_vendor/requests/packages/urllib3/contrib/pyopenssl.py"", line 70, in ssl.PROTOCOL_SSLv3: OpenSSL.SSL.SSLv3_METHOD, AttributeError: 'module' object has no attribute 'PROTOCOL_SSLv3' Traceback (most recent call last): File """", line 1, in File ""/home/vagrant/mesos/build/mesos-0.26.0/_build/3rd {noformat} ",3 MESOS-3973,"Failing 'make distcheck' on Mac OS X 10.10.5, also 10.11.","Non-root 'make distcheck. {noformat} ... 
[----------] Global test environment tear-down [==========] 826 tests from 113 test cases ran. (276624 ms total) [ PASSED ] 826 tests. YOU HAVE 6 DISABLED TESTS Making install in . make[3]: Nothing to be done for `install-exec-am'. ../install-sh -c -d '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/lib/pkgconfig' /usr/bin/install -c -m 644 mesos.pc '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/lib/pkgconfig' Making install in 3rdparty /Applications/Xcode.app/Contents/Developer/usr/bin/make install-recursive Making install in libprocess Making install in 3rdparty /Applications/Xcode.app/Contents/Developer/usr/bin/make install-recursive Making install in stout Making install in . make[9]: Nothing to be done for `install-exec-am'. make[9]: Nothing to be done for `install-data-am'. Making install in include make[9]: Nothing to be done for `install-exec-am'. ../../../../../../3rdparty/libprocess/3rdparty/stout/install-sh -c -d '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include' ../../../../../../3rdparty/libprocess/3rdparty/stout/install-sh -c -d '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/stout' /usr/bin/install -c -m 644 ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/abort.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/attributes.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/base64.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/bits.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/bytes.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/cache.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/check.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/duration.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/dynamiclibrary.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/error.hpp 
../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/exit.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/flags.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/foreach.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/format.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/fs.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/gtest.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/gzip.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/hashmap.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/hashset.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/interval.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/ip.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/json.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/lambda.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/linkedhashmap.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/list.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/mac.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/multihashmap.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/multimap.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/net.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/none.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/nothing.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/numify.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/path.hpp 
../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/preprocessor.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/proc.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/protobuf.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/recordio.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/result.hpp '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/stout' ../../../../../../3rdparty/libprocess/3rdparty/stout/install-sh -c -d '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/stout/os' /usr/bin/install -c -m 644 ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/bootid.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/chdir.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/close.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/constants.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/environment.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/exists.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/fcntl.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/fork.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/ftruncate.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/getcwd.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/killtree.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/linux.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/ls.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/mkdir.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/mktemp.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/open.hpp 
../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/os.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/osx.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/permissions.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/process.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/pstree.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/read.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/realpath.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/rename.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/rm.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/sendfile.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/shell.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/signals.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/stat.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/sysctl.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/touch.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/utime.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/write.hpp '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/stout/os' ../../../../../../3rdparty/libprocess/3rdparty/stout/install-sh -c -d '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/stout/posix' /usr/bin/install -c -m 644 ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/posix/gzip.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/posix/os.hpp '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/stout/posix' ../../../../../../3rdparty/libprocess/3rdparty/stout/install-sh -c -d '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/stout/flags' /usr/bin/install 
-c -m 644 ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/flags/fetch.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/flags/flag.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/flags/parse.hpp '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/stout/flags' ../../../../../../3rdparty/libprocess/3rdparty/stout/install-sh -c -d '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/stout/tests' /usr/bin/install -c -m 644 ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/tests/utils.hpp '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/stout/tests' ../../../../../../3rdparty/libprocess/3rdparty/stout/install-sh -c -d '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/stout/os/windows' /usr/bin/install -c -m 644 ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/windows/bootid.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/windows/exists.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/windows/fcntl.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/windows/fork.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/windows/ftruncate.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/windows/killtree.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/windows/ls.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/windows/process.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/windows/pstree.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/windows/sendfile.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/windows/shell.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/windows/signals.hpp 
../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/windows/stat.hpp '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/stout/os/windows' ../../../../../../3rdparty/libprocess/3rdparty/stout/install-sh -c -d '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/stout/os/posix' /usr/bin/install -c -m 644 ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/bootid.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/exists.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/fcntl.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/fork.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/ftruncate.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/killtree.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/ls.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/process.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/pstree.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/sendfile.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/shell.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/signals.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/posix/stat.hpp '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/stout/os/posix' ../../../../../../3rdparty/libprocess/3rdparty/stout/install-sh -c -d '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/stout' /usr/bin/install -c -m 644 ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/set.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/some.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/stopwatch.hpp 
../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/stringify.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/strings.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/subcommand.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/svn.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/synchronized.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/thread_local.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/try.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/unimplemented.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/unreachable.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/utils.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/uuid.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/version.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/windows.hpp '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/stout' ../../../../../../3rdparty/libprocess/3rdparty/stout/install-sh -c -d '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/stout/windows' /usr/bin/install -c -m 644 ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/windows/format.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/windows/gzip.hpp ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/windows/os.hpp '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/stout/windows' ../../../../../../3rdparty/libprocess/3rdparty/stout/install-sh -c -d '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/stout/os/raw' /usr/bin/install -c -m 644 ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os/raw/environment.hpp '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/stout/os/raw' make[8]: Nothing to be done for `install-exec-am'. 
make[8]: Nothing to be done for `install-data-am'. Making install in . make[6]: Nothing to be done for `install-exec-am'. make[6]: Nothing to be done for `install-data-am'. Making install in include make[6]: Nothing to be done for `install-exec-am'. ../../../../3rdparty/libprocess/install-sh -c -d '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include' ../../../../3rdparty/libprocess/install-sh -c -d '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/process' /usr/bin/install -c -m 644 ../../../../3rdparty/libprocess/include/process/address.hpp ../../../../3rdparty/libprocess/include/process/async.hpp ../../../../3rdparty/libprocess/include/process/check.hpp ../../../../3rdparty/libprocess/include/process/clock.hpp ../../../../3rdparty/libprocess/include/process/collect.hpp ../../../../3rdparty/libprocess/include/process/defer.hpp ../../../../3rdparty/libprocess/include/process/deferred.hpp ../../../../3rdparty/libprocess/include/process/delay.hpp ../../../../3rdparty/libprocess/include/process/dispatch.hpp ../../../../3rdparty/libprocess/include/process/event.hpp ../../../../3rdparty/libprocess/include/process/executor.hpp ../../../../3rdparty/libprocess/include/process/filter.hpp ../../../../3rdparty/libprocess/include/process/firewall.hpp ../../../../3rdparty/libprocess/include/process/future.hpp ../../../../3rdparty/libprocess/include/process/gc.hpp ../../../../3rdparty/libprocess/include/process/gmock.hpp ../../../../3rdparty/libprocess/include/process/gtest.hpp ../../../../3rdparty/libprocess/include/process/help.hpp ../../../../3rdparty/libprocess/include/process/http.hpp ../../../../3rdparty/libprocess/include/process/id.hpp ../../../../3rdparty/libprocess/include/process/io.hpp ../../../../3rdparty/libprocess/include/process/latch.hpp ../../../../3rdparty/libprocess/include/process/limiter.hpp ../../../../3rdparty/libprocess/include/process/logging.hpp ../../../../3rdparty/libprocess/include/process/message.hpp 
../../../../3rdparty/libprocess/include/process/mime.hpp ../../../../3rdparty/libprocess/include/process/mutex.hpp ../../../../3rdparty/libprocess/include/process/network.hpp ../../../../3rdparty/libprocess/include/process/once.hpp ../../../../3rdparty/libprocess/include/process/owned.hpp ../../../../3rdparty/libprocess/include/process/pid.hpp ../../../../3rdparty/libprocess/include/process/process.hpp ../../../../3rdparty/libprocess/include/process/profiler.hpp ../../../../3rdparty/libprocess/include/process/protobuf.hpp ../../../../3rdparty/libprocess/include/process/queue.hpp ../../../../3rdparty/libprocess/include/process/reap.hpp ../../../../3rdparty/libprocess/include/process/run.hpp ../../../../3rdparty/libprocess/include/process/sequence.hpp ../../../../3rdparty/libprocess/include/process/shared.hpp ../../../../3rdparty/libprocess/include/process/socket.hpp '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/process' ../../../../3rdparty/libprocess/install-sh -c -d '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/process' /usr/bin/install -c -m 644 ../../../../3rdparty/libprocess/include/process/statistics.hpp ../../../../3rdparty/libprocess/include/process/system.hpp ../../../../3rdparty/libprocess/include/process/subprocess.hpp ../../../../3rdparty/libprocess/include/process/time.hpp ../../../../3rdparty/libprocess/include/process/timeout.hpp ../../../../3rdparty/libprocess/include/process/timer.hpp ../../../../3rdparty/libprocess/include/process/timeseries.hpp '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/process' ../../../../3rdparty/libprocess/install-sh -c -d '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/process/ssl' /usr/bin/install -c -m 644 ../../../../3rdparty/libprocess/include/process/ssl/gtest.hpp ../../../../3rdparty/libprocess/include/process/ssl/utilities.hpp '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/process/ssl' ../../../../3rdparty/libprocess/install-sh -c -d 
'/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/process/metrics' /usr/bin/install -c -m 644 ../../../../3rdparty/libprocess/include/process/metrics/counter.hpp ../../../../3rdparty/libprocess/include/process/metrics/gauge.hpp ../../../../3rdparty/libprocess/include/process/metrics/metric.hpp ../../../../3rdparty/libprocess/include/process/metrics/metrics.hpp ../../../../3rdparty/libprocess/include/process/metrics/timer.hpp '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/process/metrics' make[5]: Nothing to be done for `install-exec-am'. make[5]: Nothing to be done for `install-data-am'. Making install in src /Applications/Xcode.app/Contents/Developer/usr/bin/make install-am test ""../.."" = "".."" || \ (../../install-sh -c -d python/cli/src/mesos && cp -pf ../../src/python/cli/src/mesos/__init__.py python/cli/src/mesos/__init__.py) test ""../.."" = "".."" || \ (../../install-sh -c -d python/cli/src/mesos && cp -pf ../../src/python/cli/src/mesos/cli.py python/cli/src/mesos/cli.py) test ""../.."" = "".."" || \ (../../install-sh -c -d python/cli/src/mesos && cp -pf ../../src/python/cli/src/mesos/futures.py python/cli/src/mesos/futures.py) test ""../.."" = "".."" || \ (../../install-sh -c -d python/cli/src/mesos && cp -pf ../../src/python/cli/src/mesos/http.py python/cli/src/mesos/http.py) test ""../.."" = "".."" || \ (../../install-sh -c -d python/interface/src/mesos && cp -pf ../../src/python/interface/src/mesos/__init__.py python/interface/src/mesos/__init__.py) test ""../.."" = "".."" || \ (../../install-sh -c -d python/interface/src/mesos/interface && cp -pf ../../src/python/interface/src/mesos/interface/__init__.py python/interface/src/mesos/interface/__init__.py) test ""../.."" = "".."" || \ (../../install-sh -c -d python/interface/src/mesos/v1 && cp -pf ../../src/python/interface/src/mesos/v1/__init__.py python/interface/src/mesos/v1/__init__.py) test ""../.."" = "".."" || \ (../../install-sh -c -d python/interface/src/mesos/v1/interface 
&& cp -pf ../../src/python/interface/src/mesos/v1/interface/__init__.py python/interface/src/mesos/v1/interface/__init__.py) test ""../.."" = "".."" || \ (.....",2 MESOS-3975,"SSL build of mesos causes flaky testsuite.","When running the tests of an SSL build of Mesos on CentOS 7.1, I see spurious test failures that are, so far, not reproducible. The following tests did fail for me in complete runs but did seem fine when running them individually, in repetition. {noformat} DockerTest.ROOT_DOCKER_CheckPortResource {noformat} {noformat} ContainerizerTest.ROOT_CGROUPS_BalloonFramework {noformat} {noformat} [ RUN ] LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystemCommandExecutor 2015-11-20 19:08:38,826:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server refused to accept the client + /home/vagrant/mesos/build/src/mesos-containerizer mount --help=false --operation=make-rslave --path=/ + grep -E /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/.+ /proc/self/mountinfo + grep -v 2b98025c-74f1-41d2-b35a-ce2cdfae347e + cut '-d ' -f5 + xargs --no-run-if-empty umount -l + mount -n --rbind /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/provisioner/containers/2b98025c-74f1-41d2-b35a-ce2cdfae347e/backends/copy/rootfses/bed11080-474b-4c69-8e7f-0ab85e895b0d /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/slaves/830e842e-c36a-4e4c-bff4-5b9568d7df12-S0/frameworks/830e842e-c36a-4e4c-bff4-5b9568d7df12-0000/executors/c735be54-c47f-4645-bfc1-2f4647e2cddb/runs/2b98025c-74f1-41d2-b35a-ce2cdfae347e/.rootfs Could not load cert file ../../src/tests/containerizer/filesystem_isolator_tests.cpp:354: Failure Value of: statusRunning.get().state() Actual: TASK_FAILED Expected: TASK_RUNNING 2015-11-20 19:08:42,164:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53444] zk 
retcode=-4, errno=111(Connection refused): server refused to accept the client 2015-11-20 19:08:45,501:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2015-11-20 19:08:48,837:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2015-11-20 19:08:52,174:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server refused to accept the client ../../src/tests/containerizer/filesystem_isolator_tests.cpp:355: Failure Failed to wait 15secs for statusFinished ../../src/tests/containerizer/filesystem_isolator_tests.cpp:349: Failure Actual function call count doesn't match EXPECT_CALL(sched, statusUpdate(&driver, _))... Expected: to be called twice Actual: called once - unsatisfied and active 2015-11-20 19:08:55,511:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server refused to accept the client *** Aborted at 1448046536 (unix time) try ""date -d @1448046536"" if you are using GNU date *** PC: @ 0x0 (unknown) *** SIGSEGV (@0x0) received by PID 21380 (TID 0x7fa1549e68c0) from PID 0; stack trace: *** @ 0x7fa141796fbb (unknown) @ 0x7fa14179b341 (unknown) @ 0x7fa14f096130 (unknown) {noformat} Vagrantfile generator: {noformat} cat << EOF > Vagrantfile # -*- mode: ruby -*-"" > # vi: set ft=ruby : Vagrant.configure(2) do |config| # Disable shared folder to prevent certain kernel module dependencies. 
config.vm.synced_folder ""."", ""/vagrant"", disabled: true config.vm.hostname = ""centos71"" config.vm.box = ""bento/centos-7.1"" config.vm.provider ""virtualbox"" do |vb| vb.memory = 16384 vb.cpus = 8 end config.vm.provider ""vmware_fusion"" do |vb| vb.memory = 9216 vb.cpus = 4 end config.vm.provision ""shell"", inline: <<-SHELL sudo yum -y update systemd sudo yum install -y tar wget sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo sudo yum groupinstall -y ""Development Tools"" sudo yum install -y apache-maven python-devel java-1.7.0-openjdk-devel zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 apr-devel subversion-devel apr-util-devel sudo yum install libevent-devel sudo yum install -y git sudo yum install -y docker sudo service docker start sudo docker info #sudo wget -qO- https://get.docker.com/ | sh SHELL end EOF vagrant up vagrant reload vagrant ssh -c "" git clone https://github.com/apache/mesos.git mesos cd mesos git checkout -b 0.26.0-rc1 0.26.0-rc1 ./bootstrap mkdir build cd build ../configure --enable-libevent --enable-ssl GTEST_FILTER="""" make check sudo ./bin/mesos-tests.sh "" {noformat}",5 MESOS-3976,"C++ HTTP Scheduler Library does not work with SSL enabled","The C++ HTTP scheduler library does not work against Mesos when SSL is enabled (without downgrade). The fix should be simple: * The library should detect if SSL is enabled. 
* If SSL is enabled, connections should be made with HTTPS instead of HTTP.",2 MESOS-3979,"Replace `QuotaInfo` with `Quota` in allocator interface","After the introduction of the C++ wrapper `Quota` for `QuotaInfo`, all allocator methods using `QuotaInfo` should be updated.",3 MESOS-3981,"Implement recovery in the Hierarchical allocator","The built-in Hierarchical allocator should implement the recovery (in the presence of quota).",3 MESOS-3983,"Tests for quota request validation","Tests should include: * JSON validation; * Absence of irrelevant fields; * Semantic validation.",3 MESOS-3984,"Tests for quota support in `allocate()` function.",NULL,3 MESOS-3985,"Tests for rescinding offers for quota",NULL,1 MESOS-3986,"Tests for allocator recovery",NULL,5 MESOS-3994,"Refactor registry client/puller to avoid JSON and struct.","We should get rid of all JSON and struct usage for message passing as function return types. Use the methods provided by spec.hpp to refactor all unnecessary JSON messages and structs in the registry client and registry puller. Also, remove all redundant checks in the registry client that are already covered by spec validation. ",3 MESOS-3996,"libprocess: document when, why defer() is necessary","Current rules around this are pretty confusing and undocumented, as evidenced by some recent bugs in this area. Some example snippets in the mesos source code that were a result of this confusion and are indeed bugs: 1. 
https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/provisioner/docker/registry_client.cpp#L754 {code} return doHttpGet(blobURL, None(), true, true, None()) .then([this, blobURLPath, digest, filePath]( const http::Response& response) -> Future { Try fd = os::open( filePath.value, O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH); {code} ",1 MESOS-4000,"Implicit roles: Design Doc",NULL,2 MESOS-4002,"ReservationEndpointsTest.UnreserveAvailableAndOfferedResources is flaky","Showed up on ASF CI: ( test kept looping on and on and ultimately failing the build after 300 minutes ) https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose,OS=ubuntu%3A14.04,label_exp=docker%7C%7CHadoop/1269/changes {code} [ RUN ] ReservationEndpointsTest.UnreserveAvailableAndOfferedResources I1124 01:07:20.050729 30260 leveldb.cpp:174] Opened db in 107.434842ms I1124 01:07:20.099630 30260 leveldb.cpp:181] Compacted db in 48.82312ms I1124 01:07:20.099722 30260 leveldb.cpp:196] Created db iterator in 29905ns I1124 01:07:20.099738 30260 leveldb.cpp:202] Seeked to beginning of db in 3145ns I1124 01:07:20.099750 30260 leveldb.cpp:271] Iterated through 0 keys in the db in 279ns I1124 01:07:20.099804 30260 replica.cpp:778] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I1124 01:07:20.100637 30292 recover.cpp:447] Starting replica recovery I1124 01:07:20.100934 30292 recover.cpp:473] Replica is in EMPTY status I1124 01:07:20.103240 30288 replica.cpp:674] Replica in EMPTY status received a broadcasted recover request from (6305)@172.17.18.107:37993 I1124 01:07:20.103672 30292 recover.cpp:193] Received a recover response from a replica in EMPTY status I1124 01:07:20.104142 30292 recover.cpp:564] Updating replica status to STARTING I1124 01:07:20.114534 30284 master.cpp:365] Master ad27bc60-16d1-4239-9a65-235a991f9600 (9f2f81738d5e) started on 172.17.18.107:37993 I1124 01:07:20.114558 30284 master.cpp:367] 
Flags at startup: --acls="""" --allocation_interval=""1000secs"" --allocator=""HierarchicalDRF"" --authenticate=""true"" --authenticate_slaves=""true"" --authenticators=""crammd5"" --authorizers=""local"" --credentials=""/tmp/I60I5f/credentials"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_slave_ping_timeouts=""5"" --quiet=""false"" --recovery_slave_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""25secs"" --registry_strict=""true"" --roles=""role"" --root_submissions=""true"" --slave_ping_timeout=""15secs"" --slave_reregister_timeout=""10mins"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/mesos/mesos-0.26.0/_inst/share/mesos/webui"" --work_dir=""/tmp/I60I5f/master"" --zk_session_timeout=""10secs"" I1124 01:07:20.114809 30284 master.cpp:412] Master only allowing authenticated frameworks to register I1124 01:07:20.114820 30284 master.cpp:417] Master only allowing authenticated slaves to register I1124 01:07:20.114825 30284 credentials.hpp:35] Loading credentials for authentication from '/tmp/I60I5f/credentials' I1124 01:07:20.115067 30284 master.cpp:456] Using default 'crammd5' authenticator I1124 01:07:20.115320 30284 master.cpp:493] Authorization enabled I1124 01:07:20.115792 30285 hierarchical.cpp:162] Initialized hierarchical allocator process I1124 01:07:20.115855 30285 whitelist_watcher.cpp:77] No whitelist given I1124 01:07:20.118755 30285 master.cpp:1625] The newly elected leader is master@172.17.18.107:37993 with id ad27bc60-16d1-4239-9a65-235a991f9600 I1124 01:07:20.118788 30285 master.cpp:1638] Elected as the leading master! 
I1124 01:07:20.118809 30285 master.cpp:1383] Recovering from registrar I1124 01:07:20.119078 30285 registrar.cpp:307] Recovering registrar I1124 01:07:20.143256 30292 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 38.787419ms I1124 01:07:20.143347 30292 replica.cpp:321] Persisted replica status to STARTING I1124 01:07:20.143717 30292 recover.cpp:473] Replica is in STARTING status I1124 01:07:20.145454 30286 replica.cpp:674] Replica in STARTING status received a broadcasted recover request from (6307)@172.17.18.107:37993 I1124 01:07:20.145979 30292 recover.cpp:193] Received a recover response from a replica in STARTING status I1124 01:07:20.146654 30292 recover.cpp:564] Updating replica status to VOTING I1124 01:07:20.182672 30286 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 35.422256ms I1124 01:07:20.182747 30286 replica.cpp:321] Persisted replica status to VOTING I1124 01:07:20.182929 30286 recover.cpp:578] Successfully joined the Paxos group I1124 01:07:20.183115 30286 recover.cpp:462] Recover process terminated I1124 01:07:20.183831 30286 log.cpp:659] Attempting to start the writer I1124 01:07:20.185907 30285 replica.cpp:494] Replica received implicit promise request from (6308)@172.17.18.107:37993 with proposal 1 I1124 01:07:20.225256 30285 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 39.291288ms I1124 01:07:20.225344 30285 replica.cpp:343] Persisted promised to 1 I1124 01:07:20.226850 30286 coordinator.cpp:238] Coordinator attempting to fill missing positions I1124 01:07:20.228394 30293 replica.cpp:389] Replica received explicit promise request from (6309)@172.17.18.107:37993 for position 0 with proposal 2 I1124 01:07:20.266371 30293 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 37.874181ms I1124 01:07:20.266456 30293 replica.cpp:713] Persisted action at 0 I1124 01:07:20.267927 30293 replica.cpp:538] Replica received write request for position 0 from (6310)@172.17.18.107:37993 I1124 
01:07:20.268002 30293 leveldb.cpp:436] Reading position from leveldb took 37139ns I1124 01:07:20.308117 30293 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 39.961976ms I1124 01:07:20.308205 30293 replica.cpp:713] Persisted action at 0 I1124 01:07:20.309033 30290 replica.cpp:692] Replica received learned notice for position 0 from @0.0.0.0:0 I1124 01:07:20.343257 30290 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 34.175337ms I1124 01:07:20.343343 30290 replica.cpp:713] Persisted action at 0 I1124 01:07:20.343377 30290 replica.cpp:698] Replica learned NOP action at position 0 I1124 01:07:20.344446 30282 log.cpp:675] Writer started with ending position 0 I1124 01:07:20.346143 30291 leveldb.cpp:436] Reading position from leveldb took 56896ns I1124 01:07:20.347618 30291 registrar.cpp:340] Successfully fetched the registry (0B) in 228.495104ms I1124 01:07:20.347862 30291 registrar.cpp:439] Applied 1 operations in 41164ns; attempting to update the 'registry' I1124 01:07:20.348794 30279 log.cpp:683] Attempting to append 178 bytes to the log I1124 01:07:20.349081 30279 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 I1124 01:07:20.350244 30294 replica.cpp:538] Replica received write request for position 1 from (6311)@172.17.18.107:37993 I1124 01:07:20.385246 30294 leveldb.cpp:341] Persisting action (197 bytes) to leveldb took 34.872508ms I1124 01:07:20.385323 30294 replica.cpp:713] Persisted action at 1 I1124 01:07:20.386814 30294 replica.cpp:692] Replica received learned notice for position 1 from @0.0.0.0:0 I1124 01:07:20.425163 30294 leveldb.cpp:341] Persisting action (199 bytes) to leveldb took 38.282493ms I1124 01:07:20.425262 30294 replica.cpp:713] Persisted action at 1 I1124 01:07:20.425298 30294 replica.cpp:698] Replica learned APPEND action at position 1 I1124 01:07:20.427994 30287 registrar.cpp:484] Successfully updated the 'registry' in 79.949056ms I1124 01:07:20.428141 30283 log.cpp:702] 
Attempting to truncate the log to 1 I1124 01:07:20.428738 30287 registrar.cpp:370] Successfully recovered registrar I1124 01:07:20.429306 30290 master.cpp:1435] Recovered 0 slaves from the Registry (139B) ; allowing 10mins for slaves to re-register I1124 01:07:20.429592 30290 hierarchical.cpp:174] Allocator recovery is not supported yet I1124 01:07:20.430083 30294 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 I1124 01:07:20.431411 30294 replica.cpp:538] Replica received write request for position 2 from (6312)@172.17.18.107:37993 I1124 01:07:20.467258 30294 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 35.661978ms I1124 01:07:20.467342 30294 replica.cpp:713] Persisted action at 2 I1124 01:07:20.468842 30290 replica.cpp:692] Replica received learned notice for position 2 from @0.0.0.0:0 I1124 01:07:20.502264 30290 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 33.367074ms I1124 01:07:20.502426 30290 leveldb.cpp:399] Deleting ~1 keys from leveldb took 80765ns I1124 01:07:20.502452 30290 replica.cpp:713] Persisted action at 2 I1124 01:07:20.502488 30290 replica.cpp:698] Replica learned TRUNCATE action at position 2 I1124 01:07:20.510509 30260 containerizer.cpp:141] Using isolation: posix/cpu,posix/mem,filesystem/posix W1124 01:07:20.511119 30260 backend.cpp:48] Failed to create 'bind' backend: BindBackend requires root privileges I1124 01:07:20.516801 30288 slave.cpp:189] Slave started on 219)@172.17.18.107:37993 I1124 01:07:20.516839 30288 slave.cpp:190] Flags at startup: --appc_store_dir=""/tmp/mesos/store/appc"" --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --credential=""/tmp/ReservationEndpointsTest_UnreserveAvailableAndOfferedResources_CSzecr/credential"" 
--default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_auth_server=""auth.docker.io"" --docker_auth_server_port=""443"" --docker_kill_orphans=""true"" --docker_local_archives_dir=""/tmp/mesos/images/docker"" --docker_puller=""local"" --docker_puller_timeout=""60"" --docker_registry=""registry-1.docker.io"" --docker_registry_port=""443"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/ReservationEndpointsTest_UnreserveAvailableAndOfferedResources_CSzecr/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname_lookup=""true"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" --launcher_dir=""/mesos/mesos-0.26.0/_build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resources=""cpus:2;mem:1024;disk:1024;ports:[31000-32000]"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" --work_dir=""/tmp/ReservationEndpointsTest_UnreserveAvailableAndOfferedResources_CSzecr"" I1124 01:07:20.517670 30288 credentials.hpp:83] Loading credential for authentication from '/tmp/ReservationEndpointsTest_UnreserveAvailableAndOfferedResources_CSzecr/credential' I1124 01:07:20.517982 30288 slave.cpp:320] Slave using credential for: test-principal I1124 01:07:20.518334 
30288 resources.cpp:472] Parsing resources as JSON failed: cpus:2;mem:1024;disk:1024;ports:[31000-32000] Trying semicolon-delimited string format instead I1124 01:07:20.518815 30260 resources.cpp:472] Parsing resources as JSON failed: cpus:1;mem:128 Trying semicolon-delimited string format instead I1124 01:07:20.518975 30288 slave.cpp:390] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I1124 01:07:20.519104 30288 slave.cpp:398] Slave attributes: [ ] I1124 01:07:20.519124 30288 slave.cpp:403] Slave hostname: 9f2f81738d5e I1124 01:07:20.519136 30288 slave.cpp:408] Slave checkpoint: true I1124 01:07:20.519407 30260 resources.cpp:472] Parsing resources as JSON failed: mem:384 Trying semicolon-delimited string format instead I1124 01:07:20.522702 30288 state.cpp:52] Recovering state from '/tmp/ReservationEndpointsTest_UnreserveAvailableAndOfferedResources_CSzecr/meta' I1124 01:07:20.523265 30288 status_update_manager.cpp:200] Recovering status update manager I1124 01:07:20.523531 30288 containerizer.cpp:383] Recovering containerizer I1124 01:07:20.524998 30288 slave.cpp:4258] Finished recovery I1124 01:07:20.525802 30288 slave.cpp:4430] Querying resource estimator for oversubscribable resources I1124 01:07:20.526753 30288 slave.cpp:727] New master detected at master@172.17.18.107:37993 I1124 01:07:20.527292 30288 slave.cpp:790] Authenticating with master master@172.17.18.107:37993 I1124 01:07:20.528240 30288 slave.cpp:795] Using default CRAM-MD5 authenticatee I1124 01:07:20.527003 30286 status_update_manager.cpp:174] Pausing sending status updates I1124 01:07:20.528955 30285 authenticatee.cpp:121] Creating new client SASL connection I1124 01:07:20.529469 30285 master.cpp:5169] Authenticating slave(219)@172.17.18.107:37993 I1124 01:07:20.529729 30283 authenticator.cpp:413] Starting authentication session for crammd5_authenticatee(515)@172.17.18.107:37993 I1124 01:07:20.530287 30283 authenticator.cpp:98] Creating new server SASL connection 
I1124 01:07:20.530764 30285 authenticatee.cpp:212] Received SASL authentication mechanisms: CRAM-MD5 I1124 01:07:20.530903 30285 authenticatee.cpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I1124 01:07:20.531096 30285 authenticator.cpp:203] Received SASL authentication start I1124 01:07:20.531241 30285 authenticator.cpp:325] Authentication requires more steps I1124 01:07:20.531388 30285 authenticatee.cpp:258] Received SASL authentication step I1124 01:07:20.531616 30285 authenticator.cpp:231] Received SASL authentication step I1124 01:07:20.531668 30285 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '9f2f81738d5e' server FQDN: '9f2f81738d5e' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I1124 01:07:20.531690 30285 auxprop.cpp:179] Looking up auxiliary property '*userPassword' I1124 01:07:20.531774 30285 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I1124 01:07:20.531834 30285 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '9f2f81738d5e' server FQDN: '9f2f81738d5e' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I1124 01:07:20.531855 30285 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I1124 01:07:20.531867 30285 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I1124 01:07:20.531903 30285 authenticator.cpp:317] Authentication success I1124 01:07:20.532016 30283 authenticatee.cpp:298] Authentication success I1124 01:07:20.532331 30281 master.cpp:5199] Successfully authenticated principal 'test-principal' at slave(219)@172.17.18.107:37993 I1124 01:07:20.532652 30291 authenticator.cpp:431] Authentication session cleanup for crammd5_authenticatee(515)@172.17.18.107:37993 I1124 01:07:20.533113 30288 slave.cpp:763] Detecting new master I1124 01:07:20.533628 30288 
slave.cpp:4444] Received oversubscribable resources from the resource estimator I1124 01:07:20.546396 30288 slave.cpp:858] Successfully authenticated with master master@172.17.18.107:37993 I1124 01:07:20.547111 30287 master.cpp:3878] Registering slave at slave(219)@172.17.18.107:37993 (9f2f81738d5e) with id ad27bc60-16d1-4239-9a65-235a991f9600-S0 I1124 01:07:20.547886 30287 registrar.cpp:439] Applied 1 operations in 91121ns; attempting to update the 'registry' I1124 01:07:20.550647 30287 log.cpp:683] Attempting to append 347 bytes to the log I1124 01:07:20.550935 30279 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 3 I1124 01:07:20.551534 30288 slave.cpp:1252] Will retry registration in 3.399312ms if necessary I1124 01:07:20.551868 30291 replica.cpp:538] Replica received write request for position 3 from (6324)@172.17.18.107:37993 I1124 01:07:20.557605 30281 slave.cpp:1252] Will retry registration in 16.296866ms if necessary I1124 01:07:20.557891 30293 master.cpp:3866] Ignoring register slave message from slave(219)@172.17.18.107:37993 (9f2f81738d5e) as admission is already in progress I1124 01:07:20.574681 30279 slave.cpp:1252] Will retry registration in 73.52632ms if necessary I1124 01:07:20.575078 30293 master.cpp:3866] Ignoring register slave message from slave(219)@172.17.18.107:37993 (9f2f81738d5e) as admission is already in progress I1124 01:07:20.586236 30291 leveldb.cpp:341] Persisting action (366 bytes) to leveldb took 34.301173ms I1124 01:07:20.586287 30291 replica.cpp:713] Persisted action at 3 I1124 01:07:20.587509 30289 replica.cpp:692] Replica received learned notice for position 3 from @0.0.0.0:0 I1124 01:07:20.611263 30289 leveldb.cpp:341] Persisting action (368 bytes) to leveldb took 23.677211ms I1124 01:07:20.611352 30289 replica.cpp:713] Persisted action at 3 I1124 01:07:20.611387 30289 replica.cpp:698] Replica learned APPEND action at position 3 I1124 01:07:20.613580 30279 registrar.cpp:484] Successfully updated 
the 'registry' in 65.490944ms I1124 01:07:20.613802 30288 log.cpp:702] Attempting to truncate the log to 3 I1124 01:07:20.613993 30288 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 4 I1124 01:07:20.615281 30289 replica.cpp:538] Replica received write request for position 4 from (6325)@172.17.18.107:37993 I1124 01:07:20.615883 30279 master.cpp:3946] Registered slave ad27bc60-16d1-4239-9a65-235a991f9600-S0 at slave(219)@172.17.18.107:37993 (9f2f81738d5e) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I1124 01:07:20.616261 30282 slave.cpp:902] Registered with master master@172.17.18.107:37993; given slave ID ad27bc60-16d1-4239-9a65-235a991f9600-S0 I1124 01:07:20.616883 30282 fetcher.cpp:79] Clearing fetcher cache I1124 01:07:20.617261 30280 status_update_manager.cpp:181] Resuming sending status updates I1124 01:07:20.617766 30282 slave.cpp:925] Checkpointing SlaveInfo to '/tmp/ReservationEndpointsTest_UnreserveAvailableAndOfferedResources_CSzecr/meta/slaves/ad27bc60-16d1-4239-9a65-235a991f9600-S0/slave.info' I1124 01:07:20.616550 30284 hierarchical.cpp:380] Added slave ad27bc60-16d1-4239-9a65-235a991f9600-S0 (9f2f81738d5e) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (allocated: ) I1124 01:07:20.618670 30282 slave.cpp:961] Forwarding total oversubscribed resources I1124 01:07:20.618932 30282 slave.cpp:3197] Received ping from slave-observer(216)@172.17.18.107:37993 I1124 01:07:20.619288 30285 master.cpp:4288] Received update of slave ad27bc60-16d1-4239-9a65-235a991f9600-S0 at slave(219)@172.17.18.107:37993 (9f2f81738d5e) with total oversubscribed resources I1124 01:07:20.619446 30284 hierarchical.cpp:1066] No resources available to allocate! I1124 01:07:20.619526 30284 hierarchical.cpp:1159] No inverse offers to send out! 
I1124 01:07:20.619568 30284 hierarchical.cpp:977] Performed allocation for slave ad27bc60-16d1-4239-9a65-235a991f9600-S0 in 1.108641ms I1124 01:07:20.620057 30284 hierarchical.cpp:436] Slave ad27bc60-16d1-4239-9a65-235a991f9600-S0 (9f2f81738d5e) updated with oversubscribed resources (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: ) I1124 01:07:20.620393 30284 hierarchical.cpp:1066] No resources available to allocate! I1124 01:07:20.620462 30284 hierarchical.cpp:1159] No inverse offers to send out! I1124 01:07:20.620507 30284 hierarchical.cpp:977] Performed allocation for slave ad27bc60-16d1-4239-9a65-235a991f9600-S0 in 395959ns I1124 01:07:20.624356 30285 process.cpp:3067] Handling HTTP event for process 'master' with path: '/master/reserve' I1124 01:07:20.624418 30285 http.cpp:336] HTTP POST for /master/reserve from 172.17.18.107:48995 I1124 01:07:20.626936 30285 master.cpp:6224] Sending checkpointed resources cpus(role, te...",1 MESOS-4003,"Pass agent work_dir to isolator modules","Some isolator modules can benefit from access to the agent's {{work_dir}}. For example, the DVD isolator (https://github.com/emccode/mesos-module-dvdi) is currently forced to mount external volumes in a hard-coded directory. Making the {{work_dir}} accessible to the isolator via {{Isolator::recover()}} would allow the isolator to mount volumes within the agent's {{work_dir}}. 
This can be accomplished by adding an overloaded signature for {{Isolator::recover()}} that includes the {{work_dir}} as a parameter.",1 MESOS-4004,"Support default entrypoint and command runtime config in Mesos containerizer","The Mesos containerizer needs to use the entrypoint and command runtime configuration returned from the image.",3 MESOS-4005,"Support workdir runtime configuration from image ","We need to support the workdir runtime configuration returned from the image (for example, as set in a Dockerfile).",2 MESOS-4009,"RegistryClientTest.SimpleRegistryPuller doesn't compile with GCC 5.1.1","GCC 5.1.1 has {{-Werror=sign-compare}} in {{-Wall}} and stumbles over a comparison between signed and unsigned int in {{provisioner_docker_tests.cpp}}.",1 MESOS-4013,"Introduce status endpoint for quota","This endpoint is for querying quota status via the GET method.",5 MESOS-4014,"Introduce remove endpoint for quota","This endpoint is for removing quotas via the DELETE method.",3 MESOS-4020,"Introduce filter for non-revocable resources in `Resources`","The {{Resources}} class defines some handy filters, like {{revocable()}}, {{unreserved()}}, and so on. 
This ticket proposes to add one more: {{nonRevocable()}}.",1 MESOS-4021,"Remove quota from Registry for quota remove request","When a remove quota requests hits the endpoint and passes validation, quota should be removed from the registry before the allocator is notified about the change.",1 MESOS-4026,"RegistryClientTest.SimpleRegistryPuller is flaky","From ASF CI: https://builds.apache.org/job/Mesos/1289/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=centos:7,label_exp=docker%7C%7CHadoop/console {code} [ RUN ] RegistryClientTest.SimpleRegistryPuller I1127 02:51:40.235900 362 registry_client.cpp:511] Response status for url 'https://localhost:57828/v2/library/busybox/manifests/latest': 401 Unauthorized I1127 02:51:40.249766 360 registry_client.cpp:511] Response status for url 'https://localhost:57828/v2/library/busybox/manifests/latest': 200 OK I1127 02:51:40.251137 361 registry_puller.cpp:195] Downloading layer '1ce2e90b0bc7224de3db1f0d646fe8e2c4dd37f1793928287f6074bc451a57ea' for image 'busybox:latest' I1127 02:51:40.258514 354 registry_client.cpp:511] Response status for url 'https://localhost:57828/v2/library/busybox/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4': 307 Temporary Redirect I1127 02:51:40.264171 367 libevent_ssl_socket.cpp:1023] Socket error: Connection reset by peer ../../src/tests/containerizer/provisioner_docker_tests.cpp:1210: Failure (socket).failure(): Failed accept: connection error: Connection reset by peer [ FAILED ] RegistryClientTest.SimpleRegistryPuller (349 ms) {code} Logs from a previous run that passed: {code} [ RUN ] RegistryClientTest.SimpleRegistryPuller I1126 18:49:05.306396 349 registry_client.cpp:511] Response status for url 'https://localhost:53492/v2/library/busybox/manifests/latest': 401 Unauthorized I1126 18:49:05.321362 347 registry_client.cpp:511] Response status for url 'https://localhost:53492/v2/library/busybox/manifests/latest': 200 OK I1126 18:49:05.322720 
352 registry_puller.cpp:195] Downloading layer '1ce2e90b0bc7224de3db1f0d646fe8e2c4dd37f1793928287f6074bc451a57ea' for image 'busybox:latest' I1126 18:49:05.331317 350 registry_client.cpp:511] Response status for url 'https://localhost:53492/v2/library/busybox/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4': 307 Temporary Redirect I1126 18:49:05.370625 352 registry_client.cpp:511] Response status for url 'https://127.0.0.1:53492/': 200 OK I1126 18:49:05.372102 355 registry_puller.cpp:294] Untarring layer '1ce2e90b0bc7224de3db1f0d646fe8e2c4dd37f1793928287f6074bc451a57ea' downloaded from registry to directory 'output_dir' [ OK ] RegistryClientTest.SimpleRegistryPuller (353 ms) {code}",4 MESOS-4029,"ContentType/SchedulerTest is flaky.","SSL build, [Ubuntu 14.04|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh], non-root test run. {noformat} [----------] 22 tests from ContentType/SchedulerTest [ RUN ] ContentType/SchedulerTest.Subscribe/0 [ OK ] ContentType/SchedulerTest.Subscribe/0 (48 ms) *** Aborted at 1448928007 (unix time) try ""date -d @1448928007"" if you are using GNU date *** [ RUN ] ContentType/SchedulerTest.Subscribe/1 PC: @ 0x1451b8e testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() *** SIGSEGV (@0x100000030) received by PID 21320 (TID 0x2b549e5d4700) from PID 48; stack trace: *** @ 0x2b54c95940b7 os::Linux::chained_handler() @ 0x2b54c9598219 JVM_handle_linux_signal @ 0x2b5496300340 (unknown) @ 0x1451b8e testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() @ 0xe2ea6d _ZN7testing8internal18FunctionMockerBaseIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS6_SaIS6_EEEEE10InvokeWithERKSt5tupleIJSC_EE @ 0xe2b1bc testing::internal::FunctionMocker<>::Invoke() @ 0x1118aed mesos::internal::tests::SchedulerTest::Callbacks::received() @ 0x111c453 
_ZNKSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS0_2v19scheduler5EventESt5dequeIS8_SaIS8_EEEEEclIJSE_EvEEvRS4_DpOT_ @ 0x111c001 _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_EEEEESt17reference_wrapperIS5_ESt12_PlaceholderILi1EEEE6__callIvJSF_EJLm0ELm1EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE @ 0x111b90d _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_EEEEESt17reference_wrapperIS5_ESt12_PlaceholderILi1EEEEclIJSF_EvEET0_DpOT_ @ 0x111ae09 std::_Function_handler<>::_M_invoke() @ 0x2b5493c6da09 std::function<>::operator()() @ 0x2b5493c688ee process::AsyncExecutorProcess::execute<>() @ 0x2b5493c6db2a _ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS8_SaIS8_EEEEESC_PvSG_SC_SJ_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSO_FSL_T1_T2_T3_ET4_T5_T6_ENKUlPNS_11ProcessBaseEE_clES11_ @ 0x2b5493c765a4 _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingNS0_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISC_SaISC_EEEEESG_PvSK_SG_SN_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSS_FSP_T1_T2_T3_ET4_T5_T6_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ @ 0x2b54946b1201 std::function<>::operator()() @ 0x2b549469960f process::ProcessBase::visit() @ 0x2b549469d480 process::DispatchEvent::visit() @ 0x9dc0ba process::ProcessBase::serve() @ 0x2b54946958cc process::ProcessManager::resume() @ 0x2b5494692a9c _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_ @ 0x2b549469ccac _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE @ 0x2b549469cc5c 
_ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_ @ 0x2b549469cbee _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE @ 0x2b549469cb45 _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv @ 0x2b549469cade _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv @ 0x2b5495b81a40 (unknown) @ 0x2b54962f8182 start_thread @ 0x2b549660847d (unknown) make[3]: *** [check-local] Segmentation fault make[3]: Leaving directory `/home/vagrant/mesos/build/src' make[2]: *** [check-am] Error 2 make[2]: Leaving directory `/home/vagrant/mesos/build/src' make[1]: *** [check] Error 2 make[1]: Leaving directory `/home/vagrant/mesos/build/src' make: *** [check-recursive] Error 1 {noformat}",2 MESOS-4036,"Install instructions for CentOS 6.6 lead to errors running `perf`","After using the current installation instructions in the getting started documentation, {{perf}} will not run on CentOS 6.6 because the version of elfutils included in devtoolset-2 is not compatible with the version of {{perf}} installed by {{yum}}. Installing and using devtoolset-3, however (http://linux.web.cern.ch/linux/scientific6/docs/softwarecollections.shtml) fixes this issue. This could be resolved by updating the getting started documentation to recommend installing devtoolset-3.",1 MESOS-4046,"Enable `Env` specified in docker image can be returned from docker pull","Currently docker pull only return an image structure, which only contains entrypoint info. We have docker inspect as a subprocess inside docker pull, which contains many other useful information of a docker image. 
We should be able to support returning environment variables information from the image.",3 MESOS-4047,"MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery is flaky","{code:title=Output from passed test} [----------] 1 test from MemoryPressureMesosTest 1+0 records in 1+0 records out 1048576 bytes (1.0 MB) copied, 0.000430889 s, 2.4 GB/s [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery I1202 11:09:14.319327 5062 exec.cpp:134] Version: 0.27.0 I1202 11:09:14.333317 5079 exec.cpp:208] Executor registered on slave bea15b35-9aa1-4b57-96fb-29b5f70638ac-S0 Registered executor on ubuntu Starting task 4e62294c-cfcf-4a13-b699-c6a4b7ac5162 sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' Forked command at 5085 I1202 11:09:14.391739 5077 exec.cpp:254] Received reconnect request from slave bea15b35-9aa1-4b57-96fb-29b5f70638ac-S0 I1202 11:09:14.398598 5082 exec.cpp:231] Executor re-registered on slave bea15b35-9aa1-4b57-96fb-29b5f70638ac-S0 Re-registered executor on ubuntu Shutting down Sending SIGTERM to process tree at pid 5085 Killing the following process trees: [ -+- 5085 sh -c while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done \--- 5086 dd count=512 bs=1M if=/dev/zero of=./temp ] [ OK ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery (1096 ms) {code} {code:title=Output from failed test} [----------] 1 test from MemoryPressureMesosTest 1+0 records in 1+0 records out 1048576 bytes (1.0 MB) copied, 0.000404489 s, 2.6 GB/s [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery I1202 11:09:15.509950 5109 exec.cpp:134] Version: 0.27.0 I1202 11:09:15.568183 5123 exec.cpp:208] Executor registered on slave 88734acc-718e-45b0-95b9-d8f07cea8a9e-S0 Registered executor on ubuntu Starting task 14b6bab9-9f60-4130-bdc4-44efba262bc6 Forked command at 5132 sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' I1202 11:09:15.665498 5129 exec.cpp:254] Received reconnect request from slave 88734acc-718e-45b0-95b9-d8f07cea8a9e-S0 
I1202 11:09:15.670995 5123 exec.cpp:381] Executor asked to shutdown Shutting down Sending SIGTERM to process tree at pid 5132 ../../src/tests/containerizer/memory_pressure_tests.cpp:283: Failure (usage).failure(): Unknown container: ebe90e15-72fa-4519-837b-62f43052c913 *** Aborted at 1449083355 (unix time) try ""date -d @1449083355"" if you are using GNU date *** {code} Notice that in the failed test, the executor is asked to shutdown when it tries to reconnect to the agent.",1 MESOS-4053,"MemoryPressureMesosTest tests fail on CentOS 6.6","{{MemoryPressureMesosTest.CGROUPS_ROOT_Statistics}} and {{MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery}} fail on CentOS 6.6. It seems that mounted cgroups are not properly cleaned up after previous tests, so multiple hierarchies are detected and thus an error is produced: {code} [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics ../../src/tests/mesos.cpp:849: Failure Value of: _baseHierarchy.get() Actual: ""/cgroup"" Expected: baseHierarchy Which is: ""/tmp/mesos_test_cgroup"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/tmp/mesos_test_cgroup' '/cgroup' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. 
------------------------------------------------------------- ../../src/tests/mesos.cpp:932: Failure (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (12 ms) [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery ../../src/tests/mesos.cpp:849: Failure Value of: _baseHierarchy.get() Actual: ""/cgroup"" Expected: baseHierarchy Which is: ""/tmp/mesos_test_cgroup"" ------------------------------------------------------------- Multiple cgroups base hierarchies detected: '/tmp/mesos_test_cgroup' '/cgroup' Mesos does not support multiple cgroups base hierarchies. Please unmount the corresponding (or all) subsystems. ------------------------------------------------------------- ../../src/tests/mesos.cpp:932: Failure (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery (7 ms) {code}",3 MESOS-4056,"Respond with `MethodNotAllowed` if a request uses an unsupported method.","We are inconsistent right now in how we respond to endpoint requests with unsupported methods: both {{MethodNotAllowed}} and {{BadRequest}} are used. We are also not consistent in the error message we include in the body. This ticket proposes use {{MethodNotAllowed}} with standardized message text.",1 MESOS-4058,"Do not use `Resource.role` for resources in quota request.","To be consistent with other operator endpoints and to adhere to the principal of least surprise, move role from each {{Resource}} in quota set request to the request itself. {{Resource.role}} is used for reserved resources. Since quota is not a direct reservation request, to avoid confusion we shall not reuse this field for communicating the role for which quota should be reserved. 
Food for thought: Shall we try to keep internal storage protobufs as close as possible to operator's JSON to provide some sort of a schema or decouple those two for the sake of flexibility?",1 MESOS-4059,"Investigate remaining flakiness in MasterMaintenanceTest.InverseOffersFilters","Per comments in MESOS-3916, the fix for that issue decreased the degree of flakiness, but it seems that some intermittent test failures do occur -- should be investigated. *Flakiness in task acknowledgment* {code} I1203 18:25:04.609817 28732 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 6afd012e-8e88-41b2-8239-a9b852d07ca1) for task 26305fdd-edb0-4764-8b8a-2558f2b2d81b of framework c7900911-cc7a-4dde-92e7-48fe82cddd9e-0000 W1203 18:25:04.610076 28732 status_update_manager.cpp:762] Unexpected status update acknowledgement (received 6afd012e-8e88-41b2-8239-a9b852d07ca1, expecting 82fc7a7b-e64a-4f4d-ab74-76abac42b4e6) for update TASK_RUNNING (UUID: 82fc7a7b-e64a-4f4d-ab74-76abac42b4e6) for task 26305fdd-edb0-4764-8b8a-2558f2b2d81b of framework c7900911-cc7a-4dde-92e7-48fe82cddd9e-0000 E1203 18:25:04.610339 28736 slave.cpp:2339] Failed to handle status update acknowledgement (UUID: 6afd012e-8e88-41b2-8239-a9b852d07ca1) for task 26305fdd-edb0-4764-8b8a-2558f2b2d81b of framework c7900911-cc7a-4dde-92e7-48fe82cddd9e-0000: Duplicate acknowledgemen {code} This is a race between [launching and acknowledging two tasks|https://github.com/apache/mesos/blob/75aaaacb89fa961b249c9ab7fa0f45dfa9d415a5/src/tests/master_maintenance_tests.cpp#L1486-L1517]. The status updates for each task are not necessarily received in the same order as launching the tasks. *Flakiness in first inverse offer filter* See [this comment in MESOS-3916|https://issues.apache.org/jira/browse/MESOS-3916?focusedCommentId=15027478&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15027478] for the explanation. 
The related logs are above the comment.",1 MESOS-4064,"Add ContainerInfo to internal Task protobuf.","In what seems like an oversight, when ContainerInfo was added to TaskInfo, it was not added to our internal Task protobuf. Also, unlike the agent, it appears that the master does not use protobuf::createTask. We should try to remove the manual construction in the master in favor of construction through protobuf::createTask. Partial contents of ContainerInfo should be exposed through state endpoints on the master and the agent. ",3 MESOS-4066,"Agent should not return partial state when a request is made to /state endpoint during recovery.","Currently, when a user requests /state.json on the agent, the agent may return partial state if it has failed over and is recovering. There is currently no clear way to tell if this is the case when looking at a response, so the user may incorrectly interpret the agent as being empty of tasks. We could consider exposing the 'state' enum of the agent in the endpoint: {code} enum State { RECOVERING, // Slave is doing recovery. DISCONNECTED, // Slave is not connected to the master. RUNNING, // Slave has (re-)registered. TERMINATING, // Slave is shutting down. } state; {code} This may be a bit tricky to maintain with respect to backwards compatibility of the endpoint if we were to alter this enum. 
Exposing this would allow users to be more informed about the state of the agent.",3 MESOS-4067,"ReservationTest.ACLMultipleOperations is flaky","Observed from the CI: https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=ubuntu%3A14.04,label_exp=docker%7C%7CHadoop/1319/changes",2 MESOS-4069,"libevent_ssl_socket assertion fails ","Have been seeing the following socket receive error frequently: {code} F1204 11:12:47.301839 54104 libevent_ssl_socket.cpp:245] Check failed: length > 0 *** Check failure stack trace: *** @ 0x7f73227fe5a6 google::LogMessage::Fail() @ 0x7f73227fe4f2 google::LogMessage::SendToLog() @ 0x7f73227fdef4 google::LogMessage::Flush() @ 0x7f7322800e08 google::LogMessageFatal::~LogMessageFatal() @ 0x7f73227b93e2 process::network::LibeventSSLSocketImpl::recv_callback() @ 0x7f73227b9182 process::network::LibeventSSLSocketImpl::recv_callback() @ 0x7f731cbc75cc bufferevent_run_deferred_callbacks_locked @ 0x7f731cbbdc5d event_base_loop @ 0x7f73227d9ded process::EventLoop::run() @ 0x7f73227a3101 _ZNSt12_Bind_simpleIFPFvvEvEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE @ 0x7f73227a305b std::_Bind_simple<>::operator()() @ 0x7f73227a2ff4 std::thread::_Impl<>::_M_run() @ 0x7f731e0d1a40 (unknown) @ 0x7f731de0a182 start_thread @ 0x7f731db3730d (unknown) @ (nil) (unknown) {code} In this case this was a HTTP get over SSL. The url being: https://dseasb33srnrn.cloudfront.net:443/registry-v2/docker/registry/v2/blobs/sha256/44/44be94a95984bb47dc3a193f59bf8c04d5e877160b745b119278f38753a6f58f/data?Expires=1449259252&Signature=Q4CQdr1LbxsiYyVebmetrx~lqDgQfHVkGxpbMM3PoISn6r07DXIzBX6~tl1iZx9uXdfr~5awH8Kxwh-y8b0dTV3mLTZAVlneZlHbhBAX9qbYMd180-QvUvrFezwOlSmX4B3idvo-zK0CarUu3Ev1hbJz5y3olwe2ZC~RXHEwzkQ_&Key-Pair-Id=APKAJECH5M7VWIS5YZ6Q *Steps to reproduce:* 1. Run master 2. 
Run slave from your build directory as: {code} GLOG_v=1;SSL_ENABLED=1;SSL_KEY_FILE=;SSL_CERT_FILE=;sudo -E ./bin/mesos-slave.sh \ --master=127.0.0.1:5050 \ --executor_registration_timeout=5mins \ --containerizers=mesos \ --isolation=filesystem/linux \ --image_providers=DOCKER \ --docker_puller_timeout=600 \ --launcher_dir=$MESOS_BUILD_DIR/src/.libs \ --switch_user=""false"" \ --docker_puller=""registry"" {code} 3. Run mesos-execute from your build directory as: {code} ./src/mesos-execute \ --master=127.0.0.1:5050 \ --command=""uname -a"" \ --name=test \ --docker_image=ubuntu {code}",8 MESOS-4073,"Expose recovery parameters from Hierarchical allocator","While implementing recovery in the hierarchical allocator, we introduced some internal constants that influence the recovery process: {{ALLOCATION_HOLD_OFF_RECOVERY_TIMEOUT}} and {{AGENT_RECOVERY_FACTOR}}. We should expose these parameters for operators to configure. However, I am a bit reluctant to expose them as master flags, because they are implementation specific. It would be nice to combine all hierarchical allocator-related flags into one (maybe JSON) file, similar to how we do it for modules.",3 MESOS-4074,"Tests for master failover in presence of quota",NULL,5 MESOS-4075,"Continue test suite execution across crashing tests.","Currently, mesos-tests.sh exits when a test crashes. This is inconvenient when trying to find out all tests that fail. mesos-tests.sh should rate a test that crashes as failed and continue the same way as if the test merely returned with a failure result and exited properly.",8 MESOS-4082,"Add tests for quota authentication and authorization.",NULL,3 MESOS-4085,"Implement implicit roles","See also design doc: MESOS-4000.",5 MESOS-4087,"Introduce a module for logging executor/task output","Existing executor/task logs are logged to files in their sandbox directory, with some nuances based on which containerizer is used (see background section in linked document). 
A logger for executor/task logs has the following requirements: * The logger is given a command to run and must handle the stdout/stderr of the command. * The handling of stdout/stderr must be resilient across agent failover. Logging should not stop if the agent fails. * Logs should be readable, presumably via the web UI, or via some other module-specific UI.",5 MESOS-4088,"Modularize existing plain-file logging for executor/task logs launched with the Mesos Containerizer","Once a module for executor/task output logging has been introduced, the default module will mirror the existing behavior. Executor/task stdout/stderr is piped into files within the executor's sandbox directory. The files are exposed in the web UI, via the {{/files}} endpoint.",2 MESOS-4090,"Create light-weight executor only and scheduler only mesos eggs","Currently, when running tasks in docker containers, if the executor uses the mesos.native python library, the execution environment inside the container (OS, native libs, etc) must match the execution environment outside the container fairly closely in order to load the mesos.so library. The solution here can be to introduce a much lighter weight python egg, mesos.executor, which only includes code (and dependencies) needed to create and run an MesosExecutorDriver. Executors can then use this native library instead of mesos.native.",5 MESOS-4098,"Allow interactive terminal for mesos containerizer","Today mesos containerizer does not have a way to run tasks that require interactive sessions. An example use case is running a task that requires a manual password entry from an operator. Another use case could be debugging (gdb). ",10 MESOS-4099,"parallel make tests does not build all test targets","When inside 3rdparty/libprocess: Running {{make -j8 tests}} from a clean build does not yield the {{libprocess-tests}} binary. Running it a subsequent time triggers more compilation and ends up yielding the {{libprocess-tests}} binary. 
This suggests the {{test}} target is not being built correctly.",1 MESOS-4102,"Quota doesn't allocate resources on slave joining","See attached patch. {{framework1}} is not allocated any resources, despite the fact that the resources on {{agent2}} can safely be allocated to it without risk of violating {{quota1}}. If I understand the intended quota behavior correctly, this doesn't seem intended. Note that if the framework is added _after_ the slaves are added, the resources on {{agent2}} are allocated to {{framework1}}.",5 MESOS-4104,"Design document for interactive terminal for mesos containerizer","As a first step to address the use cases, propose a design document covering the requirement, design and implementation details.",4 MESOS-4107,"`os::strerror_r` breaks the Windows build","`os::strerror_r` does not exist on Windows.",1 MESOS-4108,"Implement `os::mkdtemp` for Windows","Used basically exclusively for testing, this insecure and otherwise-not-quite-suitable-for-prod function needs to work to run what will eventually become the FS tests.",5 MESOS-4109,"HTTPConnectionTest.ClosingResponse is flaky","Output of the test: {code} [ RUN ] HTTPConnectionTest.ClosingResponse I1210 01:20:27.048532 26671 process.cpp:3077] Handling HTTP event for process '(22)' with path: '/(22)/get' ../../../3rdparty/libprocess/src/tests/http_tests.cpp:919: Failure Actual function call count doesn't match EXPECT_CALL(*http.process, get(_))... Expected: to be called twice Actual: called once - unsatisfied and active [ FAILED ] HTTPConnectionTest.ClosingResponse (43 ms) {code}",1 MESOS-4110,"Implement `WindowsError` to correspond with `ErrnoError`.","In the C standard library, `errno` records the last error on a thread. You can pretty-print it with `strerror`. In Stout, we report these errors with `ErrnoError`. The Windows API has something similar, called `GetLastError()`. 
The way to pretty-print this is hilariously unintuitive and terrible, so in this case it is actually very beneficial to wrap it with something similar to `ErrnoError`, maybe called `WindowsError`.",5 MESOS-4112,"Clean up libprocess gtest macros","This ticket is regarding the libprocess gtest helpers in {{3rdparty/libprocess/include/process/gtest.hpp}}. The pattern in this file seems to be a set of macros: * {{AWAIT_ASSERT__FOR}} * {{AWAIT_ASSERT_}} -- default of 15 seconds * {{AWAIT_\_FOR}} -- alias for {{AWAIT_ASSERT__FOR}} * {{AWAIT_}} -- alias for {{AWAIT_ASSERT_}} * {{AWAIT_EXPECT__FOR}} * {{AWAIT_EXPECT_}} -- default of 15 seconds (1) {{AWAIT_EQ_FOR}} should be added for completeness. (2) In {{gtest}}, we've got {{EXPECT_EQ}} as well as the {{bool}}-specific versions: {{EXPECT_TRUE}} and {{EXPECT_FALSE}}. We should adopt this pattern in these helpers as well. Keeping the pattern above in mind, the following are missing: * {{AWAIT_ASSERT_TRUE_FOR}} * {{AWAIT_ASSERT_TRUE}} * {{AWAIT_ASSERT_FALSE_FOR}} * {{AWAIT_ASSERT_FALSE}} * {{AWAIT_EXPECT_TRUE_FOR}} * {{AWAIT_EXPECT_FALSE_FOR}} (3) There are HTTP response related macros at the bottom of the file, e.g. {{AWAIT_EXPECT_RESPONSE_STATUS_EQ}}, however these are missing their {{ASSERT}} counterparts. -(4) The reason for (3) presumably is because we reach for {{EXPECT}} over {{ASSERT}} in general due to the test suite crashing behavior of {{ASSERT}}. 
If this is the case, it would be worthwhile considering whether macros such as {{AWAIT_READY}} should alias {{AWAIT_EXPECT_READY}} rather than {{AWAIT_ASSERT_READY}}.- (5) There are a few more missing macros, given {{AWAIT_EQ_FOR}} and {{AWAIT_EQ}} which aliases to {{AWAIT_ASSERT_EQ_FOR}} and {{AWAIT_ASSERT_EQ}} respectively, we should also add {{AWAIT_TRUE_FOR}}, {{AWAIT_TRUE}}, {{AWAIT_FALSE_FOR}}, and {{AWAIT_FALSE}} as well.",2 MESOS-4114,"Add field VIP to message Port","We would like to extend the Mesos protocol buffer 'Port' to include an optional repeated string named ""VIP"" - to map it to a well known virtual IP, or virtual hostname for discovery purposes. We also want this field exposed in DiscoveryInfo in state.json.",2 MESOS-4115,"Fix possible race conditions in registry client tests.","RegistryClient tests show flakiness which manifests as socket timeouts or unexpected buffer showing up in the blobs. Investigate them for possible race conditions.",5 MESOS-4116,"Add tests for quotas + empty roles (no registered frameworks)",NULL,2 MESOS-4126,"Construct the error string in `MethodNotAllowed`.","Consider constructing the error string in {{MethodNotAllowed}} rather than at the invocation site. 
Currently we want all error messages to follow the same pattern, so instead of writing {code} return MethodNotAllowed({""POST""}, ""Expecting 'POST', received '"" + request.method + ""'""); {code} we can write something like {code} MethodNotAllowed({""POST""}, request.method) {code} ",1 MESOS-4127,"Ensure `Content-Type` field is set for some responses.","As pointed out by [~anandmazumdar] in https://reviews.apache.org/r/40905/, we should make sure we set the {{Content-Type}} field for some responses.",3 MESOS-4128,"Refactor sorter factories in allocator and improve comments around them","For clarity we want to refactor the factory section in the allocator and explain the purpose (and necessity) of all sorters.",3 MESOS-4130,"Document how the fetcher can reach across a proxy connection.","The fetcher uses libcurl for downloading content from HTTP, HTTPS, etc. There is no source code in the pertinent parts of ""net.hpp"" that deals with proxy settings. However, libcurl automatically picks up certain environment variables and adjusts its settings accordingly. See ""man libcurl-tutorial"" for details. See section ""Proxies"", subsection ""Environment Variables"". If you follow this recipe in your Mesos agent startup script, you can use a proxy. We should document this in the fetcher (cache) doc (http://mesos.apache.org/documentation/latest/fetcher/). ",1 MESOS-4136,"Add a ContainerLogger module that restrains log sizes","One of the major problems this logger module aims to solve is overflowing executor/task log files. Log files are simply written to disk, and are not managed other than via occasional garbage collection by the agent process (and this only deals with terminated executors). We should add a {{ContainerLogger}} module that truncates logs as they reach a configurable maximum size. Additionally, we should determine if the web UI's {{pailer}} needs to be changed to deal with logs that are not append-only. 
This will be a non-default module which will also serve as an example for how to implement the module.",3 MESOS-4137,"Modularize plain-file logging for executor/task logs launched with the Docker Containerizer","Adding a hook inside the Docker containerizer is slightly more involved than the Mesos containerizer. Docker executors/tasks perform plain-file logging in different places depending on whether the agent is in a Docker container itself || Agent || Code || | Not in container | {{DockerContainerizerProcess::launchExecutorProcess}} | | In container | {{Docker::run}} in a {{mesos-docker-executor}} process | This means a {{ContainerLogger}} will need to be loaded or hooked into the {{mesos-docker-executor}}. Or we will need to change how piping is done in {{mesos-docker-executor}}.",3 MESOS-4143,"Reserve/UnReserve Dynamic Reservation Endpoints allow reservations on non-existing roles","When working with Dynamic reservations via the /reserve and /unreserve endpoints, it is possible to reserve resources for roles that have not been specified via the --roles flag on the master. However, these roles are not usable because the roles have not been defined, nor are they added to the list of roles available. Per the mailing list, changing roles after the fact is not possible at this time. (That may be another JIRA.) More importantly, the /reserve and /unreserve endpoints should not allow reservation of roles not specified by --roles. ",2 MESOS-4149,"Clean up authentication implementation for quota","To authenticate quota requests we allowed {{QuotaHandler}} to call the private {{Http::authenticate()}} function. Once MESOS-3231 lands we will need neither this injection nor {{authenticate()}} calls in the {{QuotaHandler}}.",1 MESOS-4150,"Implement container logger module metadata recovery","The {{ContainerLoggers}} are intended to be isolated from agent failover, in the same way that executors do not crash when the agent process crashes. 
For default {{ContainerLogger}} s, like the {{SandboxContainerLogger}} and the (tentatively named) {{TruncatingSandboxContainerLogger}}, the log files are exposed during agent recovery regardless. For non-default {{ContainerLogger}} s, the recovery of executor metadata may be necessary to rebuild endpoints that expose the logs. This can be implemented as part of {{Containerizer::recover}}.",3 MESOS-4154,"Rename shutdown_frameworks to teardown_framework","The mesos is now using teardown framework to shutdown a framework but the acls are still using shutdown_framework, it is better to rename shutdown_framework to teardown_framework for acl to keep consistent. This is a post review request for https://reviews.apache.org/r/40829/",2 MESOS-4160,"Log recover tests are slow.","On Mac OS 10.10.4, some tests take longer than {{1s}} to finish: {code} RecoverTest.AutoInitialization (1003 ms) RecoverTest.AutoInitializationRetry (1000 ms) {code}",1 MESOS-4164,"MasterTest.RecoverResources is slow.","The {{MasterTest.RecoverResources}} test takes more than {{1s}} to finish on my Mac OS 10.10.4: {code} MasterTest.RecoverResources (1018 ms) {code}",1 MESOS-4165,"MasterTest.MasterInfoOnReElection is slow.","The {{MasterTest.MasterInfoOnReElection}} test takes more than {{1s}} to finish on my Mac OS 10.10.4: {code} MasterTest.MasterInfoOnReElection (1024 ms) {code}",1 MESOS-4166,"MasterTest.LaunchCombinedOfferTest is slow.","The {{MasterTest.LaunchCombinedOfferTest}} test takes more than {{2s}} to finish on my Mac OS 10.10.4: {code} MasterTest.LaunchCombinedOfferTest (2023 ms) {code}",1 MESOS-4167,"MasterTest.OfferTimeout is slow.","The {{MasterTest.OfferTimeout}} test takes more than {{1s}} to finish on my Mac OS 10.10.4: {code} MasterTest.OfferTimeout (1053 ms) {code}",1 MESOS-4170,"OversubscriptionTest.UpdateAllocatorOnSchedulerFailover is slow.","The {{OversubscriptionTest.UpdateAllocatorOnSchedulerFailover}} test takes more than {{1s}} to finish on my Mac OS 10.10.4: {code} 
OversubscriptionTest.UpdateAllocatorOnSchedulerFailover (1018 ms) {code}",1 MESOS-4171,"OversubscriptionTest.RemoveCapabilitiesOnSchedulerFailover is slow.","The {{OversubscriptionTest.RemoveCapabilitiesOnSchedulerFailover}} test takes more than {{1s}} to finish on my Mac OS 10.10.4: {code} OversubscriptionTest.RemoveCapabilitiesOnSchedulerFailover (1018 ms) {code}",1 MESOS-4172,"GarbageCollectorIntegrationTest.Restart is slow","The {{GarbageCollectorIntegrationTest.Restart}} test takes more than {{5s}} to finish on my Mac OS 10.10.4: {code} GarbageCollectorIntegrationTest.Restart (5102 ms) {code}",3 MESOS-4174,"HookTest.VerifySlaveLaunchExecutorHook is slow.","The {{HookTest.VerifySlaveLaunchExecutorHook}} test takes more than {{5s}} to finish on my Mac OS 10.10.4: {code} HookTest.VerifySlaveLaunchExecutorHook (5061 ms) {code}",1 MESOS-4175,"ContentType/SchedulerTest.Decline is slow.","The {{ContentType/SchedulerTest.Decline}} test takes more than {{1s}} to finish on my Mac OS 10.10.4: {code} ContentType/SchedulerTest.Decline/0 (1022 ms) {code}",1 MESOS-4177,"Create a user doc for Executor HTTP API","We need a user doc similar to the corresponding one for the Scheduler HTTP API.",3 MESOS-4178,"Add persistent volume support to the Authorizer","This ticket is the first in a series that adds authorization support for persistent volume creation and destruction. Persistent volumes should be authorized with the {{principal}} of the reserving entity (framework or master). The idea is to introduce {{Create}} and {{Destroy}} into the ACL. {code} message Create { // Subjects. required Entity principals = 1; // Objects? Perhaps the kind of volume? allowed permissions? } message Destroy { // Subjects. required Entity principals = 1; // Objects. 
required Entity creator_principals = 2; } {code} ACLs for volume creation and destruction must be added to {{authorizer.proto}}, and the appropriate function overloads must be added to the Authorizer.",1 MESOS-4179,"Extend `Master` to authorize persistent volumes","This ticket is the second in a series that adds authorization support for persistent volumes. Methods {{Master::authorizeCreateVolume()}} and {{Master::authorizeDestroyVolume}} must be added to allow the Master to authorize these operations.",1 MESOS-4183,"Move operator<< definitions to .cpp files and include in .hpp where possible.","We often include complex headers like {{}} in "".hpp"" files to define {{operator<<()}} inline (e.g. ""mesos/authorizer/authorizer.hpp""). Instead, we can move definitions to corresponding "".cpp"" files and replace stream headers with {{iosfwd}}, for example, this is partially done for {{URI}} in ""mesos/uri/uri.hpp"".",3 MESOS-4184,"Jenkins builds for Centos fail with missing 'which' utility and incorrect 'java.home'","Jenkins builds are now consistently failing for CentOS 7 with the failure: checking value of Java system property 'java.home'... /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64/jre configure: error: could not guess JAVA_HOME They also fail early on during 'bootstrap' with a missing 'which' command. The solution is to update support/docker_build.sh to install 'which' as well as make sure the proper versions of java are installed during the installation process. The problem here is that we install maven BEFORE installing java-1.7.0-openjdk-devel, causing maven to pull in a dependency on java-1.8.0-openjdk. This causes problems with finding the proper java.home in our mesos/configure script because of the mismatch between the most up to date jre (1.8.0) and the most up to date development tools (1.7.0). 
We can either update the script to pull in the 1.8 devel tools or move our dependence on maven until AFTER our installation of java-1.7.0-openjdk-devel. Unclear what the best solution is.",3 MESOS-4186,"Serialize docker v1 image spec as protobuf","Currently we only support v2 docker manifest serialization method. When we read docker image spec locally from disk, we should be able to parse v1 docker manifest as protobuf, which will make it easier to gather runtime config and other necessary info.",2 MESOS-4187,"Avoid using absolute URLs in documentation pages","Links from one documentation page to another should not use absolute URLs (e.g., {{http://mesos.apache.org/documentation/latest/...}}) for several good reasons. For instance, absolute URLs break when the docs are generated/previewed locally.",1 MESOS-4190,"Create a Design Doc for dynamic weights.","A short design doc for dynamic weights, it will focus on /weights API and the changes to the allocator API.",3 MESOS-4191,"Design doc for fixed point resources",NULL,5 MESOS-4192,"Add documentation for API Versioning","Currently, we don't have any documentation for: - How Mesos implements API versioning ? - How are protobufs versioned and how does mesos handle them internally ? - What do contributors need to do when they make a change to a external user facing protobuf ? 
The relevant design doc: https://docs.google.com/document/d/1-iQjo6778H_fU_1Zi_Yk6szg8qj-wqYgVgnx7u3h6OU/edit#heading=h.2gkbjz6amn7b ",3 MESOS-4193,"Port `process/file.hpp`",NULL,3 MESOS-4194,"MesosContainerizer* tests leak FDs (pipes)","If you run: {{bin/mesos-tests.sh --gtest_filter=""*MesosContainerizer*"" --gtest_repeat=-1 --gtest_break_on_failure}} And then check: {{lsof | grep mesos}} The number of open pipes will grow linearly with the number of test repetitions.",2 MESOS-4195,"Add dynamic reservation tests with no principal","Currently, there exist no dynamic reservation tests that include authorization of a framework that is registered with no principal. This should be added in order to more comprehensively test the dynamic reservation code.",1 MESOS-4196,"Enable running tests without authorizer.","We do not support creating {{Master}} instance without an {{Authorizer}} in tests: https://github.com/apache/mesos/blob/aa497e81c945677c570484a8aa1a8c8b2e979dfd/src/tests/cluster.cpp#L217. This leads to a segfault when {{masterFlags.acls = None();}} is used in a test, while it's a valid use case and should be allowed. Alternatively, we use {{masterFlags.acls = ACLs();}}, which triggers creation of {{LocalAuthorizer}} with empty {{ACLs}}, which seems to be semantically equal to the absence of an authorizer, given {{permissive}} flag is {{true}}. This equivalence should be verified by a test.",3 MESOS-4198,"Disk Resource Reservation is NOT Enforced for Persistent Volumes","If I create a persistent volume on a reserved disk resource, I am able to write data in excess of my reserved size. Disk resource reservation should be enforced just as ""cpus"" and ""mem"" reservations are enforced.",3 MESOS-4200,"Test case(s) for weights + allocation behavior","As far as I can see, we currently have NO test cases for behavior when weights are defined.",2 MESOS-4202,"Race in SSL socket shutdown ","libprocess Socket shares the ownership of the file descriptor with libevent. 
In the destructor of the libprocess libevent_ssl socket, we call ssl shutdown which is executed asynchronously. This causes the libprocess socket file descriptor to be closed (and possibly reused) when the same file descriptor could be used by libevent/ssl. Since we set the shutdown options as SSL_RECEIVED_SHUTDOWN, we leave any write operations to continue with a possibly closed file descriptor. This issue manifests as junk characters written to the file that has been handed the closed socket file descriptor (by the OS) that has the above issue.",5 MESOS-4204,"Document that frameworks that participate in a role should cooperate",NULL,2 MESOS-4206,"Write new log-related documentation","This should include: * Default logging behavior for master, agent, framework, executor, task. * Master/agent: ** A summary of log-related flags. ** {{glog}} specific options. * Separation of master/agent logs from container logs. * The {{ContainerLogger}} module.",3 MESOS-4207,"Add an example bug due to a lack of defer() to the defer() documentation","In the past, some bugs have been introduced into the codebase due to a lack of {{defer()}} where it should have been used. 
It would be useful to add an example of this to the {{defer()}} documentation.",2 MESOS-4208,"PersistentVolumeTest.BadACLDropCreateAndDestroy is flaky","{noformat} [ RUN ] PersistentVolumeTest.BadACLDropCreateAndDestroy I1219 09:51:32.623245 31878 leveldb.cpp:174] Opened db in 4.393596ms I1219 09:51:32.624084 31878 leveldb.cpp:181] Compacted db in 709447ns I1219 09:51:32.624186 31878 leveldb.cpp:196] Created db iterator in 21252ns I1219 09:51:32.624290 31878 leveldb.cpp:202] Seeked to beginning of db in 11391ns I1219 09:51:32.624378 31878 leveldb.cpp:271] Iterated through 0 keys in the db in 611ns I1219 09:51:32.624505 31878 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I1219 09:51:32.625195 31904 recover.cpp:447] Starting replica recovery I1219 09:51:32.625641 31904 recover.cpp:473] Replica is in EMPTY status I1219 09:51:32.627305 31904 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (6740)@172.17.0.3:36408 I1219 09:51:32.627749 31904 recover.cpp:193] Received a recover response from a replica in EMPTY status I1219 09:51:32.628330 31904 recover.cpp:564] Updating replica status to STARTING I1219 09:51:32.629068 31906 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 410494ns I1219 09:51:32.629169 31906 replica.cpp:320] Persisted replica status to STARTING I1219 09:51:32.629598 31906 recover.cpp:473] Replica is in STARTING status I1219 09:51:32.630782 31912 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (6741)@172.17.0.3:36408 I1219 09:51:32.631166 31901 recover.cpp:193] Received a recover response from a replica in STARTING status I1219 09:51:32.632467 31902 recover.cpp:564] Updating replica status to VOTING I1219 09:51:32.633600 31907 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 311370ns I1219 09:51:32.633627 31907 replica.cpp:320] Persisted replica status to VOTING I1219 09:51:32.633719 31907 recover.cpp:578] 
Successfully joined the Paxos group I1219 09:51:32.633874 31907 recover.cpp:462] Recover process terminated I1219 09:51:32.636409 31909 master.cpp:365] Master bded856d-1c7f-4fad-a8bc-3629ba8c59d3 (60ab6e727501) started on 172.17.0.3:36408 I1219 09:51:32.636593 31909 master.cpp:367] Flags at startup: --acls=""create_volumes { principals { values: ""creator-principal"" } volume_types { type: ANY } } create_volumes { principals { type: ANY } volume_types { type: NONE } } "" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate=""false"" --authenticate_slaves=""true"" --authenticators=""crammd5"" --authorizers=""local"" --credentials=""/tmp/SpPF7B/credentials"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_slave_ping_timeouts=""5"" --quiet=""false"" --recovery_slave_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""25secs"" --registry_strict=""true"" --roles=""role1"" --root_submissions=""true"" --slave_ping_timeout=""15secs"" --slave_reregister_timeout=""10mins"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/mesos/mesos-0.27.0/_inst/share/mesos/webui"" --work_dir=""/tmp/SpPF7B/master"" --zk_session_timeout=""10secs"" I1219 09:51:32.637055 31909 master.cpp:414] Master allowing unauthenticated frameworks to register I1219 09:51:32.637068 31909 master.cpp:417] Master only allowing authenticated slaves to register I1219 09:51:32.637094 31909 credentials.hpp:35] Loading credentials for authentication from '/tmp/SpPF7B/credentials' I1219 09:51:32.637403 31909 master.cpp:456] Using default 'crammd5' authenticator I1219 09:51:32.637555 31909 master.cpp:493] Authorization enabled W1219 09:51:32.637575 31909 master.cpp:553] The '--roles' flag is deprecated. This flag will be removed in the future. 
See the Mesos 0.27 upgrade notes for more information I1219 09:51:32.637806 31897 whitelist_watcher.cpp:77] No whitelist given I1219 09:51:32.637820 31910 hierarchical.cpp:147] Initialized hierarchical allocator process I1219 09:51:32.639677 31909 master.cpp:1629] The newly elected leader is master@172.17.0.3:36408 with id bded856d-1c7f-4fad-a8bc-3629ba8c59d3 I1219 09:51:32.639768 31909 master.cpp:1642] Elected as the leading master! I1219 09:51:32.639892 31909 master.cpp:1387] Recovering from registrar I1219 09:51:32.640136 31907 registrar.cpp:307] Recovering registrar I1219 09:51:32.640929 31901 log.cpp:659] Attempting to start the writer I1219 09:51:32.642199 31912 replica.cpp:493] Replica received implicit promise request from (6742)@172.17.0.3:36408 with proposal 1 I1219 09:51:32.642719 31912 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 445876ns I1219 09:51:32.642755 31912 replica.cpp:342] Persisted promised to 1 I1219 09:51:32.643478 31904 coordinator.cpp:238] Coordinator attempting to fill missing positions I1219 09:51:32.645009 31909 replica.cpp:388] Replica received explicit promise request from (6743)@172.17.0.3:36408 for position 0 with proposal 2 I1219 09:51:32.645356 31909 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 310064ns I1219 09:51:32.645382 31909 replica.cpp:712] Persisted action at 0 I1219 09:51:32.646662 31909 replica.cpp:537] Replica received write request for position 0 from (6744)@172.17.0.3:36408 I1219 09:51:32.646721 31909 leveldb.cpp:436] Reading position from leveldb took 29298ns I1219 09:51:32.647047 31909 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 283424ns I1219 09:51:32.647073 31909 replica.cpp:712] Persisted action at 0 I1219 09:51:32.647722 31909 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 I1219 09:51:32.648052 31909 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 300825ns I1219 09:51:32.648077 31909 replica.cpp:712] Persisted 
action at 0 I1219 09:51:32.648095 31909 replica.cpp:697] Replica learned NOP action at position 0 I1219 09:51:32.655295 31899 log.cpp:675] Writer started with ending position 0 I1219 09:51:32.656543 31905 leveldb.cpp:436] Reading position from leveldb took 32788ns I1219 09:51:32.658164 31905 registrar.cpp:340] Successfully fetched the registry (0B) in 0ns I1219 09:51:32.658604 31905 registrar.cpp:439] Applied 1 operations in 38183ns; attempting to update the 'registry' I1219 09:51:32.660102 31905 log.cpp:683] Attempting to append 170 bytes to the log I1219 09:51:32.660538 31906 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 I1219 09:51:32.661872 31906 replica.cpp:537] Replica received write request for position 1 from (6745)@172.17.0.3:36408 I1219 09:51:32.662719 31906 leveldb.cpp:341] Persisting action (189 bytes) to leveldb took 483018ns I1219 09:51:32.663054 31906 replica.cpp:712] Persisted action at 1 I1219 09:51:32.664008 31902 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 I1219 09:51:32.664330 31902 leveldb.cpp:341] Persisting action (191 bytes) to leveldb took 287310ns I1219 09:51:32.664355 31902 replica.cpp:712] Persisted action at 1 I1219 09:51:32.664376 31902 replica.cpp:697] Replica learned APPEND action at position 1 I1219 09:51:32.665365 31902 registrar.cpp:484] Successfully updated the 'registry' in 0ns I1219 09:51:32.665493 31902 registrar.cpp:370] Successfully recovered registrar I1219 09:51:32.665894 31902 master.cpp:1439] Recovered 0 slaves from the Registry (131B) ; allowing 10mins for slaves to re-register I1219 09:51:32.665990 31902 hierarchical.cpp:165] Skipping recovery of hierarchical allocator: nothing to recover I1219 09:51:32.666266 31902 log.cpp:702] Attempting to truncate the log to 1 I1219 09:51:32.666424 31902 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 I1219 09:51:32.667181 31907 replica.cpp:537] Replica received write request 
for position 2 from (6746)@172.17.0.3:36408 I1219 09:51:32.667768 31907 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 335947ns I1219 09:51:32.668067 31907 replica.cpp:712] Persisted action at 2 I1219 09:51:32.668942 31906 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 I1219 09:51:32.669240 31906 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 266566ns I1219 09:51:32.669292 31906 leveldb.cpp:399] Deleting ~1 keys from leveldb took 27852ns I1219 09:51:32.669314 31906 replica.cpp:712] Persisted action at 2 I1219 09:51:32.669334 31906 replica.cpp:697] Replica learned TRUNCATE action at position 2 I1219 09:51:32.691251 31878 containerizer.cpp:141] Using isolation: posix/cpu,posix/mem,filesystem/posix W1219 09:51:32.691759 31878 backend.cpp:48] Failed to create 'bind' backend: BindBackend requires root privileges I1219 09:51:32.697428 31901 slave.cpp:191] Slave started on 228)@172.17.0.3:36408 I1219 09:51:32.697459 31901 slave.cpp:192] Flags at startup: --appc_store_dir=""/tmp/mesos/store/appc"" --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --credential=""/tmp/PersistentVolumeTest_BadACLDropCreateAndDestroy_gWLtnc/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_auth_server=""auth.docker.io"" --docker_auth_server_port=""443"" --docker_kill_orphans=""true"" --docker_local_archives_dir=""/tmp/mesos/images/docker"" --docker_puller=""local"" --docker_puller_timeout=""60"" --docker_registry=""registry-1.docker.io"" --docker_registry_port=""443"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --enforce_container_disk_quota=""false"" 
--executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/PersistentVolumeTest_BadACLDropCreateAndDestroy_gWLtnc/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname_lookup=""true"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" --launcher_dir=""/mesos/mesos-0.27.0/_build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resources=""cpus:2;mem:1024;disk(role1):2048"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" --work_dir=""/tmp/PersistentVolumeTest_BadACLDropCreateAndDestroy_gWLtnc"" I1219 09:51:32.697963 31901 credentials.hpp:83] Loading credential for authentication from '/tmp/PersistentVolumeTest_BadACLDropCreateAndDestroy_gWLtnc/credential' I1219 09:51:32.698210 31901 slave.cpp:322] Slave using credential for: test-principal I1219 09:51:32.698449 31901 resources.cpp:478] Parsing resources as JSON failed: cpus:2;mem:1024;disk(role1):2048 Trying semicolon-delimited string format instead I1219 09:51:32.699065 31901 slave.cpp:392] Slave resources: cpus(*):2; mem(*):1024; disk(role1):2048; ports(*):[31000-32000] I1219 09:51:32.699137 31901 slave.cpp:400] Slave attributes: [ ] I1219 09:51:32.699151 31901 slave.cpp:405] Slave hostname: 60ab6e727501 I1219 09:51:32.699161 31901 slave.cpp:410] Slave checkpoint: true I1219 09:51:32.699364 31878 sched.cpp:164] Version: 0.27.0 I1219 09:51:32.700614 31911 sched.cpp:262] New master detected at master@172.17.0.3:36408 
I1219 09:51:32.700703 31911 sched.cpp:272] No credentials provided. Attempting to register without authentication I1219 09:51:32.700724 31911 sched.cpp:714] Sending SUBSCRIBE call to master@172.17.0.3:36408 I1219 09:51:32.700839 31911 sched.cpp:747] Will retry registration in 620.399428ms if necessary I1219 09:51:32.701244 31903 master.cpp:2197] Received SUBSCRIBE call for framework 'default' at scheduler-0333dddc-4b41-40ed-8853-a1aadf1f1879@172.17.0.3:36408 I1219 09:51:32.701313 31903 master.cpp:1668] Authorizing framework principal 'test-principal' to receive offers for role 'role1' I1219 09:51:32.701625 31903 master.cpp:2268] Subscribing framework default with checkpointing disabled and capabilities [ ] I1219 09:51:32.702308 31903 hierarchical.cpp:260] Added framework bded856d-1c7f-4fad-a8bc-3629ba8c59d3-0000 I1219 09:51:32.702386 31903 hierarchical.cpp:1329] No resources available to allocate! I1219 09:51:32.702422 31903 hierarchical.cpp:1423] No inverse offers to send out! I1219 09:51:32.702448 31903 hierarchical.cpp:1079] Performed allocation for 0 slaves in 114358ns I1219 09:51:32.702638 31903 sched.cpp:641] Framework registered with bded856d-1c7f-4fad-a8bc-3629ba8c59d3-0000 I1219 09:51:32.702688 31903 sched.cpp:655] Scheduler::registered took 25558ns I1219 09:51:32.703553 31901 state.cpp:58] Recovering state from '/tmp/PersistentVolumeTest_BadACLDropCreateAndDestroy_gWLtnc/meta' I1219 09:51:32.704118 31897 status_update_manager.cpp:200] Recovering status update manager I1219 09:51:32.704407 31907 containerizer.cpp:383] Recovering containerizer I1219 09:51:32.705373 31907 slave.cpp:4427] Finished recovery I1219 09:51:32.705991 31907 slave.cpp:4599] Querying resource estimator for oversubscribable resources I1219 09:51:32.706277 31907 slave.cpp:4613] Received oversubscribable resources from the resource estimator I1219 09:51:32.706666 31907 slave.cpp:729] New master detected at master@172.17.0.3:36408 I1219 09:51:32.706738 31907 slave.cpp:792] Authenticating 
with master master@172.17.0.3:36408 I1219 09:51:32.706760 31907 slave.cpp:797] Using default CRAM-MD5 authenticatee I1219 09:51:32.706886 31899 status_update_manager.cpp:174] Pausing sending status updates I1219 09:51:32.706941 31907 slave.cpp:765] Detecting new master I1219 09:51:32.707036 31899 authenticatee.cpp:121] Creating new client SASL connection I1219 09:51:32.707291 31910 master.cpp:5423] Authenticating slave(228)@172.17.0.3:36408 I1219 09:51:32.707479 31910 authenticator.cpp:413] Starting authentication session for crammd5_authenticatee(510)@172.17.0.3:36408 I1219 09:51:32.707849 31910 authenticator.cpp:98] Creating new server SASL connection I1219 09:51:32.708082 31910 authenticatee.cpp:212] Received SASL authentication mechanisms: CRAM-MD5 I1219 09:51:32.708112 31910 authenticatee.cpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I1219 09:51:32.708196 31910 authenticator.cpp:203] Received SASL authentication start I1219 09:51:32.708395 31910 authenticator.cpp:325] Authentication requires more steps I1219 09:51:32.708611 31902 authenticatee.cpp:258] Received SASL authentication step I1219 09:51:32.708773 31910 authenticator.cpp:231] Received SASL authentication step I1219 09:51:32.708889 31910 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '60ab6e727501' server FQDN: '60ab6e727501' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I1219 09:51:32.708976 31910 auxprop.cpp:179] Looking up auxiliary property '*userPassword' I1219 09:51:32.709096 31910 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I1219 09:51:32.709200 31910 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '60ab6e727501' server FQDN: '60ab6e727501' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I1219 09:51:32.709285 31910 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since 
SASL_AUXPROP_AUTHZID == true I1219 09:51:32.709363 31910 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I1219 09:51:32.709452 31910 authenticator.cpp:317] Authentication success I1219 09:51:32.709707 31910 authenticatee.cpp:298] Authentication success I1219 09:51:32.710252 31910 slave.cpp:860] Successfully authenticated with master master@172.17.0.3:36408 I1219 09:51:32.710525 31910 slave.cpp:1254] Will retry registration in 17.44437ms if necessary I1219 09:51:32.709839 31908 master.cpp:5453] Successfully authenticated principal 'test-principal' at slave(228)@172.17.0.3:36408 I1219 09:51:32.710985 31908 master.cpp:4132] Registering slave at slave(228)@172.17.0.3:36408 (60ab6e727501) with id bded856d-1c7f-4fad-a8bc-3629ba8c59d3-S0 I1219 09:51:32.711645 31908 registrar.cpp:439] Applied 1 operations in 83191ns; attempting to update the 'registry' I1219 09:51:32.709908 31912 authenticator.cpp:431] Authentication session cleanup for crammd5_authenticatee(510)@172.17.0.3:36408 I1219 09:51:32.713407 31908 log.cpp:683] Attempting to append 343 bytes to the log I1219 09:51:32.713646 31912 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 3 I1219 09:51:32.714884 31911 replica.cpp:537] Replica received write request for position 3 from (6758)@172.17.0.3:36408 I1219 09:51:32.715221 31911 leveldb.cpp:341] Persisting action (362 bytes) to leveldb took 288909ns I1219 09:51:32.715250 31911 replica.cpp:712] Persisted action at 3 I1219 09:51:32.716145 31912 replica.cpp:691] Replica received learned notice for position 3 from @0.0.0.0:0 I1219 09:51:32.716689 31912 leveldb.cpp:341] Persisting action (364 bytes) to leveldb took 512217ns I1219 09:51:32.716716 31912 replica.cpp:712] Persisted action at 3 I1219 09:51:32.716737 31912 replica.cpp:697] Replica learned APPEND action at position 3 I1219 09:51:32.718426 31911 registrar.cpp:484] Successfully updated the 'registry' in 0ns I1219 09:51:32.719441 
31902 slave.cpp:3371] Received ping from slave-observer(228)@172.17.0.3:36408 I1219 09:51:32.719843 31909 log.cpp:702] Attempting to truncate the log to 3 I1219 09:51:32.719908 31911 master.cpp:4200] Registered slave bded856d-1c7f-4fad-a8bc-3629ba8c59d3-S0 at slave(228)@172.17.0.3:36408 (60ab6e727501) with cpus(*):2; mem(*):1024; disk(role1):2048; ports(*):[31000-32000] I1219 09:51:32.720064 31911 slave.cpp:904] Registered with master master@172.17.0.3:36408; given slave ID bded856d-1c7f-4fad-a8bc-3629ba8c59d3-S0 I1219 09:51:32.720088 31911 fetcher.cpp:81] Clearing fetcher cache I1219 09:51:32.720491 31911 slave.cpp:927] Checkpointing SlaveInfo to '/tmp/PersistentVolumeTest_BadACLDropCreateAndDestroy_gWLtnc/meta/slaves/bded856d-1c7f-4fad-a8bc-3629ba8c59d3-S0/slave.info' I1219 09:51:32.720844 31909 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 4 I1219 09:51:32.720929 31911 slave.cpp:963] Forwarding total oversubscribed resources I1219 09:51:32.721017 31903 status_update_manager.cpp:181] Resuming sending status updates I1219 09:51:32.721099 31911 master.cpp:4542] Received update of slave bded856d-1c7f-4fad-a8bc-3629ba8c59d3-S0 at slave(228)@172.17.0.3:36408 (60ab6e727501) with total oversubscribed resources I1219 09:51:32.721141 31905 hierarchical.cpp:465] Added slave bded856d-1c7f-4fad-a8bc-3629ba8c59d3-S0 (60ab6e727501) with cpus(*):2; mem(*):1024; disk(role1):2048; ports(*):[31000-32000] (allocated: ) I1219 09:51:32.721879 31911 replica.cpp:537] Replica received write request for position 4 from (6759)@172.17.0.3:36408 I1219 09:51:32.722293 31905 hierarchical.cpp:1423] No inverse offers to send out! 
I1219 09:51:32.722337 31905 hierarchical.cpp:1101] Performed allocation for slave bded856d-1c7f-4fad-a8bc-3629ba8c59d3-S0 in 1.155563ms I1219 09:51:32.722681 31905 hierarchical.cpp:521] Slave bded856d-1c7f-4fad-a8bc-3629ba8c59d3-S0 (60ab6e727501) updated with oversubscribed resources (total: cpus(*):2; mem(*):1024; disk(role1):2048; ports(*):[31000-32000], allocated: cpus(*):2; mem(*):1024; ports(*):[31000-32000]; disk(role1):2048) I1219 09:51:32.722713 31909 master.cpp:5252] Sending 1 offers to framework bded856d-...",1 MESOS-4209,"Document ""how to program with dynamic reservations and persistent volumes""","Specifically, some of the gotchas around: * Retrying reservation attempts after a timeout * Fuzzy-matching resources to determine whether a reservation/PV is successful * Representing client state as a state machine and repeatedly moving ""toward"" successful terminal states The documentation should also point to the persistent volume example framework. We should also ask Gabriel and others (Arango?) who have built frameworks with PVs/DRs for feedback.",3 MESOS-4214,"Introduce HTTP endpoint /weights for updating weight",NULL,5 MESOS-4218,"Test for Quota Status Endpoint",NULL,3 MESOS-4222,"Document containerizer from user perspective.","Add documentation that covers: * Purpose of containerizers from a use case perspective. * What purpose does each containerizer (mesos, docker, compose) serve. * What criteria could be used to choose a containerizer.",3 MESOS-4223,"Document isolators from user perspective.","The documentation should cover: * Purpose of isolators (business/user perspective). 
* What are the criteria for choosing/picking any set of isolators.",4 MESOS-4224,"Document isolator internals.","Document isolators from developer perspective, possibly covering: * linux isolators * posix isolators * filesystem, network isolators",4 MESOS-4225,"Exposed docker/appc image manifest to mesos containerizer.","Collect the docker image manifest from disk (which contains all runtime configurations), and pass it back to the provisioner, so that the mesos containerizer can grab all necessary info from the provisioner.",2 MESOS-4226,"Enable passing docker image environment variables runtime config to provisioner","Collect environment variable runtime config information from a docker image, and save it as a map. Pass it back to the provisioner, and handle the environment variable merge issue.",1 MESOS-4227,"Enable passing docker image cmd runtime config to provisioner","Cmd is the command to run when starting a container. We should be able to collect Cmd config information from a docker image, and pass it back to the provisioner.",1 MESOS-4240,"Pull provisioner from linux filesystem isolator to Mesos containerizer.","The rationale behind this change is that many of the image specifications (e.g., Docker/Appc) are not just for filesystems. They also specify runtime configurations (e.g., environment variables, volumes, etc.) for the container. The provisioner should return those runtime configurations to the Mesos containerizer, and the Mesos containerizer will delegate the isolation of those runtime configurations to the relevant isolator. Here is what it will look like eventually. We could do those changes in phases: 1) The provisioner will return a ProvisionInfo which includes a 'rootfs' and image-specific runtime configurations (could be the Docker/Appc manifest). 2) Then, the Mesos containerizer will generate a ContainerConfig (a protobuf which includes rootfs, sandbox, docker/appc manifest, similar to OCI's host independent config.json) and pass that to each isolator in 'prepare'. 
Imagine that, in the future, a DockerRuntimeIsolator takes the docker manifest from ContainerConfig and prepares the container. 3) The isolator's prepare function will return a ContainerLaunchInfo (contains environment variables, namespaces, etc.) which will be used by the Mesos containerizer to launch containers. Imagine that this information will be passed to the launcher in the future. We can do the renaming (ContainerPrepareInfo -> ContainerLaunchInfo) later. ",5 MESOS-4241,"Consolidate docker store slave flags","Currently there are too many slave flags for configuring the docker store/puller. We can remove the following flags: docker_auth_server_port docker_local_archives_dir docker_registry_port docker_puller and consolidate them into the existing flags.",3 MESOS-4255,"Add mechanism for testing recovery of HTTP based executors","Currently, the slave process generates a process ID every time it is initialized via the {{process::ID::generate}} function call. This is a problem for testing HTTP executors as it can't retry if there is a disconnection after an agent restart since the prefix is incremented. {code} Agent PID before: slave(1)@127.0.0.1:43915 Agent PID after restart: slave(2)@127.0.0.1:43915 {code} There are a couple of ways to fix this: - Add a constructor to {{Slave}} exclusively for testing that passes on a fixed {{ID}} instead of relying on {{ID::generate}}. - Currently we delegate to slave(1)@, i.e. (1), when nothing is specified as the URL in libprocess, i.e. {{127.0.0.1:43915/api/v1/executor}} would delegate to {{slave(1)@127.0.0.1:43915/api/v1/executor}}. Instead of defaulting to (1), we can default to the last known active ID.",3 MESOS-4257,"ExamplesTest.NoExecutorFramework runs forever.","{noformat: title=Good Run} [ RUN ] ExamplesTest.NoExecutorFramework I1221 23:10:02.721617 32528 exec.cpp:444] Ignoring exited event because the driver is aborted! 
Using temporary directory '/tmp/ExamplesTest_NoExecutorFramework_fCmFLn' I1221 23:10:02.721675 32539 exec.cpp:444] Ignoring exited event because the driver is aborted! I1221 23:10:02.722024 32554 exec.cpp:444] Ignoring exited event because the driver is aborted! WARNING: Logging before InitGoogleLogging() is written to STDERR I1221 23:10:05.179466 32569 resources.cpp:478] Parsing resources as JSON failed: cpus:0.1;mem:32;disk:32 Trying semicolon-delimited string format instead I1221 23:10:05.180269 32569 logging.cpp:172] Logging to STDERR I1221 23:10:05.185768 32569 process.cpp:998] libprocess is initialized on 172.17.0.2:40874 for 16 cpus I1221 23:10:05.200728 32569 leveldb.cpp:174] Opened db in 4.184362ms I1221 23:10:05.202234 32569 leveldb.cpp:181] Compacted db in 1.459268ms I1221 23:10:05.202353 32569 leveldb.cpp:196] Created db iterator in 73761ns I1221 23:10:05.202383 32569 leveldb.cpp:202] Seeked to beginning of db in 3382ns I1221 23:10:05.202405 32569 leveldb.cpp:271] Iterated through 0 keys in the db in 633ns I1221 23:10:05.202674 32569 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I1221 23:10:05.205301 32604 recover.cpp:447] Starting replica recovery I1221 23:10:05.206414 32569 local.cpp:239] Using 'local' authorizer I1221 23:10:05.206405 32604 recover.cpp:473] Replica is in EMPTY status I1221 23:10:05.209595 32594 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (4)@172.17.0.2:40874 I1221 23:10:05.210916 32596 recover.cpp:193] Received a recover response from a replica in EMPTY status I1221 23:10:05.211515 32597 master.cpp:365] Master 3931c1a8-1cd6-49eb-94c8-d01b33bb008e (6ccf2ee56b13) started on 172.17.0.2:40874 I1221 23:10:05.211699 32605 recover.cpp:564] Updating replica status to STARTING I1221 23:10:05.211539 32597 master.cpp:367] Flags at startup: --acls=""permissive: false register_frameworks { principals { type: SOME values: ""test-principal"" } roles { type: SOME 
values: ""*"" } } run_tasks { principals { type: SOME values: ""test-principal"" } users { type: SOME values: ""mesos"" } } "" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate=""true"" --authenticate_slaves=""false"" --authenticators=""crammd5"" --authorizers=""local"" --credentials=""/tmp/ExamplesTest_NoExecutorFramework_fCmFLn/credentials"" --framework_sorter=""drf"" --help=""true"" --hostname_lookup=""true"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_slave_ping_timeouts=""5"" --quiet=""false"" --recovery_slave_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""5secs"" --registry_strict=""false"" --root_submissions=""true"" --slave_ping_timeout=""15secs"" --slave_reregister_timeout=""10mins"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/mesos/mesos-0.27.0/src/webui"" --work_dir=""/tmp/mesos-otpdch"" --zk_session_timeout=""10secs"" I1221 23:10:05.212323 32597 master.cpp:412] Master only allowing authenticated frameworks to register I1221 23:10:05.212337 32597 master.cpp:419] Master allowing unauthenticated slaves to register I1221 23:10:05.212347 32597 credentials.hpp:35] Loading credentials for authentication from '/tmp/ExamplesTest_NoExecutorFramework_fCmFLn/credentials' W1221 23:10:05.212442 32597 credentials.hpp:50] Permissions on credentials file '/tmp/ExamplesTest_NoExecutorFramework_fCmFLn/credentials' are too open. It is recommended that your credentials file is NOT accessible by others. 
I1221 23:10:05.212606 32600 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 656857ns I1221 23:10:05.212620 32597 master.cpp:456] Using default 'crammd5' authenticator I1221 23:10:05.212631 32600 replica.cpp:320] Persisted replica status to STARTING I1221 23:10:05.212893 32597 authenticator.cpp:518] Initializing server SASL I1221 23:10:05.213091 32608 recover.cpp:473] Replica is in STARTING status I1221 23:10:05.213958 32595 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (5)@172.17.0.2:40874 I1221 23:10:05.214323 32594 recover.cpp:193] Received a recover response from a replica in STARTING status I1221 23:10:05.214689 32595 recover.cpp:564] Updating replica status to VOTING I1221 23:10:05.215353 32596 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 487419ns I1221 23:10:05.215384 32596 replica.cpp:320] Persisted replica status to VOTING I1221 23:10:05.215481 32605 recover.cpp:578] Successfully joined the Paxos group I1221 23:10:05.215867 32605 recover.cpp:462] Recover process terminated I1221 23:10:05.216111 32569 containerizer.cpp:141] Using isolation: filesystem/posix,posix/cpu,posix/mem W1221 23:10:05.217021 32569 backend.cpp:48] Failed to create 'bind' backend: BindBackend requires root privileges I1221 23:10:05.221482 32608 slave.cpp:191] Slave started on 1)@172.17.0.2:40874 I1221 23:10:05.221521 32608 slave.cpp:192] Flags at startup: --appc_store_dir=""/tmp/mesos/store/appc"" --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_auth_server=""auth.docker.io"" --docker_auth_server_port=""443"" --docker_kill_orphans=""true"" --docker_local_archives_dir=""/tmp/mesos/images/docker"" 
--docker_puller=""local"" --docker_puller_timeout=""60"" --docker_registry=""registry-1.docker.io"" --docker_registry_port=""443"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/mesos/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname_lookup=""true"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""filesystem/posix,posix/cpu,posix/mem"" --launcher=""posix"" --launcher_dir=""/mesos/mesos-0.27.0/_build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""1secs"" --resources=""cpus:2;mem:10240"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" --work_dir=""/tmp/mesos-otpdch/0"" I1221 23:10:05.222578 32608 resources.cpp:478] Parsing resources as JSON failed: cpus:2;mem:10240 Trying semicolon-delimited string format instead I1221 23:10:05.223465 32608 slave.cpp:392] Slave resources: cpus(*):2; mem(*):10240; disk(*):3.70122e+06; ports(*):[31000-32000] I1221 23:10:05.223621 32569 containerizer.cpp:141] Using isolation: filesystem/posix,posix/cpu,posix/mem I1221 23:10:05.223610 32608 slave.cpp:400] Slave attributes: [ ] I1221 23:10:05.223677 32608 slave.cpp:405] Slave hostname: 6ccf2ee56b13 I1221 23:10:05.223697 32608 slave.cpp:410] Slave checkpoint: true W1221 23:10:05.224143 32569 backend.cpp:48] Failed 
to create 'bind' backend: BindBackend requires root privileges I1221 23:10:05.226668 32604 slave.cpp:191] Slave started on 2)@172.17.0.2:40874 I1221 23:10:05.226692 32604 slave.cpp:192] Flags at startup: --appc_store_dir=""/tmp/mesos/store/appc"" --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_auth_server=""auth.docker.io"" --docker_auth_server_port=""443"" --docker_kill_orphans=""true"" --docker_local_archives_dir=""/tmp/mesos/images/docker"" --docker_puller=""local"" --docker_puller_timeout=""60"" --docker_registry=""registry-1.docker.io"" --docker_registry_port=""443"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/mesos/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname_lookup=""true"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""filesystem/posix,posix/cpu,posix/mem"" --launcher=""posix"" --launcher_dir=""/mesos/mesos-0.27.0/_build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""1secs"" --resources=""cpus:2;mem:10240"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" 
--switch_user=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" --work_dir=""/tmp/mesos-otpdch/1"" I1221 23:10:05.227520 32604 resources.cpp:478] Parsing resources as JSON failed: cpus:2;mem:10240 Trying semicolon-delimited string format instead I1221 23:10:05.228037 32604 slave.cpp:392] Slave resources: cpus(*):2; mem(*):10240; disk(*):3.70122e+06; ports(*):[31000-32000] I1221 23:10:05.228148 32604 slave.cpp:400] Slave attributes: [ ] I1221 23:10:05.228169 32604 slave.cpp:405] Slave hostname: 6ccf2ee56b13 I1221 23:10:05.228184 32604 slave.cpp:410] Slave checkpoint: true I1221 23:10:05.229123 32569 containerizer.cpp:141] Using isolation: filesystem/posix,posix/cpu,posix/mem I1221 23:10:05.229641 32605 state.cpp:58] Recovering state from '/tmp/mesos-otpdch/0/meta' W1221 23:10:05.229645 32569 backend.cpp:48] Failed to create 'bind' backend: BindBackend requires root privileges I1221 23:10:05.229636 32595 state.cpp:58] Recovering state from '/tmp/mesos-otpdch/1/meta' I1221 23:10:05.230242 32605 status_update_manager.cpp:200] Recovering status update manager I1221 23:10:05.230254 32598 status_update_manager.cpp:200] Recovering status update manager I1221 23:10:05.230515 32601 containerizer.cpp:383] Recovering containerizer I1221 23:10:05.230562 32602 containerizer.cpp:383] Recovering containerizer I1221 23:10:05.232681 32597 auxprop.cpp:71] Initialized in-memory auxiliary property plugin I1221 23:10:05.232803 32597 master.cpp:493] Authorization enabled I1221 23:10:05.232867 32600 slave.cpp:4427] Finished recovery I1221 23:10:05.232980 32598 slave.cpp:191] Slave started on 3)@172.17.0.2:40874 I1221 23:10:05.233039 32594 slave.cpp:4427] Finished recovery I1221 23:10:05.233376 32599 whitelist_watcher.cpp:77] No whitelist given I1221 23:10:05.233428 32601 hierarchical.cpp:147] Initialized hierarchical allocator process I1221 23:10:05.233003 32598 slave.cpp:192] Flags at startup: --appc_store_dir=""/tmp/mesos/store/appc"" 
--authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_auth_server=""auth.docker.io"" --docker_auth_server_port=""443"" --docker_kill_orphans=""true"" --docker_local_archives_dir=""/tmp/mesos/images/docker"" --docker_puller=""local"" --docker_puller_timeout=""60"" --docker_registry=""registry-1.docker.io"" --docker_registry_port=""443"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/mesos/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname_lookup=""true"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""filesystem/posix,posix/cpu,posix/mem"" --launcher=""posix"" --launcher_dir=""/mesos/mesos-0.27.0/_build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""1secs"" --resources=""cpus:2;mem:10240"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" --work_dir=""/tmp/mesos-otpdch/2"" I1221 23:10:05.233744 32600 slave.cpp:4599] Querying resource estimator for oversubscribable resources I1221 23:10:05.233749 
32598 resources.cpp:478] Parsing resources as JSON failed: cpus:2;mem:10240 Trying semicolon-delimited string format instead I1221 23:10:05.234222 32598 slave.cpp:392] Slave resources: cpus(*):2; mem(*):10240; disk(*):3.70122e+06; ports(*):[31000-32000] I1221 23:10:05.234284 32598 slave.cpp:400] Slave attributes: [ ] I1221 23:10:05.234299 32598 slave.cpp:405] Slave hostname: 6ccf2ee56b13 I1221 23:10:05.234311 32598 slave.cpp:410] Slave checkpoint: true I1221 23:10:05.234338 32600 slave.cpp:729] New master detected at master@172.17.0.2:40874 I1221 23:10:05.234376 32604 status_update_manager.cpp:174] Pausing sending status updates I1221 23:10:05.234424 32600 slave.cpp:754] No credentials provided. Attempting to register without authentication I1221 23:10:05.234522 32600 slave.cpp:765] Detecting new master I1221 23:10:05.234616 32569 sched.cpp:164] Version: 0.27.0 I1221 23:10:05.234658 32600 slave.cpp:4613] Received oversubscribable resources from the resource estimator I1221 23:10:05.234671 32594 slave.cpp:4599] Querying resource estimator for oversubscribable resources I1221 23:10:05.234884 32606 slave.cpp:4613] Received oversubscribable resources from the resource estimator I1221 23:10:05.235038 32595 status_update_manager.cpp:174] Pausing sending status updates I1221 23:10:05.235043 32606 slave.cpp:729] New master detected at master@172.17.0.2:40874 I1221 23:10:05.235111 32606 slave.cpp:754] No credentials provided. 
Attempting to register without authentication I1221 23:10:05.235147 32606 slave.cpp:765] Detecting new master I1221 23:10:05.235240 32594 state.cpp:58] Recovering state from '/tmp/mesos-otpdch/2/meta' I1221 23:10:05.235443 32608 status_update_manager.cpp:200] Recovering status update manager I1221 23:10:05.235625 32594 containerizer.cpp:383] Recovering containerizer I1221 23:10:05.236549 32599 slave.cpp:4427] Finished recovery I1221 23:10:05.236984 32593 sched.cpp:262] New master detected at master@172.17.0.2:40874 I1221 23:10:05.237004 32599 slave.cpp:4599] Querying resource estimator for oversubscribable resources I1221 23:10:05.237221 32593 sched.cpp:318] Authenticating with master master@172.17.0.2:40874 I1221 23:10:05.237277 32593 sched.cpp:325] Using default CRAM-MD5 authenticatee I1221 23:10:05.237285 32604 status_update_manager.cpp:174] Pausing sending status updates I1221 23:10:05.237288 32599 slave.cpp:729] New master detected at master@172.17.0.2:40874 I1221 23:10:05.237361 32599 slave.cpp:754] No credentials provided. Attempting to register without authentication I1221 23:10:05.237433 32599 slave.cpp:765] Detecting new master I1221 23:10:05.237565 32599 slave.cpp:4613] Received oversubscribable resources from the resource estimator I1221 23:10:05.238154 32605 authenticatee.cpp:97] Initializing client SASL I1221 23:10:05.238315 32605 authenticatee.cpp:121] Creating new client SASL connection I1221 23:10:05.239640 32597 master.cpp:1200] Dropping 'mesos.internal.AuthenticateMessage' message since not elected yet I1221 23:10:05.239765 32597 master.cpp:1629] The newly elected leader is master@172.17.0.2:40874 with id 3931c1a8-1cd6-49eb-94c8-d01b33bb008e I1221 23:10:05.239794 32597 master.cpp:1642] Elected as the leading master! 
I1221 23:10:05.239843 32597 master.cpp:1387] Recovering from registrar I1221 23:10:05.240056 32600 registrar.cpp:307] Recovering registrar I1221 23:10:05.241477 32608 log.cpp:659] Attempting to start the writer I1221 23:10:05.244540 32600 replica.cpp:493] Replica received implicit promise request from (39)@172.17.0.2:40874 with proposal 1 I1221 23:10:05.245358 32600 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 776937ns I1221 23:10:05.245393 32600 replica.cpp:342] Persisted promised to 1 I1221 23:10:05.246625 32601 coordinator.cpp:238] Coordinator attempting to fill missing positions I1221 23:10:05.248757 32605 replica.cpp:388] Replica received explicit promise request from (40)@172.17.0.2:40874 for position 0 with proposal 2 I1221 23:10:05.249214 32605 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 366567ns I1221 23:10:05.249246 32605 replica.cpp:712] Persisted action at 0 I1221 23:10:05.250998 32599 replica.cpp:537] Replica received write request for position 0 from (41)@172.17.0.2:40874 I1221 23:10:05.251111 32599 leveldb.cpp:436] Reading position from leveldb took 66773ns I1221 23:10:05.251734 32599 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 379612ns I1221 23:10:05.251759 32599 replica.cpp:712] Persisted action at 0 I1221 23:10:05.252555 32601 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 I1221 23:10:05.253010 32601 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 381858ns I1221 23:10:05.253036 32601 replica.cpp:712] Persisted action at 0 I1221 23:10:05.253068 32601 replica.cpp:697] Replica learned NOP action at position 0 I1221 23:10:05.254043 32595 log.cpp:675] Writer started with ending position 0 I1221 23:10:05.256741 32595 leveldb.cpp:436] Reading position from leveldb took 48607ns I1221 23:10:05.260617 32601 registrar.cpp:340] Successfully fetched the registry (0B) in 20.47616ms I1221 23:10:05.260988 32601 registrar.cpp:439] Applied 1 operations in 103123ns; 
attempting to update the 'registry' I1221 23:10:05.264700 32604 log.cpp:683] Attempting to append 170 bytes to the log I1221 23:10:05.265138 32601 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 I1221 23:10:05.266208 32603 replica.cpp:537] Replica received write request for position 1 from (42)@172.17.0.2:40874 I1221 23:10:05.266829 32603 leveldb.cpp:341] Persisting action (189 bytes) to leveldb took 551087ns I1221 23:10:05.266861 32603 replica.cpp:712] Persisted action at 1 I1221 23:10:05.267918 32605 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 I1221 23:10:05.268442 32605 leveldb.cpp:341] Persisting action (191 bytes) to leveldb took 453416ns I1221 23:10:05.268470 32605 replica.cpp:712] Persisted action at 1 I1221 23:10:05.268506 32605 replica.cpp:697] Replica learned APPEND action at position 1 I1221 23:10:05.270512 32606 registrar.cpp:484] Successfully updated the 'registry' in 9.375232ms I1221 23:10:05.270705 32606 registrar.cpp:370] Successfully recovered registrar I1221 23:10:05.271045 32602 log.cpp:702] Attempting to truncate the log to 1 I1221 23:10:05.271178 32603 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 I1221 23:10:05.271695 32605...",3 MESOS-4261,"Remove docker auth server flag","We currently use a configured docker auth server from a slave flag to get token auth for docker registry. However this doesn't work for private registries as docker registry supports sending down the correct auth server to contact. We should remove docker auth server flag completely and ask the docker registry for auth server.",3 MESOS-4262,"Enable net_cls subsytem in cgroup infrastructure","Currently the control group infrastructure within mesos supports only the memory and CPU subsystems. We need to enhance this infrastructure to support the net_cls subsystem as well. 
Details of the net_cls subsystem and its use-cases can be found here: https://www.kernel.org/doc/Documentation/cgroups/net_cls.txt Enabling net_cls will potentially allow operators to regulate framework traffic on a per-container basis. ",5 MESOS-4263,"Report volume usage through ResourceStatistics.","The POSIX disk isolator does not currently report volume usage through ResourceStatistics. {{PosixDiskIsolatorProcess::usage()}} should be amended to take into account volume usage as well. ",3 MESOS-4279,"Docker executor truncates task's output when the task is killed.","I'm implementing graceful restarts of our mesos-marathon-docker setup and I ran into the following issue: (it was already discussed on https://github.com/mesosphere/marathon/issues/2876 and the folks from Mesosphere concluded that it's probably a docker containerizer problem...) To sum it up: When I deploy a simple Python script to all mesos-slaves: {code} #!/usr/bin/python from time import sleep import signal import sys import datetime def sigterm_handler(_signo, _stack_frame): print ""got %i"" % _signo print datetime.datetime.now().time() sys.stdout.flush() sleep(2) print datetime.datetime.now().time() print ""ending"" sys.stdout.flush() sys.exit(0) signal.signal(signal.SIGTERM, sigterm_handler) signal.signal(signal.SIGINT, sigterm_handler) try: print ""Hello"" i = 0 while True: i += 1 print datetime.datetime.now().time() print ""Iteration #%i"" % i sys.stdout.flush() sleep(1) finally: print ""Goodbye"" {code} and I run it through Marathon like {code:javascript} data = { args: [""/tmp/script.py""], instances: 1, cpus: 0.1, mem: 256, id: ""marathon-test-api"" } {code} During the app restart I get the expected result - the task receives SIGTERM and dies peacefully (within my script-specified 2-second period). But when I wrap this Python script in a Docker image: {code} FROM node:4.2 RUN mkdir /app ADD . 
/app WORKDIR /app ENTRYPOINT [] {code} and run the corresponding application through Marathon: {code:javascript} data = { args: [""./script.py""], container: { type: ""DOCKER"", docker: { image: ""bydga/marathon-test-api"" }, forcePullImage: yes }, cpus: 0.1, mem: 256, instances: 1, id: ""marathon-test-api"" } {code} During a restart (issued from Marathon), the task dies immediately without having a chance to do any cleanup. ",5 MESOS-4281,"Correctly handle disk quota usage when volumes are bind mounted into the container.","In its current implementation, disk quota enforcement on the task sandbox will not work correctly when disk volumes are bind mounted into the task sandbox (this happens when the Linux filesystem isolator is used).",3 MESOS-4282,"Update isolator prepare function to use ContainerLaunchInfo","Currently the isolator's prepare function returns a ContainerPrepareInfo protobuf. We should enable it to return ContainerLaunchInfo (which contains environment variables, namespaces, etc.), which will be used by the Mesos containerizer to launch containers. By doing this (ContainerPrepareInfo -> ContainerLaunchInfo), we can select any necessary information and pass it to the launcher.",2 MESOS-4284,"Draft design doc for multi-role frameworks","Create a document that describes the problems with having only single-role frameworks and proposes an MVP solution and implementation approach.",8 MESOS-4285,"Mesos command task doesn't support volumes with image","Currently volumes are stripped when an image is specified while running a command task with the Mesos containerizer. ",3 MESOS-4289,"Design doc for simple appc image discovery","Create a design document describing the following: - Model and abstraction of the Discoverer - Workflow of the discovery process ",5 MESOS-4291,"fs::enter(rootfs) does not work if 'rootfs' is read only.","I noticed this when I was testing the unified containerizer with the bind mount backend and no volumes. 
The current implementation of fs::enter will put the old root under /tmp/._old_root_.XXXXXX in the new rootfs. It assumes that /tmp is writable in the new rootfs, but this might not be true, especially if the bind mount backend is used. To solve the problem, we can mount tmpfs on /tmp in the new rootfs and unmount it after pivot_root.",2 MESOS-4292,"Tests for quota with implicit roles.","With the introduction of implicit roles (MESOS-3988), we should make sure quota can be set for an inactive role (unknown to the master) and maybe transition it to the active state.",3 MESOS-4294,"Protobuf parse should support parsing JSON object containing JSON Null.","(This bug was exposed by MESOS-4184, when serializing a docker v1 image manifest as protobuf). Currently protobuf::parse returns a failure when parsing any JSON containing JSON::Null. If any protobuf field is set to `JSON::Null`, the other non-repeated fields cannot capture their values. For example, assuming we have a protobuf message: {noformat} message Nested { optional string str = 1; repeated string json_null = 2; } {noformat} If there exists any field containing JSON::Null, like below: {noformat} { \""str\"": \""message\"", \""json_null\"": null } {noformat} then protobuf::parse returns the following failure: {noformat} Failure parse: Not expecting a JSON null {noformat}",1 MESOS-4295,"Change documentation links to ""*.md""","Right now, links either use the form {noformat}[label](/documentation/latest/foo/){noformat} or {noformat}[label](foo.md){noformat}. We should probably switch to using the latter form consistently -- it previews better on GitHub, and it will make it easier to have multiple versions of the docs on the website at once in the future.",3 MESOS-4296,"Add docker URI fetcher plugin based on curl.","The existing registry client for docker assumes that Mesos is built with SSL support and that SSL is enabled. 
That means Mesos built with libev (or with SSL disabled) won't be able to use the docker registry client to provision docker images. Given that the new URI fetcher work (MESOS-3918) has been committed, we can add a new URI fetcher plugin for docker. The plugin will be based on curl so that https and 3xx redirects will be handled automatically. The docker registry puller will just use the URI fetcher to get docker images.",8 MESOS-4298,"Sync up configuration.md and flags.cpp","The review at https://reviews.apache.org/r/39923/ cleaned up configuration.md, but the related flags.cpp was not updated; we should update those files as well.",1 MESOS-4300,"Add AuthN and AuthZ to maintenance endpoints.","Maintenance endpoints are currently only restricted by firewall settings. They should also support authentication/authorization like other HTTP endpoints.",3 MESOS-4301,"Accepting an inverse offer prints misleading logs","Whenever a scheduler accepts an inverse offer, Mesos will print a line like this in the master logs: {code} W1125 10:05:53.155109 29362 master.cpp:2897] ACCEPT call used invalid offers '[ 932f7d7b-f2d4-42c7-9391-222c19b9d35b-O2 ]': Offer 932f7d7b-f2d4-42c7-9391-222c19b9d35b-O2 is no longer valid {code} Inverse offers should not trigger this warning.",1 MESOS-4304,"hdfs operations fail due to prepended / on path for non-hdfs hadoop clients.","This bug was resolved for the hdfs protocol in MESOS-3602, but since the process checks for the ""hdfs"" protocol at the beginning of the URI, the fix does not extend to non-hdfs hadoop clients. {code} I0107 01:22:01.259490 17678 logging.cpp:172] INFO level logging started! 
I0107 01:22:01.259856 17678 fetcher.cpp:422] Fetcher Info: {""cache_directory"":""\/tmp\/mesos\/fetch\/slaves\/530dda5a-481a-4117-8154-3aee637d3b38-S3\/root"",""items"":[{""action"":""BYPASS_CACHE"",""uri"":{""extract"":true,""value"":""maprfs:\/\/\/mesos\/storm-mesos-0.9.3.tgz""}},{""action"":""BYPASS_CACHE"",""uri"":{""extract"":true,""value"":""http:\/\/s0121.stag.urbanairship.com:36373\/conf\/storm.yaml""}}],""sandbox_directory"":""\/mnt\/data\/mesos\/slaves\/530dda5a-481a-4117-8154-3aee637d3b38-S3\/frameworks\/530dda5a-481a-4117-8154-3aee637d3b38-0000\/executors\/word-count-1-1452129714\/runs\/4443d5ac-d034-49b3-bf12-08fb9b0d92d0"",""user"":""root""} I0107 01:22:01.262171 17678 fetcher.cpp:377] Fetching URI 'maprfs:///mesos/storm-mesos-0.9.3.tgz' I0107 01:22:01.262212 17678 fetcher.cpp:248] Fetching directly into the sandbox directory I0107 01:22:01.262243 17678 fetcher.cpp:185] Fetching URI 'maprfs:///mesos/storm-mesos-0.9.3.tgz' I0107 01:22:01.671777 17678 fetcher.cpp:110] Downloading resource with Hadoop client from 'maprfs:///mesos/storm-mesos-0.9.3.tgz' to '/mnt/data/mesos/slaves/530dda5a-481a-4117-8154-3aee637d3b38-S3/frameworks/530dda5a-481a-4117-8154-3aee637d3b38-0000/executors/word-count-1-1452129714/runs/4443d5ac-d034-49b3-bf12-08fb9b0d92d0/storm-mesos-0.9.3.tgz' copyToLocal: java.net.URISyntaxException: Expected scheme-specific part at index 7: maprfs: Usage: java FsShell [-copyToLocal [-ignoreCrc] [-crc] ] E0107 01:22:02.435556 17678 shell.hpp:90] Command 'hadoop fs -copyToLocal '/maprfs:///mesos/storm-mesos-0.9.3.tgz' '/mnt/data/mesos/slaves/530dda5a-481a-4117-8154-3aee637d3b38-S3/frameworks/530dda5a-481a-4117-8154-3aee637d3b38-0000/executors/word-count-1-1452129714/runs/4443d5ac-d034-49b3-bf12-08fb9b0d92d0/storm-mesos-0.9.3.tgz'' failed; this is the output: Failed to fetch 'maprfs:///mesos/storm-mesos-0.9.3.tgz': HDFS copyToLocal failed: Failed to execute 'hadoop fs -copyToLocal '/maprfs:///mesos/storm-mesos-0.9.3.tgz' 
'/mnt/data/mesos/slaves/530dda5a-481a-4117-8154-3aee637d3b38-S3/frameworks/530dda5a-481a-4117-8154-3aee637d3b38-0000/executors/word-count-1-1452129714/runs/4443d5ac-d034-49b3-bf12-08fb9b0d92d0/storm-mesos-0.9.3.tgz''; the command was either not found or exited with a non-zero exit status: 255 Failed to synchronize with slave (it's probably exited) {code} After a brief chat with [~jieyu], it was recommended to fix the current hdfs client code because the new hadoop fetcher plugin is slated to use it.",1 MESOS-4307,"Expand the ""Getting Started"" installation instructions","The ""Getting Started"" documentation currently contains basic instructions to prepare several platforms for compilation and installation of Mesos. However, these instructions are not sufficient to run and pass all tests in the test suite, using all configuration options. The installation instructions should be made comprehensive in this respect. It may also be desirable to provide scripts that have been verified to prepare a particular base OS to build, install, and test Mesos. This would be very useful for both developers and users of Mesos. Note that using some features on some platforms requires the installation of software packages from sources that may not be completely reliable in the long-term; for example, packages which are maintained as personal projects of individuals. This should be noted in the instructions accordingly.",5 MESOS-4308,"Reliably report executor terminations to framework schedulers.","Now that executor terminations are reported (unreliably), we should investigate queuing up these messages (on the agent?) and resending them periodically until we get an acknowledgement, much like status updates do. 
From MESOS-313: The Scheduler interface has a callback for executorLost, but currently it is never called.",5 MESOS-4311,"Protobuf parse should pass error messages when parsing nested JSON.","Currently when protobuf::parse handles nested JSON objects, it cannot pass any error message out. We should enable showing those error messages.",1 MESOS-4314,"Publish Quota Documentation","Publish and finish the operator guide draft for quota which describes basic usage of the endpoints and few basic and advanced usage cases.",3 MESOS-4316,"Support get non-default weights by /weights","Like /quota, we should also add query logic for /weights to keep consistent. Then /roles no longer needs to show weight information.",5 MESOS-4318,"PersistentVolumeTest.BadACLNoPrincipal is flaky","https://builds.apache.org/job/Mesos/1457/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=centos:7,label_exp=docker%7C%7CHadoop/consoleFull {noformat} [ RUN ] PersistentVolumeTest.BadACLNoPrincipal I0108 01:13:16.117883 1325 leveldb.cpp:174] Opened db in 2.614722ms I0108 01:13:16.118650 1325 leveldb.cpp:181] Compacted db in 706567ns I0108 01:13:16.118702 1325 leveldb.cpp:196] Created db iterator in 24489ns I0108 01:13:16.118723 1325 leveldb.cpp:202] Seeked to beginning of db in 2436ns I0108 01:13:16.118738 1325 leveldb.cpp:271] Iterated through 0 keys in the db in 397ns I0108 01:13:16.118793 1325 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0108 01:13:16.119627 1348 recover.cpp:447] Starting replica recovery I0108 01:13:16.120352 1348 recover.cpp:473] Replica is in EMPTY status I0108 01:13:16.121750 1357 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (7084)@172.17.0.2:32801 I0108 01:13:16.122297 1353 recover.cpp:193] Received a recover response from a replica in EMPTY status I0108 01:13:16.122747 1350 recover.cpp:564] Updating replica status to STARTING I0108 01:13:16.123625 1354 
master.cpp:365] Master 773d31e8-383d-4e4b-aa68-f9a3fb9f1fc2 (d9632dd1c41e) started on 172.17.0.2:32801 I0108 01:13:16.123946 1347 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 728242ns I0108 01:13:16.123999 1347 replica.cpp:320] Persisted replica status to STARTING I0108 01:13:16.123708 1354 master.cpp:367] Flags at startup: --acls=""create_volumes { principals { values: ""test-principal"" } volume_types { type: ANY } } create_volumes { principals { type: ANY } volume_types { type: NONE } } "" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate=""false"" --authenticate_slaves=""true"" --authenticators=""crammd5"" --authorizers=""local"" --credentials=""/tmp/f2rA75/credentials"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_slave_ping_timeouts=""5"" --quiet=""false"" --recovery_slave_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""25secs"" --registry_strict=""true"" --roles=""role1"" --root_submissions=""true"" --slave_ping_timeout=""15secs"" --slave_reregister_timeout=""10mins"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/mesos/mesos-0.27.0/_inst/share/mesos/webui"" --work_dir=""/tmp/f2rA75/master"" --zk_session_timeout=""10secs"" I0108 01:13:16.124219 1354 master.cpp:414] Master allowing unauthenticated frameworks to register I0108 01:13:16.124236 1354 master.cpp:417] Master only allowing authenticated slaves to register I0108 01:13:16.124248 1354 credentials.hpp:35] Loading credentials for authentication from '/tmp/f2rA75/credentials' I0108 01:13:16.124294 1358 recover.cpp:473] Replica is in STARTING status I0108 01:13:16.124644 1354 master.cpp:456] Using default 'crammd5' authenticator I0108 01:13:16.124820 1354 master.cpp:493] Authorization enabled W0108 01:13:16.124843 1354 master.cpp:553] The 
'--roles' flag is deprecated. This flag will be removed in the future. See the Mesos 0.27 upgrade notes for more information I0108 01:13:16.125154 1348 hierarchical.cpp:147] Initialized hierarchical allocator process I0108 01:13:16.125334 1345 whitelist_watcher.cpp:77] No whitelist given I0108 01:13:16.126065 1346 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (7085)@172.17.0.2:32801 I0108 01:13:16.126806 1348 recover.cpp:193] Received a recover response from a replica in STARTING status I0108 01:13:16.128237 1354 recover.cpp:564] Updating replica status to VOTING I0108 01:13:16.128402 1359 master.cpp:1629] The newly elected leader is master@172.17.0.2:32801 with id 773d31e8-383d-4e4b-aa68-f9a3fb9f1fc2 I0108 01:13:16.128489 1359 master.cpp:1642] Elected as the leading master! I0108 01:13:16.128523 1359 master.cpp:1387] Recovering from registrar I0108 01:13:16.128756 1355 registrar.cpp:307] Recovering registrar I0108 01:13:16.129259 1344 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 531437ns I0108 01:13:16.129292 1344 replica.cpp:320] Persisted replica status to VOTING I0108 01:13:16.129425 1358 recover.cpp:578] Successfully joined the Paxos group I0108 01:13:16.129680 1358 recover.cpp:462] Recover process terminated I0108 01:13:16.130187 1358 log.cpp:659] Attempting to start the writer I0108 01:13:16.131613 1352 replica.cpp:493] Replica received implicit promise request from (7086)@172.17.0.2:32801 with proposal 1 I0108 01:13:16.131983 1352 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 333646ns I0108 01:13:16.132004 1352 replica.cpp:342] Persisted promised to 1 I0108 01:13:16.132627 1348 coordinator.cpp:238] Coordinator attempting to fill missing positions I0108 01:13:16.133896 1349 replica.cpp:388] Replica received explicit promise request from (7087)@172.17.0.2:32801 for position 0 with proposal 2 I0108 01:13:16.134289 1349 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 
349652ns I0108 01:13:16.134317 1349 replica.cpp:712] Persisted action at 0 I0108 01:13:16.135470 1351 replica.cpp:537] Replica received write request for position 0 from (7088)@172.17.0.2:32801 I0108 01:13:16.135537 1351 leveldb.cpp:436] Reading position from leveldb took 36181ns I0108 01:13:16.135901 1351 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 308752ns I0108 01:13:16.135924 1351 replica.cpp:712] Persisted action at 0 I0108 01:13:16.136529 1347 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 I0108 01:13:16.136889 1347 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 327106ns I0108 01:13:16.136916 1347 replica.cpp:712] Persisted action at 0 I0108 01:13:16.136943 1347 replica.cpp:697] Replica learned NOP action at position 0 I0108 01:13:16.137707 1359 log.cpp:675] Writer started with ending position 0 I0108 01:13:16.138844 1348 leveldb.cpp:436] Reading position from leveldb took 31371ns I0108 01:13:16.139878 1356 registrar.cpp:340] Successfully fetched the registry (0B) in 0ns I0108 01:13:16.140012 1356 registrar.cpp:439] Applied 1 operations in 42063ns; attempting to update the 'registry' I0108 01:13:16.140797 1355 log.cpp:683] Attempting to append 170 bytes to the log I0108 01:13:16.140974 1345 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 I0108 01:13:16.141744 1354 replica.cpp:537] Replica received write request for position 1 from (7089)@172.17.0.2:32801 I0108 01:13:16.142226 1354 leveldb.cpp:341] Persisting action (189 bytes) to leveldb took 441971ns I0108 01:13:16.142251 1354 replica.cpp:712] Persisted action at 1 I0108 01:13:16.142860 1351 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 I0108 01:13:16.143198 1351 leveldb.cpp:341] Persisting action (191 bytes) to leveldb took 305928ns I0108 01:13:16.143223 1351 replica.cpp:712] Persisted action at 1 I0108 01:13:16.143241 1351 replica.cpp:697] Replica learned APPEND action at 
position 1 I0108 01:13:16.144271 1354 registrar.cpp:484] Successfully updated the 'registry' in 0ns I0108 01:13:16.144435 1354 registrar.cpp:370] Successfully recovered registrar I0108 01:13:16.144567 1359 log.cpp:702] Attempting to truncate the log to 1 I0108 01:13:16.144780 1359 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 I0108 01:13:16.144989 1348 hierarchical.cpp:165] Skipping recovery of hierarchical allocator: nothing to recover I0108 01:13:16.144928 1354 master.cpp:1439] Recovered 0 slaves from the Registry (131B) ; allowing 10mins for slaves to re-register I0108 01:13:16.145690 1357 replica.cpp:537] Replica received write request for position 2 from (7090)@172.17.0.2:32801 I0108 01:13:16.146072 1357 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 345113ns I0108 01:13:16.146097 1357 replica.cpp:712] Persisted action at 2 I0108 01:13:16.146667 1358 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 I0108 01:13:16.147060 1358 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 283648ns I0108 01:13:16.147116 1358 leveldb.cpp:399] Deleting ~1 keys from leveldb took 32174ns I0108 01:13:16.147135 1358 replica.cpp:712] Persisted action at 2 I0108 01:13:16.147153 1358 replica.cpp:697] Replica learned TRUNCATE action at position 2 I0108 01:13:16.166832 1325 containerizer.cpp:139] Using isolation: posix/cpu,posix/mem,filesystem/posix W0108 01:13:16.167556 1325 backend.cpp:48] Failed to create 'bind' backend: BindBackend requires root privileges I0108 01:13:16.170526 1349 slave.cpp:191] Slave started on 231)@172.17.0.2:32801 I0108 01:13:16.170718 1349 slave.cpp:192] Flags at startup: --appc_store_dir=""/tmp/mesos/store/appc"" --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" 
--containerizers=""mesos"" --credential=""/tmp/PersistentVolumeTest_BadACLNoPrincipal_yqJjLY/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_auth_server=""https://auth.docker.io"" --docker_kill_orphans=""true"" --docker_puller_timeout=""60"" --docker_registry=""https://registry-1.docker.io"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/PersistentVolumeTest_BadACLNoPrincipal_yqJjLY/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname_lookup=""true"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" --launcher_dir=""/mesos/mesos-0.27.0/_build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resources=""cpus:2;mem:1024;disk(role1):2048"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" --work_dir=""/tmp/PersistentVolumeTest_BadACLNoPrincipal_yqJjLY"" I0108 01:13:16.171269 1349 credentials.hpp:83] Loading credential for authentication from '/tmp/PersistentVolumeTest_BadACLNoPrincipal_yqJjLY/credential' I0108 01:13:16.171505 1349 slave.cpp:322] Slave using credential for: test-principal I0108 01:13:16.171747 1349 resources.cpp:481] Parsing resources as JSON failed: cpus:2;mem:1024;disk(role1):2048 Trying 
semicolon-delimited string format instead I0108 01:13:16.172266 1349 slave.cpp:392] Slave resources: cpus(*):2; mem(*):1024; disk(role1):2048; ports(*):[31000-32000] I0108 01:13:16.172327 1349 slave.cpp:400] Slave attributes: [ ] I0108 01:13:16.172340 1349 slave.cpp:405] Slave hostname: d9632dd1c41e I0108 01:13:16.172353 1349 slave.cpp:410] Slave checkpoint: true I0108 01:13:16.173418 1353 state.cpp:58] Recovering state from '/tmp/PersistentVolumeTest_BadACLNoPrincipal_yqJjLY/meta' I0108 01:13:16.173521 1325 sched.cpp:164] Version: 0.27.0 I0108 01:13:16.174054 1345 status_update_manager.cpp:200] Recovering status update manager I0108 01:13:16.174289 1353 containerizer.cpp:387] Recovering containerizer I0108 01:13:16.174295 1356 sched.cpp:268] New master detected at master@172.17.0.2:32801 I0108 01:13:16.174387 1356 sched.cpp:278] No credentials provided. Attempting to register without authentication I0108 01:13:16.174409 1356 sched.cpp:722] Sending SUBSCRIBE call to master@172.17.0.2:32801 I0108 01:13:16.174515 1356 sched.cpp:755] Will retry registration in 1.699889272secs if necessary I0108 01:13:16.174653 1349 master.cpp:2197] Received SUBSCRIBE call for framework 'no-principal' at scheduler-bf0ed267-b4c4-412d-9fb0-84c85cd2fbce@172.17.0.2:32801 I0108 01:13:16.174823 1349 master.cpp:1668] Authorizing framework principal '' to receive offers for role 'role1' I0108 01:13:16.175250 1347 master.cpp:2268] Subscribing framework no-principal with checkpointing disabled and capabilities [ ] I0108 01:13:16.175359 1353 slave.cpp:4429] Finished recovery I0108 01:13:16.175715 1345 hierarchical.cpp:260] Added framework 773d31e8-383d-4e4b-aa68-f9a3fb9f1fc2-0000 I0108 01:13:16.175734 1351 sched.cpp:649] Framework registered with 773d31e8-383d-4e4b-aa68-f9a3fb9f1fc2-0000 I0108 01:13:16.175792 1345 hierarchical.cpp:1329] No resources available to allocate! I0108 01:13:16.175833 1345 hierarchical.cpp:1423] No inverse offers to send out! 
I0108 01:13:16.175853 1353 slave.cpp:4601] Querying resource estimator for oversubscribable resources I0108 01:13:16.175869 1345 hierarchical.cpp:1079] Performed allocation for 0 slaves in 127881ns I0108 01:13:16.175923 1351 sched.cpp:663] Scheduler::registered took 27956ns I0108 01:13:16.176110 1353 slave.cpp:729] New master detected at master@172.17.0.2:32801 I0108 01:13:16.176187 1353 slave.cpp:792] Authenticating with master master@172.17.0.2:32801 I0108 01:13:16.176216 1353 slave.cpp:797] Using default CRAM-MD5 authenticatee I0108 01:13:16.176398 1357 status_update_manager.cpp:174] Pausing sending status updates I0108 01:13:16.176404 1353 slave.cpp:765] Detecting new master I0108 01:13:16.176463 1358 authenticatee.cpp:121] Creating new client SASL connection I0108 01:13:16.176553 1353 slave.cpp:4615] Received oversubscribable resources from the resource estimator I0108 01:13:16.176709 1353 master.cpp:5445] Authenticating slave(231)@172.17.0.2:32801 I0108 01:13:16.176823 1359 authenticator.cpp:413] Starting authentication session for crammd5_authenticatee(516)@172.17.0.2:32801 I0108 01:13:16.177135 1348 authenticator.cpp:98] Creating new server SASL connection I0108 01:13:16.177373 1356 authenticatee.cpp:212] Received SASL authentication mechanisms: CRAM-MD5 I0108 01:13:16.177399 1356 authenticatee.cpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I0108 01:13:16.177502 1344 authenticator.cpp:203] Received SASL authentication start I0108 01:13:16.177563 1344 authenticator.cpp:325] Authentication requires more steps I0108 01:13:16.177680 1346 authenticatee.cpp:258] Received SASL authentication step I0108 01:13:16.177848 1354 authenticator.cpp:231] Received SASL authentication step I0108 01:13:16.177883 1354 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: 'd9632dd1c41e' server FQDN: 'd9632dd1c41e' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0108 01:13:16.177894 1354 
auxprop.cpp:179] Looking up auxiliary property '*userPassword' I0108 01:13:16.177944 1354 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0108 01:13:16.177994 1354 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: 'd9632dd1c41e' server FQDN: 'd9632dd1c41e' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0108 01:13:16.178014 1354 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0108 01:13:16.178040 1354 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0108 01:13:16.178066 1354 authenticator.cpp:317] Authentication success I0108 01:13:16.178256 1355 authenticatee.cpp:298] Authentication success I0108 01:13:16.178315 1354 master.cpp:5475] Successfully authenticated principal 'test-principal' at slave(231)@172.17.0.2:32801 I0108 01:13:16.178356 1355 authenticator.cpp:431] Authentication session cleanup for crammd5_authenticatee(516)@172.17.0.2:32801 I0108 01:13:16.178710 1354 slave.cpp:860] Successfully authenticated with master master@172.17.0.2:32801 I0108 01:13:16.178865 1354 slave.cpp:1254] Will retry registration in 13.009431ms if necessary I0108 01:13:16.179138 1350 master.cpp:4154] Registering slave at slave(231)@172.17.0.2:32801 (d9632dd1c41e) with id 773d31e8-383d-4e4b-aa68-f9a3fb9f1fc2-S0 I0108 01:13:16.179628 1345 registrar.cpp:439] Applied 1 operations in 71663ns; attempting to update the 'registry' I0108 01:13:16.180505 1356 log.cpp:683] Attempting to append 343 bytes to the log I0108 01:13:16.180711 1352 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 3 I0108 01:13:16.181499 1350 replica.cpp:537] Replica received write request for position 3 from (7103)@172.17.0.2:32801 I0108 01:13:16.182080 1350 leveldb.cpp:341] Persisting action (362 bytes) to leveldb took 537757ns I0108 01:13:16.182112 1350 replica.cpp:712] Persisted 
action at 3 I0108 01:13:16.182749 1351 replica.cpp:691] Replica received learned notice for position 3 from @0.0.0.0:0 I0108 01:13:16.183120 1351 leveldb.cpp:341] Persisting action (364 bytes) to leveldb took 340999ns I0108 01:13:16.183151 1351 replica.cpp:712] Persisted action at 3 I0108 01:13:16.183177 1351 replica.cpp:697] Replica learned APPEND action at position 3 I0108 01:13:16.184787 1348 registrar.cpp:484] Successfully updated the 'registry' in 0ns I0108 01:13:16.185287 1348 log.cpp:702] Attempting to truncate the log to 3 I0108 01:13:16.185484 1349 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 4 I0108 01:13:16.186043 1353 slave.cpp:3371] Received ping from slave-observer(230)@172.17.0.2:32801 I0108 01:13:16.186074 1345 master.cpp:4222] Registered slave 773d31e8-383d-4e4b-aa68-f9a3fb9f1fc2-S0 at slave(231)@172.17.0.2:32801 (d9632dd1c41e) with cpus(*):2; mem(*):1024; disk(role1):2048; ports(*):[31000-32000] I0108 01:13:16.186224 1353 slave.cpp:904] Registered with master master@172.17.0.2:32801; given slave ID 773d31e8-383d-4e4b-aa68-f9a3fb9f1fc2-S0 I0108 01:13:16.186441 1353 fetcher.cpp:81] Clearing fetcher cache I0108 01:13:16.186486 1349 hierarchical.cpp:465] Added slave 773d31e8-383d-4e4b-aa68-f9a3fb9f1fc2-S0 (d9632dd1c41e) with cpus(*):2; mem(*):1024; disk(role1):2048; ports(*):[31000-32000] (allocated: ) I0108 01:13:16.186658 1346 status_update_manager.cpp:181] Resuming sending status updates I0108 01:13:16.186885 1353 slave.cpp:927] Checkpointing SlaveInfo to '/tmp/PersistentVolumeTest_BadACLNoPrincipal_yqJjLY/meta/slaves/773d31e8-383d-4e4b-aa68-f9a3fb9f1fc2-S0/slave.info' I0108 01:13:16.186905 1350 replica.cpp:537] Replica received write request for position 4 from (7104)@172.17.0.2:32801 I0108 01:13:16.187595 1350 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 645704ns I0108 01:13:16.187628 1350 replica.cpp:712] Persisted action at 4 I0108 01:13:16.188347 1349 hierarchical.cpp:1423] No inverse offers 
to send out! I0108 01:13:16.188475 1349 hierarchical.cpp:1101] Performed allocation for slave 773d31e8-383d-4e4b-aa68-f9a3fb9f1fc2-S0 in 1.861833ms I0108 01:13:16.188560 1348 replica.cpp:691] Replica received learned notice for position 4 from @0.0.0.0:0 I0108 01:13:16.188385 1353 slave.cpp:963] Forwarding total oversubscribed resources I0108 01:13:16.189275 1344 master.cpp:5274] Sending 1 offers to framework 773d31e8-383d-4e4b-aa68-f9a3fb9f1fc2-0000 (no-principal) at scheduler-bf0ed267-b4c4-412d-9fb0-84c85cd2fbce@172.17.0.2:32801 I0108 01:13:16.189792 1344 master.cpp:4564] Received update of slave 773d31e8-383d-4e4b-aa68-f9a3fb9f1fc2-S0 at slave(231)@1...",1 MESOS-4329,"SlaveTest.LaunchTaskInfoWithContainerInfo cannot be execute in isolation","Executing {{SlaveTest.LaunchTaskInfoWithContainerInfo}} from {{468b8ec}} under OS X 10.10.5 in isolation fails due to missing cleanup, {code} % ./bin/mesos-tests.sh --gtest_filter=SlaveTest.LaunchTaskInfoWithContainerInfo Source directory: /ABC/DEF/src/mesos Build directory: /ABC/DEF/src/mesos/build ------------------------------------------------------------- We cannot run any Docker tests because: Docker tests not supported on non-Linux systems ------------------------------------------------------------- /usr/bin/nc /usr/bin/curl Note: Google Test filter = 
SlaveTest.LaunchTaskInfoWithContainerInfo-HealthCheckTest.ROOT_DOCKER_DockerHealthyTask:HealthCheckTest.ROOT_DOCKER_DockerHealthStatusChange:HierarchicalAllocator_BENCHMARK_Test.DeclineOffers:HookTest.ROOT_DOCKER_VerifySlavePreLaunchDockerHook:SlaveTest.ROOT_RunTaskWithCommandInfoWithoutUser:SlaveTest.DISABLED_ROOT_RunTaskWithCommandInfoWithUser:DockerContainerizerTest.ROOT_DOCKER_Launch:DockerContainerizerTest.ROOT_DOCKER_Kill:DockerContainerizerTest.ROOT_DOCKER_Usage:DockerContainerizerTest.ROOT_DOCKER_Recover:DockerContainerizerTest.ROOT_DOCKER_SkipRecoverNonDocker:DockerContainerizerTest.ROOT_DOCKER_Logs:DockerContainerizerTest.ROOT_DOCKER_Default_CMD:DockerContainerizerTest.ROOT_DOCKER_Default_CMD_Override:DockerContainerizerTest.ROOT_DOCKER_Default_CMD_Args:DockerContainerizerTest.ROOT_DOCKER_SlaveRecoveryTaskContainer:DockerContainerizerTest.DISABLED_ROOT_DOCKER_SlaveRecoveryExecutorContainer:DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping:DockerContainerizerTest.ROOT_DOCKER_LaunchSandboxWithColon:DockerContainerizerTest.ROOT_DOCKER_DestroyWhileFetching:DockerContainerizerTest.ROOT_DOCKER_DestroyWhilePulling:DockerContainerizerTest.ROOT_DOCKER_ExecutorCleanupWhenLaunchFailed:DockerContainerizerTest.ROOT_DOCKER_FetchFailure:DockerContainerizerTest.ROOT_DOCKER_DockerPullFailure:DockerContainerizerTest.ROOT_DOCKER_DockerInspectDiscard:DockerTest.ROOT_DOCKER_interface:DockerTest.ROOT_DOCKER_parsing_version:DockerTest.ROOT_DOCKER_CheckCommandWithShell:DockerTest.ROOT_DOCKER_CheckPortResource:DockerTest.ROOT_DOCKER_CancelPull:DockerTest.ROOT_DOCKER_MountRelative:DockerTest.ROOT_DOCKER_MountAbsolute:CopyBackendTest.ROOT_CopyBackend:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/0:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/1:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/2:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/3:SlaveAndFramework
Count/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/4:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/5:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/6:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/7:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/8:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/9:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/10:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/11:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/12:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/13:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/14:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/15:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/16:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/17:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/18:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/19:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/20:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/21:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/22:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/23:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/24:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/25:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/26:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/27:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/2
8:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/29:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/30:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/31:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/32:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/33:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/34:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/35:SlaveCount/Registrar_BENCHMARK_Test.Performance/0:SlaveCount/Registrar_BENCHMARK_Test.Performance/1:SlaveCount/Registrar_BENCHMARK_Test.Performance/2:SlaveCount/Registrar_BENCHMARK_Test.Performance/3 [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from SlaveTest [ RUN ] SlaveTest.LaunchTaskInfoWithContainerInfo [ OK ] SlaveTest.LaunchTaskInfoWithContainerInfo (79 ms) [----------] 1 test from SlaveTest (79 ms total) [----------] Global test environment tear-down ../../src/tests/environment.cpp:569: Failure Failed Tests completed with child processes remaining: -+- 54487 /ABC/DEF/src/mesos/build/src/.libs/mesos-tests --gtest_filter=SlaveTest.LaunchTaskInfoWithContainerInfo \--- 54503 /bin/sh /ABC/DEF/src/mesos/build/src/mesos-containerizer launch --command={""shell"":true,""value"":""\/ABC\/DEF\/src\/mesos\/build\/src\/mesos-executor""} --commands={""commands"":[]} --directory=/tmp --help=false --pipe_read=10 --pipe_write=13 --user=test [==========] 1 test from 1 test case ran. (87 ms total) [ PASSED ] 1 test. [ FAILED ] 0 tests, listed below: 0 FAILED TESTS {code} ",1 MESOS-4333,"Refactor Appc provisioner tests ","Current tests can be refactored so that we can reuse some common tasks like test image creation. 
This will benefit future tests like appc image puller tests.",2 MESOS-4336,"Document supported file types for archive extraction by fetcher","The Mesos fetcher extracts specified URIs if requested to do so by the scheduler. However, the documentation at http://mesos.apache.org/documentation/latest/fetcher/ doesn't list the file types/extensions that will be extracted by the fetcher. [The relevant code|https://github.com/apache/mesos/blob/master/src/launcher/fetcher.cpp#L63] specifies an exhaustive list of extensions that will be extracted; the documentation should be updated to match.",1 MESOS-4337,"Implement a simple Windows version of dirent.hpp, for compatibility.",NULL,5 MESOS-4338,"Create utilities for common shell commands used. ","We spawn a shell for command-line utilities like tar, untar, sha256, etc. It would be great for reuse if we could create a common utilities class/file for all these utilities. ",5 MESOS-4342,"Add parameters to apply patches quiet","Added a parameter to apply patches quietly, so that it is easy for contributors to apply patches with -c.",1 MESOS-4344,"Allow operators to assign net_cls major handles to mesos agents","The net_cls cgroup allows operators to assign a 16-bit major and 16-bit minor network handle to tasks associated with a specific net_cls cgroup. In Mesos we need to give the operator the ability to fix the 16-bit major handle used in an agent. Fixing the parent handle on the agent allows operators to install default firewall rules using the parent handle to enforce a default policy (say DENY ALL) for all container traffic until the container is allocated a minor handle. A simple way to achieve this requirement is to pass the major handle as a flag to the agent at startup. ",1 MESOS-4345,"Implement a network-handle manager for net_cls cgroup subsystem","As part of implementing the net_cls cgroup isolator we need a mechanism to manage the minor handles that will be allocated to containers when they are associated with a net_cls cgroup. 
The network-handle manager needs to provide the following functionality: a) During normal operation keep track of the free and allocated network handles. There can be a total of 64K such network handles. b) On startup, learn the allocated network handle by walking the net_cls cgroup tree for mesos and build a map of free network handles available to the agent. ",8 MESOS-4347,"GMock warning in ReservationTest.ACLMultipleOperations","{noformat} [ RUN ] ReservationTest.ACLMultipleOperations GMOCK WARNING: Uninteresting mock function call - returning directly. Function call: shutdown(0x7fa2a311b300) Stack trace: [ OK ] ReservationTest.ACLMultipleOperations (174 ms) [----------] 1 test from ReservationTest (174 ms total) {noformat} Seems to occur non-deterministically for me, maybe once per 50 runs or so. OSX 10.10",1 MESOS-4348,"GMock warning in HookTest.VerifySlaveRunTaskHook, HookTest.VerifySlaveTaskStatusDecorator","{noformat} [ RUN ] HookTest.VerifySlaveRunTaskHook GMOCK WARNING: Uninteresting mock function call - returning directly. Function call: shutdown(0x7ff079cb2420) Stack trace: [ OK ] HookTest.VerifySlaveRunTaskHook (51 ms) [ RUN ] HookTest.VerifySlaveTaskStatusDecorator GMOCK WARNING: Uninteresting mock function call - returning directly. Function call: shutdown(0x7ff079cbb790) Stack trace: [ OK ] HookTest.VerifySlaveTaskStatusDecorator (54 ms) {noformat} Occurs non-deterministically for me. OSX 10.10.",1 MESOS-4349,"GMock warning in SlaveTest.ContainerUpdatedBeforeTaskReachesExecutor","{noformat} [ RUN ] SlaveTest.ContainerUpdatedBeforeTaskReachesExecutor GMOCK WARNING: Uninteresting mock function call - returning directly. 
Function call: shutdown(0x7fe189cae850) Stack trace: [ OK ] SlaveTest.ContainerUpdatedBeforeTaskReachesExecutor (51 ms) {noformat} Occurs non-deterministically for me on OSX 10.10, perhaps one run in ten.",1 MESOS-4350,"GMock warning on `offerRescinded` in `ReservationTest` fixture","Several tests involving checkpointing of resources in the {{ReservationTest}} fixture are throwing GMock warnings occasionally. Here is the output of {{GTEST_FILTER=""ReservationTest.*"" bin/mesos-tests.sh --gtest_repeat=10000 --gtest_break_on_failure=1 | grep -B 3 -A 6 WARNING}}: {code} ------------------------------------------------------------- We cannot run any Docker tests because: Docker tests not supported on non-Linux systems ------------------------------------------------------------- [ OK ] ReservationTest.MasterFailover (89 ms) [ RUN ] ReservationTest.CompatibleCheckpointedResources GMOCK WARNING: Uninteresting mock function call - returning directly. Function call: offerRescinded(0x7fff5f014960, @0x7feec320fab0 65537c10-285c-419e-b89f-191283402d85-O1) Stack trace: [ OK ] ReservationTest.CompatibleCheckpointedResources (52 ms) [ RUN ] ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes [ OK ] ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes (45 ms) -- -- [ OK ] ReservationTest.CompatibleCheckpointedResources (46 ms) [ RUN ] ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes GMOCK WARNING: Uninteresting mock function call - returning directly. 
Function call: offerRescinded(0x7fff5f014960, @0x7feec796f220 bf4e1b52-02db-4763-8be0-3c759c80f1ba-O1) Stack trace: [ OK ] ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes (63 ms) [ RUN ] ReservationTest.IncompatibleCheckpointedResources [ OK ] ReservationTest.IncompatibleCheckpointedResources (45 ms) -- -- [ OK ] ReservationTest.CompatibleCheckpointedResources (42 ms) [ RUN ] ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes GMOCK WARNING: Uninteresting mock function call - returning directly. Function call: offerRescinded(0x7fff5f014960, @0x7feec7ad92b0 42a9f1ff-122e-4df7-9530-a96126e36f84-O1) Stack trace: [ OK ] ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes (65 ms) [ RUN ] ReservationTest.IncompatibleCheckpointedResources [ OK ] ReservationTest.IncompatibleCheckpointedResources (46 ms) -- -- [ OK ] ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes (49 ms) [ RUN ] ReservationTest.IncompatibleCheckpointedResources GMOCK WARNING: Uninteresting mock function call - returning directly. Function call: offerRescinded(0x7fff5f014960, @0x7feec7af4310 d5e1005f-abb8-4bfd-92e0-3976ee150fbf-O1) Stack trace: [ OK ] ReservationTest.IncompatibleCheckpointedResources (94 ms) [ RUN ] ReservationTest.GoodACLReserveThenUnreserve [ OK ] ReservationTest.GoodACLReserveThenUnreserve (57 ms) -- -- [ OK ] ReservationTest.CompatibleCheckpointedResources (43 ms) [ RUN ] ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes GMOCK WARNING: Uninteresting mock function call - returning directly. 
Function call: offerRescinded(0x7fff5f014960, @0x7feec7cdadc0 36e15f52-3299-46fa-850d-970097fef8e2-O1) Stack trace: [ OK ] ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes (62 ms) [ RUN ] ReservationTest.IncompatibleCheckpointedResources [ OK ] ReservationTest.IncompatibleCheckpointedResources (46 ms) -- -- [ OK ] ReservationTest.CompatibleCheckpointedResources (47 ms) [ RUN ] ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes GMOCK WARNING: Uninteresting mock function call - returning directly. Function call: offerRescinded(0x7fff5f014960, @0x7feec8c1b580 c8dd35ab-7363-40e0-8e20-8c7dc76a8497-O1) Stack trace: [ OK ] ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes (62 ms) [ RUN ] ReservationTest.IncompatibleCheckpointedResources [ OK ] ReservationTest.IncompatibleCheckpointedResources (45 ms) -- -- [ OK ] ReservationTest.CompatibleCheckpointedResources (47 ms) [ RUN ] ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes GMOCK WARNING: Uninteresting mock function call - returning directly. Function call: offerRescinded(0x7fff5f014960, @0x7feecbd9b5b0 031c2148-8a20-4532-b77f-b6200c3791c8-O1) Stack trace: [ OK ] ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes (62 ms) [ RUN ] ReservationTest.IncompatibleCheckpointedResources [ OK ] ReservationTest.IncompatibleCheckpointedResources (46 ms) -- -- [ OK ] ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes (47 ms) [ RUN ] ReservationTest.IncompatibleCheckpointedResources GMOCK WARNING: Uninteresting mock function call - returning directly. 
Function call: offerRescinded(0x7fff5f014960, @0x7feecd52adb0 edc5a322-b220-4b13-a39b-99a523b172ba-O1) Stack trace: [ OK ] ReservationTest.IncompatibleCheckpointedResources (76 ms) [ RUN ] ReservationTest.GoodACLReserveThenUnreserve [ OK ] ReservationTest.GoodACLReserveThenUnreserve (63 ms) -- -- [ OK ] ReservationTest.SendingCheckpointResourcesMessage (45 ms) [ RUN ] ReservationTest.ResourcesCheckpointing GMOCK WARNING: Uninteresting mock function call - returning directly. Function call: offerRescinded(0x7fff5f015df8, @0x7feecfe16f00 09a90e67-a40f-4e42-8802-1a5644733a06-O1) Stack trace: [ OK ] ReservationTest.ResourcesCheckpointing (60 ms) [ RUN ] ReservationTest.MasterFailover [ OK ] ReservationTest.MasterFailover (89 ms) -- -- [ OK ] ReservationTest.CompatibleCheckpointedResources (43 ms) [ RUN ] ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes GMOCK WARNING: Uninteresting mock function call - returning directly. Function call: offerRescinded(0x7fff5f014960, @0x7feecacceba0 84965984-28cd-4bc8-b25b-746583477d09-O1) Stack trace: [ OK ] ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes (58 ms) [ RUN ] ReservationTest.IncompatibleCheckpointedResources [ OK ] ReservationTest.IncompatibleCheckpointedResources (68 ms) {code}",2 MESOS-4353,"Limit the number of processes created by libprocess","Currently libprocess will create {{max(8, number of CPU cores)}} processes during the initialization, see https://github.com/apache/mesos/blob/0.26.0/3rdparty/libprocess/src/process.cpp#L2146 for details. This should be OK for a normal machine which has no much cores (e.g., 16, 32), but for a powerful machine which may have a large number of cores (e.g., an IBM Power machine may have 192 cores), this will cause too much worker threads which are not necessary. And since libprocess is widely used in Mesos (master, agent, scheduler, executor), it may also cause some performance issue. 
For example, when a user creates a Docker container via Mesos on an agent running on a powerful machine with 192 cores, the DockerContainerizer in the Mesos agent will create a dedicated executor for the container, and there will be 192 worker threads in that executor. If the user creates 1000 Docker containers on that machine, there will be 1000 executors, i.e., 1000 * 192 = 192000 worker threads, which is a large number and may thrash the OS. ",1 MESOS-4357,"GMock warning in RoleTest.ImplicitRoleStaticReservation","{noformat} [ RUN ] RoleTest.ImplicitRoleStaticReservation GMOCK WARNING: Uninteresting mock function call - returning directly. Function call: shutdown(0x7fe37a4752f0) Stack trace: [ OK ] RoleTest.ImplicitRoleStaticReservation (52 ms) {noformat}",1 MESOS-4358,"Expose net_cls network handles in agent's state endpoint","We need to expose net_cls network handles, associated with containers, to operators and network utilities that would use these network handles to enforce network policy. To achieve this, we need to add a new field to the `NetworkInfo` protobuf (say NetHandles) and update this field when a container gets assigned to a net_cls cgroup. The `ContainerStatus` protobuf already has the `NetworkInfo` protobuf as a nested message, and the `ContainerStatus` itself is exposed to operators as part of TaskInfo (for tasks associated with the container) in an agent's state.json. ",2 MESOS-4359,"GMock warning in DockerContainerizerTest.ROOT_DOCKER_DockerInspectDiscard","The following GMock warning was seen on CentOS 7.1: {code} [ RUN ] DockerContainerizerTest.ROOT_DOCKER_DockerInspectDiscard GMOCK WARNING: Uninteresting mock function call - returning directly. 
Function call: executorLost(0x7ffdd74f73e0, @0x7f3e3c00fa20 e1, @0x7f3e3c00f4b0 cf212bb4-c8c5-4a43-b71f-c17b27458627-S0, -1) Stack trace: [ OK ] DockerContainerizerTest.ROOT_DOCKER_DockerInspectDiscard (405 ms) {code}",2 MESOS-4360,"Create common tar/untar utility function.","As part of refactoring and creating a common place to add all command utilities, add *tar* and *untar* as the first POC.",3 MESOS-4362,"Formating issues and broken links in documentation.","The online documentation has a number of bad formatting issues and broken links (e.g., mesos-provider.md).",1 MESOS-4363,"Add a roles field to FrameworkInfo","To represent multiple roles per framework a new repeated string field for roles is needed.",1 MESOS-4364,"Add roles validation code to master","A {{FrameworkInfo}} can only have one of role or roles. A natural location for this appears to be under {{validation::operation::validate}}.",5 MESOS-4365,"Add internal migration from role to roles to master","If only the {{role}} field is given, add it as single entry to {{roles}}. Add a note to {{CHANGELOG}}/release notes on deprecation of the existing {{role}} field. File a JIRA issue for removal of that migration code once the deprecation cycle is over. ",3 MESOS-4366,"Migrate all existing uses of FrameworkInfo.role to FrameworkInfo.roles",NULL,3 MESOS-4367,"Add tracking of the role a Resource was offered for","If a framework can have multiple roles, we need a way to identify for which of the framework's role a resource was offered for (e.g., for resource recovery and reconciliation).",5 MESOS-4368,"Make HierarchicalAllocatorProcess set a Resource's active role during allocation","The concrete implementation here depends on the implementation strategy used to solve MESOS-4367.",3 MESOS-4376,"Document semantics of `slaveLost`","We should clarify the semantics of this callback: * Is it always invoked, or just a hint? * Can a slave ever come back from `slaveLost`? 
* What happens to persistent resources on a lost slave? The new HA framework development guide might be a good place to put (some of?) this information. ",2 MESOS-4377,"Document units associated with resource types","We should document the units associated with memory and disk resources.",1 MESOS-4378,"Add Source to Resource.DiskInfo.","Source is used to describe the extra information about the source of a Disk resource. We will support 'PATH' type first and then 'BLOCK' later. {noformat} message Source { enum Type { PATH = 1; BLOCK = 2; } message Path { // Path to the folder (e.g., /mnt/raid/disk0). required string root = 1; required double total_size = 2; } message Block { // Path to the device file (e.g., /dev/sda1, /dev/vg/v1). // It can be a physical partition, or a logical volume (LVM). required string device = 1; } required Type type = 1; optional Path path = 2; optional Block block = 3; } } {noformat}",1 MESOS-4379,"Design doc for reservation IDs",NULL,3 MESOS-4380,"Adjust Resource arithmetics for DiskInfo.Source.","Since we added the Source for DiskInfo, we need to adjust the Resource arithmetic accordingly. That includes the equality check, addable check, subtractable check, etc.",2 MESOS-4381,"Improve upgrade compatibility documentation.","Investigate and document upgrade compatibility for the 0.27 release.",3 MESOS-4382,"Change the `principal` in `ReservationInfo` to optional","With the addition of HTTP endpoints for {{/reserve}} and {{/unreserve}}, it is now desirable to allow dynamic reservations without a principal, in the case where HTTP authentication is disabled. To allow for this, we will change the {{principal}} field in {{ReservationInfo}} from required to optional. 
For backwards-compatibility, however, the master should currently invalidate any {{ReservationInfo}} messages that do not have this field set.",1 MESOS-4383,"Support docker runtime configuration env var from image.","We need to support env var configuration returned from a docker image in the Mesos containerizer.",2 MESOS-4385,"Offers and InverseOffers cannot be accepted in the same ACCEPT call","*Problem* * In {{Master::accept}}, {{validation::offer::validate}} returns an error when an {{InverseOffer}} is included in the list of {{OfferIDs}} in an {{ACCEPT}} call. * If an {{Offer}} is part of the same {{ACCEPT}}, the master sees {{error.isSome()}} and returns a {{TASK_LOST}} for normal offers. (https://github.com/apache/mesos/blob/fafbdca610d0a150b9fa9cb62d1c63cb7a6fdaf3/src/master/master.cpp#L3117) Here's a regression test: https://reviews.apache.org/r/42092/ *Proposal* The question is whether we want to allow the mixing of {{Offers}} and {{InverseOffers}}. Arguments for mixing: * The design/structure of the maintenance originally intended to overload {{ACCEPT}} and {{DECLINE}} to take inverse offers. * Enforcing non-mixing may require breaking changes to {{scheduler.proto}}. Arguments against mixing: * Some semantics are difficult to explain. What does it mean to supply {{InverseOffers}} with {{Offer::Operations}}? What about {{DECLINE}} with {{Offers}} and {{InverseOffers}}, including a ""reason""? * What happens if we presumably add a third type of offer? * Does it make sense to return {{TASK_LOST}} for valid normal offers if {{InverseOffers}} are invalid?",2 MESOS-4386,"Deprecate 'authenticate' master flag in favor of 'authenticate_frameworks' flag","To be consistent with the `authenticate_slaves` and `authenticate_http` flags, we should rename the `authenticate` flag to `authenticate_frameworks`. This should be done via a deprecation cycle. 1) Release X supports both `authenticate` and `authenticate_frameworks` flags 2) Release X + n supports only `authenticate_frameworks` flag. 
",1 MESOS-4390,"Shared Volumes Design Doc","Review & Approve design doc",3 MESOS-4393,"Draft design document for resource revocability by default.","Create a design document for setting offered resources as ""revocable by default"". Greedy frameworks can then temporarily use resources set aside to satisfy quota. ",8 MESOS-4395,"Add persistent volume endpoint tests with no principal","There are currently no persistent volume endpoint tests that do not use a principal; they should be added.",1 MESOS-4397,"Rename ContainerPrepareInfo to ContainerLaunchInfo for isolators.","The name ""ContainerPrepareInfo"" does not really capture the purpose of this struct. ContainerLaunchInfo better captures the purpose of this struct. ContainerLaunchInfo is returned by the isolator 'prepare' function. It contains information about how a container should be launched (e.g., environment variables, namespaces, commands, etc.). The information will be used by the Mesos Containerizer when launching the container.",2 MESOS-4398,"Synchronously handle AuthZ errors for the Scheduler endpoint.","Currently, any AuthZ errors for the {{/scheduler}} endpoint are handled asynchronously as {{FrameworkErrorMessage}}. Here is an example: {code} if (authorizationError.isSome()) { LOG(INFO) << ""Refusing subscription of framework"" << "" '"" << frameworkInfo.name() << ""'"" << "": "" << authorizationError.get().message; FrameworkErrorMessage message; message.set_message(authorizationError.get().message); http.send(message); http.close(); return; } {code} We would like to handle such errors synchronously when the request is received similar to what other endpoints like {{/reserve}}/{{/quota}} do. We already have the relevant functions {{authorizeXXX}} etc in {{master.cpp}}. 
We should just make the requests pass through once the relevant {{Future}} from the {{authorizeXXX}} function is fulfilled.",5 MESOS-4400,"Create persistent volume directories based on DiskInfo.Source.","Currently, we always create persistent volumes from root disk, and the persistent volumes are directories. With DiskInfo.Source being added, we should create the persistent volume accordingly based on the information in DiskInfo.Source. This ticket handles the case where DiskInfo.Source.type is PATH. In that case, we should create sub-directories and use the same layout as slave.work_dir. See the relevant code here: {code} void Slave::checkpointResources(...) { // Creates persistent volumes that do not exist and schedules // releasing those persistent volumes that are no longer needed. // // TODO(jieyu): Consider introducing a volume manager once we start // to support multiple disks, or raw disks. Depending on the // DiskInfo, we may want to create either directories under a root // directory, or LVM volumes from a given device. Resources volumes = newCheckpointedResources.persistentVolumes(); foreach (const Resource& volume, volumes) { // This is validated in master. CHECK_NE(volume.role(), ""*""); string path = paths::getPersistentVolumePath( flags.work_dir, volume.role(), volume.disk().persistence().id()); if (!os::exists(path)) { CHECK_SOME(os::mkdir(path, true)) << ""Failed to create persistent volume at '"" << path << ""'""; } } } {code}",2 MESOS-4402,"Update filesystem isolators to look for persistent volume directories from the correct location.","This is related to MESOS-4400. Since persistent volume directories can be created from non root disk now. We need to adjust both posix and linux filesystem isolator to look for volumes from the correct location based on the information in DiskInfo.Source. 
See relevant code in: {code} Future PosixFilesystemIsolatorProcess::update(..); Future LinuxFilesystemIsolatorProcess::update(..); {code}",2 MESOS-4403,"Check paths in DiskInfo.Source.Path exist during slave initialization.","We have two options here. We can either check and fail if the path does not exist, or create it if it does not exist, as we did for slave.work_dir.",2 MESOS-4410,"Introduce protobuf for quota set request.","To document quota request JSON schema and simplify request processing, introduce a {{QuotaRequest}} protobuf wrapper.",3 MESOS-4411,"Traverse all roles for quota allocation.","There might be a bug in how resources are allocated to multiple quota'ed roles if one role's quota is met. We need to investigate this behavior.",3 MESOS-4415,"Implement stout/os/windows/rmdir.hpp",NULL,5 MESOS-4417,"Prevent allocator from crashing on successful recovery.","There might be a bug that may crash the master as pointed out by [~bmahler] in https://reviews.apache.org/r/42222/: {noformat} It looks like if we trip the resume call in addSlave, this delayed resume will crash the master due to the CHECK(paused) that currently resides in resume. {noformat}",3 MESOS-4421,"Document that /reserve, /create-volumes endpoints can return misleading ""success""","The docs for the {{/reserve}} endpoint say: {noformat} 200 OK: Success (the requested resources have been reserved). {noformat} This is not true: the master returns {{200}} when the request has been validated and a {{CheckpointResourcesMessage}} has been sent to the agent, but the master does not attempt to verify that the message has been received or that the agent successfully checkpointed. The same behavior applies to {{/unreserve}}, {{/create-volumes}}, and {{/destroy-volumes}}. We should _either_: 1. Accurately document what the {{200}} return code means. 2. 
Change the implementation to wait for the agent's next checkpoint to succeed (and to include the effect of the operation) before returning success to the HTTP client.",3 MESOS-4425,"Introduce filtering test abstractions for HTTP events to libprocess","We need a test abstraction for {{HttpEvent}} similar to the already existing one's for {{DispatchEvent}}, {{MessageEvent}} in libprocess. The abstraction can look similar in semantics to the already existing {{FUTURE_DISPATCH}}/{{FUTURE_MESSAGE}}.",3 MESOS-4433,"Implement a callback testing interface for the Executor Library","Currently, we do not have a mocking based callback interface for the executor library. This should look similar to the ongoing work for MESOS-3339 i.e. the corresponding issue for the scheduler library. The interface should allow us to set expectations like we do for the driver. An example: {code} EXPECT_CALL(executor, connected()) .Times(1) {code}",3 MESOS-4434,"Install 3rdparty package boost, glog, protobuf and picojson when installing Mesos","Mesos modules depend on having these packages installed with the exact version as Mesos was compiled with.",3 MESOS-4435,"Update `Master::Http::stateSummary` to use `jsonify`.","Update {{state-summary}} to use {{jsonify}} to stay consistent with {{state}} HTTP endpoint.",3 MESOS-4437,"Disable the test RegistryClientTest.BadTokenServerAddress.","As we are retiring registry client, disable this test which looks flaky.",1 MESOS-4438,"Add 'dependency' message to 'AppcImageManifest' protobuf.","AppcImageManifest protobuf currently lacks 'dependencies' which is necessary for image discovery.",1 MESOS-4439,"Fix appc CachedImage image validation","Currently image validation is done assuming that the image's filename will have digest (SHA-512) information. This is not part of the spec (https://github.com/appc/spec/blob/master/spec/discovery.md). The spec specifies the tuple as unique identifier for discovering an image. 
",1 MESOS-4441,"Allocate revocable resources beyond quota guarantee.","h4. Status Quo Currently resources allocated to frameworks in a role with quota (aka quota'ed role) beyond quota guarantee are marked non-revocable. This impacts our flexibility for revoking them if we decide so in the future. h4. Proposal Once quota guarantee is satisfied we must not necessarily further allocate resources as non-revocable. Instead we can mark all offers resources beyond guarantee as revocable. When in the future {{RevocableInfo}} evolves frameworks will get additional information about ""revocability"" of the resource (i.e. allocation slack) h4. Caveats Though it seems like a simple change, it has several implications. h6. Fairness Currently the hierarchical allocator considers revocable resources as regular resources when doing fairness calculations. This may prevent frameworks getting non-revocable resources as part of their role's quota guarantee if they accept some revocable resources as well. Consider the following scenario. A single framework in a role with quota set to {{10}} CPUs is allocated {{10}} CPUs as non-revocable resources as part of its quota and additionally {{2}} revocable CPUs. Now a task using {{2}} non-revocable CPUs finishes and its resources are returned. Total allocation for the role is {{8}} non-revocable + {{2}} revocable. However, the role may not be offered additional {{2}} non-revocable since its total allocation satisfies quota. h6. Resource math If we allocate non-revocable resources as revocable, we should make sure we do accounting right: either we should update total agent resources and mark them as revocable as well, or bookkeep resources as non-revocable and convert them to revocable when necessary. h6. Coarse-grained nature of allocation The hierarchical allocator performs ""coarse-grained"" allocation, meaning it always allocates the entire remaining agent resources to a single framework. 
This may lead to over-allocating some resources as non-revocable beyond quota guarantee. h6. Quotas smaller than fair share If a quota set for a role is smaller than its fair share, it may reduce the amount of resources offered to this role, if frameworks in it do not accept revocable resources. This is probably the most important consequence of the proposed change. Operators may set quota to get guarantees, but may observe a decrease in amount of resources a role gets, which is not intuitive.",8 MESOS-4443,"Refactor allocator recovery.","Allocator recovery code can be improved for readability. [~bmahler] left some thoughts about it in https://reviews.apache.org/r/42222/.",3 MESOS-4444,"Design doc for reservation labels",NULL,3 MESOS-4445,"Labels equality behavior is wrong","{noformat} TEST(RevocableResourceTest, LabelSemantics) { Labels labels1; Labels labels2; labels1.add_labels()->CopyFrom(createLabel(""foo"", ""bar"")); labels1.add_labels()->CopyFrom(createLabel(""foo"", ""bar"")); labels2.add_labels()->CopyFrom(createLabel(""foo"", ""bar"")); labels2.add_labels()->CopyFrom(createLabel(""baz"", ""qux"")); bool eq = (labels1 == labels2); LOG(INFO) << ""Equal? "" << (eq ? ""true"" : ""false""); } {noformat} Output: {noformat} [ RUN ] RevocableResourceTest.LabelSemantics I0120 13:15:25.207223 2078158848 resources_tests.cpp:1990] Equal? true [ OK ] RevocableResourceTest.LabelSemantics (0 ms) {noformat} This behavior seems pretty problematic.",5 MESOS-4449,"SegFault on agent during executor startup","When repeatedly performing our system tests we have found that we get a segfault on one of the agents. It probably occurs about one time in ten. I have attached the full log from that agent. I've attached the log from the agent that failed and the master (although I think this is less helpful). To reproduce - I have no idea. It seems to occur at certain times. E.g. like if a packet is created right on a minute boundary or something. 
But I don't think it's something caused by our code because the timestamps are stamped by mesos. I was surprised not to find a bug already open.",1 MESOS-4452,"Improve documentation around roles, principals, authz, and reservations","* What is the difference between a role and a principal? * Why do some ACL entities reference ""roles"" but others reference ""principals""? In a typical organization, what real-world entities would my roles vs. principals map to? The ACL documentation could use more information about the motivation of ACLs and examples of configuring ACLs to meet real-world security policies. * We should give some examples of making reservations when the role and principal are different, and why you would want to do that * We should add an example to the ACL page that includes setting ACLs for reservations and/or persistent volumes",2 MESOS-4454,"Create common sha512 compute utility function.","Add common utility function for computing digests. Start with `sha512` since its immediately needed by appc image fetcher. ",2 MESOS-4457,"Implement tests for the new Executor library","We need to add tests for the executor library {{src/executor/executor.cpp}}. One possible approach would be to use the existing tests in {{src/tests/scheduler_tests.cpp}} and make them use the new executor library.",3 MESOS-4459,"Implement AuthN handling on the scheduler library","Currently, we do not have the ability of passing {{Credentials}} via the scheduler library. Once the master supports AuthN handling for the {{/scheduler}} endpoint, we would need to add this support to the library.",3 MESOS-4460,"Enable Framework->Executor message optimization for HTTP API","Currently, we support sending framework->executor messages directly as an optimization. This is not currently possible with using the Scheduler HTTP API. 
We should think about exploring possible alternatives for supporting this optimization.",13 MESOS-4461,"Enable Executor->Framework message optimization for HTTP API","Currently, we support sending executor->framework messages directly as an optimization. This is not currently possible with using the Scheduler HTTP API. We should think about exploring possible alternatives for supporting this optimization.",13 MESOS-4466,"Implement `waitpid` in Windows",NULL,5 MESOS-4471,"Implement process querying/counting in Windows",NULL,2 MESOS-4478,"ReviewBot seemed to be crashing ReviewBoard server when posting large reviews","The bot is currently tripping on this review https://reviews.apache.org/r/42506/ (see builds #10973 to #10978). [~jfarrell] looked at the server logs and said he saw 'MySQL going away' message when the mesos bot was making these requests. I think that error is a bit misleading because it happens only for this review (which has a huge error log due to bad patch). The bot has successfully posted reviews for other review requests which had no error log (good patch). One way to fix this would be to just post a tail of the error log (and perhaps link to Jenkins Console or some other service for the longer error text).",2 MESOS-4479,"Implement reservation labels",NULL,5 MESOS-4487,"Introduce status() interface in `Containerizer`","In the Containerizer, during container isolation, the isolators end up modifying the state of the containers. Examples would be IP address allocation to a container by the 'network isolator, or net_cls handle allocation by the cgroup/net_cls isolator. Often times the state of the container, needs to be exposed to operators through the state.json end-point. For e.g. operators or frameworks might want to know the IP-address configured on a particular container, or the net_cls handle associated with a container to configure the right TC rules. 
However, at present, there is no clean interface for the slave to retrieve the state of a container from the Containerizer for any of the launched containers. Thus, we need to introduce a `status` interface in the `Containerizer` base class, in order for the slave to expose container state information in its state.json. ",2 MESOS-4488,"Define a CgroupInfo protobuf to expose cgroup isolator configuration.","Within `MesosContainerizer` we have an isolator associated with each linux cgroup subsystem. The isolators apply subsystem specific configuration on the containers before launching the containers. For e.g cgroup/net_cls isolator applies net_cls handles, cgroup/mem isolator applies memory quotas, cgroups/cpu-share isolator configures cpu shares. Currently, there is no message structure defined to capture the configuration information of the container, for each cgroup isolator that has been applied to the container. We therefore need to define a protobuf that can capture the cgroup configuration of each cgroup isolator that has been applied to the container. This protobuf will be filled in by the cgroup isolator and will be stored as part of `ContainerConfig` in the containerizer. ",1 MESOS-4489,"The `cgroups/net_cls` isolator needs to expose handles in the ContainerStatus","The `cgroup/net_cls` isolator is responsible for allocating network handles to containers launched within a net_cls cgroup. The `cgroup/net_cls` isolator needs to expose these handles to the containerizer as part of the `ContainerStatus` when the containerizer queries the status() method of the isolator. The information itself will go as part of a `CgroupInfo` protobuf that will be defined as part of MESOS-4488 . ",1 MESOS-4490,"Get container status information in slave. ","As part of MESOS-4487 an interface will be introduce into the `Containerizer` to allow agents to retrieve container state information. 
The agent needs to use this interface to retrieve container state information during status updates from the executor. The container state information can then be used by the agent to expose various isolator-specific configuration (e.g., IP address allocated by network isolators, net_cls handles allocated by the `cgroups/net_cls` isolator) that has been applied to the container, in the state.json endpoint. ",3 MESOS-4493,"Add ability to create symlink on Windows",NULL,3 MESOS-4494,"Implement `size`, `usage`, and other disk metrics reporting on Windows.",NULL,3 MESOS-4495,"Delete `os::chown` on Windows",NULL,1 MESOS-4498,"Refactor os.hpp to be less monolithic, and more cross-platform compatible",NULL,1 MESOS-4499,"Docker provisioner store should reuse existing layers in the cache.","Currently, the docker provisioner store will download all the layers associated with an image if the image is not found locally, even though some of its layers might already exist in the cache. This is problematic because any time a user deploys a new image, Mesos will fetch all layers of that new image, even though most of the layers are already cached locally. ",5 MESOS-4500,"Expose ExecutorInfo and TaskInfo for isolators.","Currently we do not expose this information to isolators. Once we have the docker runtime isolator, CommandInfo will be necessary to support either a custom executor or the command executor. ",2 MESOS-4505,"Hierarchical allocator performance is slow due to Quota","Since we do not strip the non-scalar resources during the resource arithmetic for quota, the performance can degrade significantly, as resource arithmetic is currently expensive. One approach to resolving this is to filter the resources we use to perform this arithmetic to only use scalars. 
This is valid as quota can currently only be set for scalar resource types.",3 MESOS-4506,"Posix disk isolator should ignore disk quota enforcement for MOUNT type disk resources.","We assume MOUNT type disk is exclusive and the underlying filesystem will enforce the quota (i.e., the application won't be able to exceed the quota, and will get a write error it the disk is full). Therefore, there's no need to enforce it's quota in posix disk isolator.",2 MESOS-4512,"Render quota status consistently with other endpoints.","Currently quota status endpoint returns a collection of {{QuotaInfo}} protos converted to JSON. An example response looks like this: {code:xml} { ""infos"": [ { ""role"": ""role1"", ""guarantee"": [ { ""name"": ""cpus"", ""role"": ""*"", ""type"": ""SCALAR"", ""scalar"": { ""value"": 12 } }, { ""name"": ""mem"", ""role"": ""*"", ""type"": ""SCALAR"", ""scalar"": { ""value"": 6144 } } ] } ] } {code} Presence of some fields, e.g. ""role"", is misleading. To address this issue and make the output more informative, we should probably introduce a {{model()}} function for {{QuotaStatus}}.",3 MESOS-4513,"Build failure when using gcc-4.9 - signed/unsigned mismatch.","When building the current master, the following happens when using gcc-4.9: {noformat} mv -f examples/.deps/persistent_volume_framework-persistent_volume_framework.Tpo examples/.deps/persistent_volume_framework-persistent_volume_framework.Po g++-4.9 -DPACKAGE_NAME=\""mesos\"" -DPACKAGE_TARNAME=\""mesos\"" -DPACKAGE_VERSION=\""0.27.0\"" -DPACKAGE_STRING=\""mesos\ 0.27.0\"" -DPACKAGE_BUGREPORT=\""\"" -DPACKAGE_URL=\""\"" -DPACKAGE=\""mesos\"" -DVERSION=\""0.27.0\"" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\"".libs/\"" -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 
-DHAVE_LIBAPR_1=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -I. -I../../src -Wall -Werror -DLIBDIR=\""/usr/local/lib\"" -DPKGLIBEXECDIR=\""/usr/local/libexec/mesos\"" -DPKGDATADIR=\""/usr/local/share/mesos\"" -I../../include -I../../3rdparty/libprocess/include -I../../3rdparty/libprocess/3rdparty/stout/include -I../include -I../include/mesos -isystem ../3rdparty/libprocess/3rdparty/boost-1.53.0 -I../3rdparty/libprocess/3rdparty/picojson-1.3.0 -DPICOJSON_USE_INT64 -D__STDC_FORMAT_MACROS -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include -I../3rdparty/zookeeper-3.4.5/src/c/generated -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src -DSOURCE_DIR=\""/Users/till/Development/mesos-private/build/..\"" -DBUILD_DIR=\""/Users/till/Development/mesos-private/build\"" -I../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include -I../3rdparty/libprocess/3rdparty/gmock-1.7.0/include -I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include -I/usr/local/opt/subversion/include/subversion-1 -I/usr/include/apr-1 -I/usr/include/apr-1.0 -D_THREAD_SAFE -pthread -g1 -O0 -Wno-unused-local-typedefs -std=c++11 -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -MT tests/mesos_tests-container_logger_tests.o -MD -MP -MF tests/.deps/mesos_tests-container_logger_tests.Tpo -c -o tests/mesos_tests-container_logger_tests.o `test -f 'tests/container_logger_tests.cpp' || echo '../../src/'`tests/container_logger_tests.cpp mv -f slave/qos_controllers/.deps/mesos_tests-load.Tpo slave/qos_controllers/.deps/mesos_tests-load.Po g++-4.9 -DPACKAGE_NAME=\""mesos\"" -DPACKAGE_TARNAME=\""mesos\"" -DPACKAGE_VERSION=\""0.27.0\"" -DPACKAGE_STRING=\""mesos\ 0.27.0\"" -DPACKAGE_BUGREPORT=\""\"" -DPACKAGE_URL=\""\"" -DPACKAGE=\""mesos\"" -DVERSION=\""0.27.0\"" 
-DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\"".libs/\"" -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -I. -I../../src -Wall -Werror -DLIBDIR=\""/usr/local/lib\"" -DPKGLIBEXECDIR=\""/usr/local/libexec/mesos\"" -DPKGDATADIR=\""/usr/local/share/mesos\"" -I../../include -I../../3rdparty/libprocess/include -I../../3rdparty/libprocess/3rdparty/stout/include -I../include -I../include/mesos -isystem ../3rdparty/libprocess/3rdparty/boost-1.53.0 -I../3rdparty/libprocess/3rdparty/picojson-1.3.0 -DPICOJSON_USE_INT64 -D__STDC_FORMAT_MACROS -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include -I../3rdparty/zookeeper-3.4.5/src/c/generated -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src -DSOURCE_DIR=\""/Users/till/Development/mesos-private/build/..\"" -DBUILD_DIR=\""/Users/till/Development/mesos-private/build\"" -I../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include -I../3rdparty/libprocess/3rdparty/gmock-1.7.0/include -I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include -I/usr/local/opt/subversion/include/subversion-1 -I/usr/include/apr-1 -I/usr/include/apr-1.0 -D_THREAD_SAFE -pthread -g1 -O0 -Wno-unused-local-typedefs -std=c++11 -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -MT tests/mesos_tests-containerizer.o -MD -MP -MF tests/.deps/mesos_tests-containerizer.Tpo -c -o tests/mesos_tests-containerizer.o `test -f 'tests/containerizer.cpp' || echo '../../src/'`tests/containerizer.cpp In file included from 
../3rdparty/libprocess/3rdparty/gmock-1.7.0/include/gmock/internal/gmock-internal-utils.h:47:0, from ../3rdparty/libprocess/3rdparty/gmock-1.7.0/include/gmock/gmock-actions.h:46, from ../3rdparty/libprocess/3rdparty/gmock-1.7.0/include/gmock/gmock.h:58, from ../../src/tests/container_logger_tests.cpp:21: ../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include/gtest/gtest.h: In instantiation of 'testing::AssertionResult testing::internal::CmpHelperLE(const char*, const char*, const T1&, const T2&) [with T1 = int; T2 = long long unsigned int]': ../../src/tests/container_logger_tests.cpp:467:3: required from here ../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include/gtest/gtest.h:1579:28: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare] GTEST_IMPL_CMP_HELPER_(LE, <=); ^ ../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include/gtest/gtest.h:1562:12: note: in definition of macro 'GTEST_IMPL_CMP_HELPER_' if (val1 op val2) {\ ^ ../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include/gtest/gtest.h: In instantiation of 'testing::AssertionResult testing::internal::CmpHelperGE(const char*, const char*, const T1&, const T2&) [with T1 = int; T2 = long long unsigned int]': ../../src/tests/container_logger_tests.cpp:468:3: required from here ../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include/gtest/gtest.h:1583:28: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare] GTEST_IMPL_CMP_HELPER_(GE, >=); ^ ../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include/gtest/gtest.h:1562:12: note: in definition of macro 'GTEST_IMPL_CMP_HELPER_' if (val1 op val2) {\ ^ mv -f tests/.deps/mesos_tests-anonymous_tests.Tpo tests/.deps/mesos_tests-anonymous_tests.Po {noformat}",1 MESOS-4515,"ContainerLoggerTest.LOGROTATE_RotateInSandbox breaks when running on Centos6.","{noformat} [17:24:58][Step 7/7] logrotate: bad argument --version: unknown error [17:24:58][Step 7/7] F0126 17:24:57.913729 4503 
container_logger_tests.cpp:380] CHECK_SOME(containerizer): Failed to create container logger: Failed to create container logger module 'org_apache_mesos_LogrotateContainerLogger': Error creating Module instance for 'org_apache_mesos_LogrotateContainerLogger' [17:24:58][Step 7/7] *** Check failure stack trace: *** [17:24:58][Step 7/7] @ 0x7f11ae0d2d40 google::LogMessage::Fail() [17:24:58][Step 7/7] @ 0x7f11ae0d2c9c google::LogMessage::SendToLog() [17:24:58][Step 7/7] @ 0x7f11ae0d2692 google::LogMessage::Flush() [17:24:58][Step 7/7] @ 0x7f11ae0d544c google::LogMessageFatal::~LogMessageFatal() [17:24:58][Step 7/7] @ 0x983927 _CheckFatal::~_CheckFatal() [17:24:58][Step 7/7] @ 0xa9a18b mesos::internal::tests::ContainerLoggerTest_LOGROTATE_RotateInSandbox_Test::TestBody() [17:24:58][Step 7/7] @ 0x1623a4e testing::internal::HandleSehExceptionsInMethodIfSupported<>() [17:24:58][Step 7/7] @ 0x161eab2 testing::internal::HandleExceptionsInMethodIfSupported<>() [17:24:58][Step 7/7] @ 0x15ffdfd testing::Test::Run() [17:24:58][Step 7/7] @ 0x160058b testing::TestInfo::Run() [17:24:58][Step 7/7] @ 0x1600bc6 testing::TestCase::Run() [17:24:58][Step 7/7] @ 0x1607515 testing::internal::UnitTestImpl::RunAllTests() [17:24:58][Step 7/7] @ 0x16246dd testing::internal::HandleSehExceptionsInMethodIfSupported<>() [17:24:58][Step 7/7] @ 0x161f608 testing::internal::HandleExceptionsInMethodIfSupported<>() [17:24:58][Step 7/7] @ 0x1606245 testing::UnitTest::Run() [17:24:58][Step 7/7] @ 0xde36b6 RUN_ALL_TESTS() [17:24:58][Step 7/7] @ 0xde32cc main [17:24:58][Step 7/7] @ 0x7f11a8896d5d __libc_start_main [17:24:58][Step 7/7] @ 0x981fc9 (unknown) {noformat}",1 MESOS-4517,"Introduce docker runtime isolator.","Currently docker image default configuration are included in `ProvisionInfo`. We should grab necessary config from `ProvisionInfo` into `ContainerInfo`, and handle all these runtime informations inside of docker runtime isolator. 
Return a `ContainerLaunchInfo` containing `working_dir`, `env` and merged `commandInfo`, etc.",3 MESOS-4520,"Introduce a status() interface for isolators","While launching a container, Mesos isolators end up configuring/modifying various properties of the container. For example, cgroup isolators (mem, cpu, net_cls) configure/change the properties associated with their respective subsystems before launching a container. Similarly, network isolators (net-modules, port mapping) configure the IP address and ports associated with a container. Currently, there is no interface in the isolator to extract the runtime state of these properties for a given container. Therefore a status() method needs to be implemented in the isolators to allow the containerizer to extract the container status information from the isolator. ",1 MESOS-4523,"Enable benchmark tests in ASF CI","It would be nice to enable benchmark tests in the ASF CI so that we can catch performance regressions (esp. during releases).",3 MESOS-4526,"Include the allocated portion of reserved resources in the role sorter for DRF.","Reserved resources should be accounted for in the fairness calculation whether they are allocated or not, since they model a long or forever running task. That is, the effect of reserving resources is equivalent to launching a task in that the resources that make up the reservation are not available to other roles as non-revocable. In the short-term, we should at least account for the allocated portion of the reservation.",1 MESOS-4527,"Include allocated portion of the reserved resources in the quota role sorter for DRF.","Similar to MESOS-4526, reserved resources should be accounted for in the quota role sorter regardless of their allocation state. 
In the short-term, we should at least account for them if they are allocated.",1 MESOS-4528,"Account for reserved resources in the quota guarantee check.","Reserved resources should be accounted for in the quota guarantee check so that frameworks cannot continually reserve resources to pull them out of the quota pool.",2 MESOS-4529,"Update the allocator to not offer unreserved resources beyond quota.","Eventually, we will want to offer unreserved resources as revocable beyond the role's quota. Rather than offering non-revocable resources beyond the role's quota's guarantee, in the short term, we choose to not offer resources beyond a role's quota.",2 MESOS-4530,"NetClsIsolatorTest.ROOT_CGROUPS_NetClsIsolate is flaky","While running the command {noformat} sudo ./bin/mesos-tests.sh --gtest_filter=""-CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen:CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseRSS"" --gtest_repeat=10 --gtest_break_on_failure {noformat} one eventually gets the following output: {noformat} [ RUN ] NetClsIsolatorTest.ROOT_CGROUPS_NetClsIsolate ../../src/tests/containerizer/isolator_tests.cpp:870: Failure containerizer: Could not create isolator 'cgroups/net_cls': Unexpected subsystems found attached to the hierarchy /sys/fs/cgroup/net_cls,net_prio [ FAILED ] NetClsIsolatorTest.ROOT_CGROUPS_NetClsIsolate (75 ms) {noformat}",1 MESOS-4531,"Document multi-disk support.",NULL,2 MESOS-4534,"Resources object can be mutated through the public API","The {{Resources}} object currently allows mutation of its internal state through the public mutable iterator interface. This can cause issues when the mutation involves stripping certain qualifiers on a {{Resource}}, as they will not be summed together at the end of the mutation (even though they should be). 
The {{contains()}} math will not work correctly if two {{addable}} resources are not summed together on the {{lhs}} of the contains check.",3 MESOS-4535,"Logrotate ContainerLogger may not handle FD ownership correctly","One of the patches for [MESOS-4136] introduced the {{FDType::OWNED}} enum for {{Subprocess::IO::FD}}. The way the logrotate module uses this is slightly incorrect: # The module starts a subprocess with an output {{Subprocess::PIPE()}}. # That pipe's FD is passed into another subprocess via {{Subprocess::IO::FD(pipe, IO::OWNED)}}. # When the second subprocess starts, the pipe's FD is closed in the parent. # When the first subprocess terminates, the existing code will try to close the pipe again. This effectively closes a random FD.",1 MESOS-4536,"Add abstractions of ""owned"" and ""shared"" file descriptors to libprocess.","Libprocess currently manages file descriptors as plain {{int}}s. This leads to some easily missed bugs regarding duplicated or closed FDs. We should introduce an abstraction (like {{unique_ptr}} and {{shared_ptr}}) so that FD ownership can be expressed alongside the affected code.",3 MESOS-4539,"Exclude paths in Posix disk isolator should be absolute paths.","Since du --exclude uses pattern matching, a relative path might accidentally match an irrelevant directory/file. For instance, {noformat} /tmp/testpath $ tree . 
├── aaa │ └── exc │ └── file └── exc └── file 3 directories, 2 files /tmp/testpath $ du --exclude /tmp/testpath/exc /tmp/testpath/ 8 /tmp/testpath/aaa/exc 12 /tmp/testpath/aaa 16 /tmp/testpath/ /tmp/testpath $ du --exclude exc /tmp/testpath/ 4 /tmp/testpath/aaa 8 /tmp/testpath/ /tmp/testpath $ {noformat}",2 MESOS-4540,"NetClsIsolatorTest.ROOT_CGROUPS_NetClsIsolate fails on CentOS 6","This test fails in my CentOS 6 VM due to a cgroups issue: {code} [ RUN ] NetClsIsolatorTest.ROOT_CGROUPS_NetClsIsolate I0127 19:15:06.637328 25347 exec.cpp:134] Version: 0.28.0 I0127 19:15:06.648378 25378 exec.cpp:208] Executor registered on slave 6edafba0-9dbd-4e6e-b10e-c6f935e58d41-S0 Registered executor on localhost Starting task b745d88e-3fbe-4af9-80b3-e43484e37acf sh -c 'sleep 1000' Forked command at 25385 ../../src/tests/containerizer/isolator_tests.cpp:926: Failure pids: Failed to read cgroups control 'cgroup.procs': '/sys/fs/cgroup/net_cls' is not a valid hierarchy I0127 19:15:06.662083 25376 exec.cpp:381] Executor asked to shutdown Shutting down Sending SIGTERM to process tree at pid 25385 [ FAILED ] NetClsIsolatorTest.ROOT_CGROUPS_NetClsIsolate (335 ms) {code}",1 MESOS-4542,"MasterQuotaTest.AvailableResourcesAfterRescinding is flaky.","Can be reproduced by running {{GLOG_v=1 GTEST_FILTER=""MasterQuotaTest.AvailableResourcesAfterRescinding"" ./bin/mesos-tests.sh --gtest_shuffle --gtest_break_on_failure --gtest_repeat=1000 --verbose}}. h5. 
Verbose log from a bad run: {code} [ RUN ] MasterQuotaTest.AvailableResourcesAfterRescinding I0128 12:20:27.568657 2080858880 resources.cpp:564] Parsing resources as JSON failed: cpus:2;mem:1024;disk:1024;ports:[31000-32000] Trying semicolon-delimited string format instead I0128 12:20:27.570142 2080858880 resources.cpp:564] Parsing resources as JSON failed: cpus:2;mem:1024;disk:1024;ports:[31000-32000] Trying semicolon-delimited string format instead I0128 12:20:27.583225 2080858880 leveldb.cpp:174] Opened db in 6241us I0128 12:20:27.584353 2080858880 leveldb.cpp:181] Compacted db in 1026us I0128 12:20:27.584429 2080858880 leveldb.cpp:196] Created db iterator in 12us I0128 12:20:27.584442 2080858880 leveldb.cpp:202] Seeked to beginning of db in 7us I0128 12:20:27.584453 2080858880 leveldb.cpp:271] Iterated through 0 keys in the db in 6us I0128 12:20:27.584475 2080858880 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0128 12:20:27.584918 300445696 recover.cpp:447] Starting replica recovery I0128 12:20:27.585113 300445696 recover.cpp:473] Replica is in EMPTY status I0128 12:20:27.585916 297226240 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (18274)@192.168.178.24:51278 I0128 12:20:27.586086 297762816 recover.cpp:193] Received a recover response from a replica in EMPTY status I0128 12:20:27.586449 297226240 recover.cpp:564] Updating replica status to STARTING I0128 12:20:27.587204 300445696 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 624us I0128 12:20:27.587242 300445696 replica.cpp:320] Persisted replica status to STARTING I0128 12:20:27.587376 299372544 recover.cpp:473] Replica is in STARTING status I0128 12:20:27.588050 300982272 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (18275)@192.168.178.24:51278 I0128 12:20:27.588235 300445696 recover.cpp:193] Received a recover response from a replica in STARTING status I0128 
12:20:27.588572 297762816 recover.cpp:564] Updating replica status to VOTING I0128 12:20:27.588850 297226240 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 140us I0128 12:20:27.588879 297226240 replica.cpp:320] Persisted replica status to VOTING I0128 12:20:27.588975 299909120 recover.cpp:578] Successfully joined the Paxos group I0128 12:20:27.589154 299909120 recover.cpp:462] Recover process terminated I0128 12:20:27.599486 298835968 master.cpp:374] Master 531344bd-56f4-4e4f-8f6f-a6a9d36058c7 (alexr.fritz.box) started on 192.168.178.24:51278 I0128 12:20:27.599520 298835968 master.cpp:376] Flags at startup: --acls="""" --allocation_interval=""50ms"" --allocator=""HierarchicalDRF"" --authenticate=""true"" --authenticate_http=""true"" --authenticate_slaves=""true"" --authenticators=""crammd5"" --authorizers=""local"" --credentials=""/private/tmp/NlzPSo/credentials"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_completed_frameworks=""50"" --max_completed_tasks_per_framework=""1000"" --max_slave_ping_timeouts=""5"" --quiet=""false"" --recovery_slave_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""25secs"" --registry_strict=""true"" --roles=""role1,role2"" --root_submissions=""true"" --slave_ping_timeout=""15secs"" --slave_reregister_timeout=""10mins"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/usr/local/share/mesos/webui"" --work_dir=""/private/tmp/NlzPSo/master"" --zk_session_timeout=""10secs"" I0128 12:20:27.599753 298835968 master.cpp:421] Master only allowing authenticated frameworks to register I0128 12:20:27.599769 298835968 master.cpp:426] Master only allowing authenticated slaves to register I0128 12:20:27.599781 298835968 credentials.hpp:35] Loading credentials for authentication from 
'/private/tmp/NlzPSo/credentials' I0128 12:20:27.600082 298835968 master.cpp:466] Using default 'crammd5' authenticator I0128 12:20:27.600163 298835968 master.cpp:535] Using default 'basic' HTTP authenticator I0128 12:20:27.600327 298835968 master.cpp:569] Authorization enabled W0128 12:20:27.600345 298835968 master.cpp:629] The '--roles' flag is deprecated. This flag will be removed in the future. See the Mesos 0.27 upgrade notes for more information I0128 12:20:27.600497 297762816 whitelist_watcher.cpp:77] No whitelist given I0128 12:20:27.600503 297226240 hierarchical.cpp:144] Initialized hierarchical allocator process I0128 12:20:27.601965 297226240 master.cpp:1710] The newly elected leader is master@192.168.178.24:51278 with id 531344bd-56f4-4e4f-8f6f-a6a9d36058c7 I0128 12:20:27.601995 297226240 master.cpp:1723] Elected as the leading master! I0128 12:20:27.602007 297226240 master.cpp:1468] Recovering from registrar I0128 12:20:27.602083 300445696 registrar.cpp:307] Recovering registrar I0128 12:20:27.602460 297226240 log.cpp:659] Attempting to start the writer I0128 12:20:27.603514 299909120 replica.cpp:493] Replica received implicit promise request from (18277)@192.168.178.24:51278 with proposal 1 I0128 12:20:27.603734 299909120 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 205us I0128 12:20:27.603768 299909120 replica.cpp:342] Persisted promised to 1 I0128 12:20:27.604194 299909120 coordinator.cpp:238] Coordinator attempting to fill missing positions I0128 12:20:27.605311 299372544 replica.cpp:388] Replica received explicit promise request from (18278)@192.168.178.24:51278 for position 0 with proposal 2 I0128 12:20:27.605468 299372544 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 133us I0128 12:20:27.605494 299372544 replica.cpp:712] Persisted action at 0 I0128 12:20:27.606441 298835968 replica.cpp:537] Replica received write request for position 0 from (18279)@192.168.178.24:51278 I0128 12:20:27.606492 298835968 
leveldb.cpp:436] Reading position from leveldb took 29us I0128 12:20:27.606665 298835968 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 151us I0128 12:20:27.606688 298835968 replica.cpp:712] Persisted action at 0 I0128 12:20:27.607244 297226240 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 I0128 12:20:27.607409 297226240 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 152us I0128 12:20:27.607441 297226240 replica.cpp:712] Persisted action at 0 I0128 12:20:27.607457 297226240 replica.cpp:697] Replica learned NOP action at position 0 I0128 12:20:27.607853 297226240 log.cpp:675] Writer started with ending position 0 I0128 12:20:27.608649 299372544 leveldb.cpp:436] Reading position from leveldb took 158us I0128 12:20:27.609539 298835968 registrar.cpp:340] Successfully fetched the registry (0B) in 7.426816ms I0128 12:20:27.609763 298835968 registrar.cpp:439] Applied 1 operations in 54us; attempting to update the 'registry' I0128 12:20:27.610216 300982272 log.cpp:683] Attempting to append 186 bytes to the log I0128 12:20:27.610297 298835968 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 I0128 12:20:27.611016 299909120 replica.cpp:537] Replica received write request for position 1 from (18280)@192.168.178.24:51278 I0128 12:20:27.611188 299909120 leveldb.cpp:341] Persisting action (205 bytes) to leveldb took 153us I0128 12:20:27.611222 299909120 replica.cpp:712] Persisted action at 1 I0128 12:20:27.611843 299909120 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 I0128 12:20:27.612004 299909120 leveldb.cpp:341] Persisting action (207 bytes) to leveldb took 147us I0128 12:20:27.612035 299909120 replica.cpp:712] Persisted action at 1 I0128 12:20:27.612052 299909120 replica.cpp:697] Replica learned APPEND action at position 1 I0128 12:20:27.612742 300982272 registrar.cpp:484] Successfully updated the 'registry' in 2.924032ms I0128 12:20:27.612846 
300982272 registrar.cpp:370] Successfully recovered registrar I0128 12:20:27.612936 298835968 log.cpp:702] Attempting to truncate the log to 1 I0128 12:20:27.613005 297762816 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 I0128 12:20:27.613323 298299392 master.cpp:1520] Recovered 0 slaves from the Registry (147B) ; allowing 10mins for slaves to re-register I0128 12:20:27.613364 298835968 hierarchical.cpp:171] Skipping recovery of hierarchical allocator: nothing to recover I0128 12:20:27.613966 300445696 replica.cpp:537] Replica received write request for position 2 from (18281)@192.168.178.24:51278 I0128 12:20:27.614131 300445696 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 151us I0128 12:20:27.614166 300445696 replica.cpp:712] Persisted action at 2 I0128 12:20:27.614660 299372544 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 I0128 12:20:27.614828 299372544 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 158us I0128 12:20:27.614876 299372544 leveldb.cpp:399] Deleting ~1 keys from leveldb took 28us I0128 12:20:27.614898 299372544 replica.cpp:712] Persisted action at 2 I0128 12:20:27.614915 299372544 replica.cpp:697] Replica learned TRUNCATE action at position 2 I0128 12:20:27.625591 2080858880 containerizer.cpp:143] Using isolation: posix/cpu,posix/mem,filesystem/posix I0128 12:20:27.629758 298299392 slave.cpp:192] Slave started on 871)@192.168.178.24:51278 I0128 12:20:27.629791 298299392 slave.cpp:193] Flags at startup: --appc_store_dir=""/tmp/mesos/store/appc"" --authenticatee=""crammd5"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --credential=""/tmp/MasterQuotaTest_AvailableResourcesAfterRescinding_gS9Qcf/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_auth_server=""https://auth.docker.io"" --docker_kill_orphans=""true"" --docker_puller_timeout=""60"" 
--docker_registry=""https://registry-1.docker.io"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/MasterQuotaTest_AvailableResourcesAfterRescinding_gS9Qcf/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname_lookup=""true"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" --launcher_dir=""/Users/alex/Projects/mesos/build/default/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resources=""cpus:2;mem:1024;disk:1024;ports:[31000-32000]"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --version=""false"" --work_dir=""/tmp/MasterQuotaTest_AvailableResourcesAfterRescinding_gS9Qcf"" I0128 12:20:27.630067 298299392 credentials.hpp:83] Loading credential for authentication from '/tmp/MasterQuotaTest_AvailableResourcesAfterRescinding_gS9Qcf/credential' I0128 12:20:27.630223 298299392 slave.cpp:323] Slave using credential for: test-principal I0128 12:20:27.630360 298299392 resources.cpp:564] Parsing resources as JSON failed: cpus:2;mem:1024;disk:1024;ports:[31000-32000] Trying semicolon-delimited string format instead I0128 12:20:27.630818 298299392 slave.cpp:463] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0128 12:20:27.630869 298299392 slave.cpp:471] Slave attributes: [ ] I0128 12:20:27.630882 298299392 slave.cpp:476] Slave hostname: alexr.fritz.box I0128 12:20:27.631352 300982272 state.cpp:58] 
Recovering state from '/tmp/MasterQuotaTest_AvailableResourcesAfterRescinding_gS9Qcf/meta' I0128 12:20:27.631515 299909120 status_update_manager.cpp:200] Recovering status update manager I0128 12:20:27.631702 298835968 containerizer.cpp:390] Recovering containerizer I0128 12:20:27.632589 297226240 provisioner.cpp:245] Provisioner recovery complete I0128 12:20:27.632807 298835968 slave.cpp:4495] Finished recovery I0128 12:20:27.633539 298835968 slave.cpp:4667] Querying resource estimator for oversubscribable resources I0128 12:20:27.633752 300445696 status_update_manager.cpp:174] Pausing sending status updates I0128 12:20:27.633754 298835968 slave.cpp:795] New master detected at master@192.168.178.24:51278 I0128 12:20:27.633806 298835968 slave.cpp:858] Authenticating with master master@192.168.178.24:51278 I0128 12:20:27.633824 298835968 slave.cpp:863] Using default CRAM-MD5 authenticatee I0128 12:20:27.633903 298835968 slave.cpp:831] Detecting new master I0128 12:20:27.633913 299372544 authenticatee.cpp:121] Creating new client SASL connection I0128 12:20:27.634016 298835968 slave.cpp:4681] Received oversubscribable resources from the resource estimator I0128 12:20:27.634076 297226240 master.cpp:5521] Authenticating slave(871)@192.168.178.24:51278 I0128 12:20:27.634130 299372544 authenticator.cpp:413] Starting authentication session for crammd5_authenticatee(1741)@192.168.178.24:51278 I0128 12:20:27.634255 297226240 authenticator.cpp:98] Creating new server SASL connection I0128 12:20:27.634348 300982272 authenticatee.cpp:212] Received SASL authentication mechanisms: CRAM-MD5 I0128 12:20:27.634367 300982272 authenticatee.cpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I0128 12:20:27.634454 298835968 authenticator.cpp:203] Received SASL authentication start I0128 12:20:27.634515 298835968 authenticator.cpp:325] Authentication requires more steps I0128 12:20:27.634572 298835968 authenticatee.cpp:258] Received SASL authentication step I0128 
12:20:27.634706 297226240 authenticator.cpp:231] Received SASL authentication step I0128 12:20:27.634757 297226240 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: 'alexr.fritz.box' server FQDN: 'alexr.fritz.box' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0128 12:20:27.634771 297226240 auxprop.cpp:179] Looking up auxiliary property '*userPassword' I0128 12:20:27.634793 297226240 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0128 12:20:27.634809 297226240 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: 'alexr.fritz.box' server FQDN: 'alexr.fritz.box' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0128 12:20:27.634819 297226240 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0128 12:20:27.634827 297226240 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0128 12:20:27.634893 297226240 authenticator.cpp:317] Authentication success I0128 12:20:27.634958 298835968 authenticatee.cpp:298] Authentication success I0128 12:20:27.635030 298299392 master.cpp:5551] Successfully authenticated principal 'test-principal' at slave(871)@192.168.178.24:51278 I0128 12:20:27.635079 300445696 authenticator.cpp:431] Authentication session cleanup for crammd5_authenticatee(1741)@192.168.178.24:51278 I0128 12:20:27.635195 299372544 slave.cpp:926] Successfully authenticated with master master@192.168.178.24:51278 I0128 12:20:27.635273 299372544 slave.cpp:1320] Will retry registration in 5.823453ms if necessary I0128 12:20:27.635365 299909120 master.cpp:4235] Registering slave at slave(871)@192.168.178.24:51278 (alexr.fritz.box) with id 531344bd-56f4-4e4f-8f6f-a6a9d36058c7-S0 I0128 12:20:27.635542 297762816 registrar.cpp:439] Applied 1 operations in 41us; attempting to update the 'registry' I0128 
12:20:27.635889 299372544 log.cpp:683] Attempting to append 358 bytes to the log I0128 12:20:27.636011 298299392 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 3 I0128 12:20:27.636693 300982272 replica.cpp:537] Replica received write request for position 3 from (18295)@192.168.178.24:51278 I0128 12:20:27.636860 300982272 leveldb.cpp:341] Persisting action (377 bytes) to leveldb took 139us I0128 12:20:27.636885 300982272 replica.cpp:712] Persisted action at 3 I0128 12:20:27.637380 299909120 replica.cpp:691] Replica received learned notice for position 3 from @0.0.0.0:0 I0128 12:20:27.637547 299909120 leveldb.cpp:341] Persisting action (379 bytes) to leveldb took 132us I0128 12:20:27.637573 299909120 replica.cpp:712] Persisted action at 3 I0128 12:20:27.637589 299909120 replica.cpp:697] Replica learned APPEND action at position 3 I0128 12:20:27.638362 298835968 registrar.cpp:484] Successfully updated the 'registry' in 2.77504ms I0128 12:20:27.638589 300445696 log.cpp:702] Attempting to truncate the log to 3 I0128 12:20:27.638684 298299392 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 4 I0128 12:20:27.638825 300445696 slave.cpp:3435] Received ping from slave-observer(871)@192.168.178.24:51278 I0128 12:20:27.639081 300982272 hierarchical.cpp:473] Added slave 531344bd-56f4-4e4f-8f6f-a6a9d36058c7-S0 (alexr.fritz.box) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (allocated: ) I0128 12:20:27.639117 299909120 master.cpp:4303] Registered slave 531344bd-56f4-4e4f-8f6f-a6a9d36058c7-S0 at slave(871)@192.168.178.24:51278 (alexr.fritz.box) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0128 12:20:27.639165 300982272 hierarchical.cpp:1403] No resources available to allocate! 
I0128 12:20:27.639168 297226240 slave.cpp:970] Registered with master master@192.168.178.24:51278; given slave ID 531344bd-56f4-4e4f-8f6f-a6a9d36058c7-S0 I0128 12:20:27.639189 297226240 fetcher.cpp:81] Clearing fetcher cache I0128 12:20:27.639183 300982272 hierarchical.cpp:1116] Performed allocation for slave 531344bd-56f4-4e4f-8f6f-a6a9d36058c7-S0 in 77us I0128 12:20:27.639348 297762816 status_update_manager.cpp:181] Resuming sending status updates I0128 12:20:27.639519 298835968 replica.cpp:537] Replica received write request for position 4 from (18296)@192.168.178.24:51278 I0128 12:20:27.639678 298835968 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 142us I0128 12:20:27.639708 298835968 replica.cpp:712] Persisted action at 4 I0128 12:20:27.640115 300982272 replica.cpp:691] Replica received learned notice for position 4 from @0.0.0.0:0 I0128 12:20:27.640276 300982272 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 137us I0128 12:20:27.640312 300982272 leveldb.cpp:399] Deleting ~2 keys from leveldb took 21us I0128 12:20:27.640326 300982272 replica.cpp:712] Persisted action at 4 I0128 12:20:27.640336 300982272 replica.cpp:697] Replica learned TRUNCATE action at position 4 I0128 12:20:27.642145 297226240 slave.cpp:993] Checkpointing SlaveInfo to '/tmp/MasterQuotaTest_AvailableResourcesAfterRescinding_gS9Qcf/meta/slaves/531344bd-56f4-4e4f-8f6f-a6a9d36058c7-S0/slave.info' I0128 12:20:27.643354 297226240 slave.cpp:1029] Forwarding total oversubscribed resources I0128 12:20:27.643458 300445696 master.cpp:4644] Received update of slave 531344bd-56f4-4e4f-8f6f-a6a9d36058c7-S0 at slave(871)@192.168.178.24:51278 (alexr.fritz.box) with total oversubscribed resources I0128 12:20:27.643710 298299392 hierarchical.cpp:531] Slave 531344bd-56f4-4e4f-8f6f-a6a9d36058c7-S0 (alexr.fritz.box) updated with oversubscribed resources (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: ) I0128 12:20:27.643769 298299392 
hierarchical.cpp:1403] No resources available to allocate! I0128 12:20:27.643805 2...",3 MESOS-4544,"Propose design doc for agent partitioning behavior",NULL,8 MESOS-4545,"Propose design doc for reliable floating point behavior",NULL,3 MESOS-4546,"Mesos Agents needs to re-resolve hosts in zk string on leader change / failure to connect","Sample Mesos Agent log: https://gist.github.com/brndnmtthws/fb846fa988487250a809 Note that ZooKeeper has a function to change the list of servers at runtime: https://github.com/apache/zookeeper/blob/735ea78909e67c648a4978c8d31d63964986af73/src/c/src/zookeeper.c#L1207-L1232 This comes up when using an AWS AutoScalingGroup to manage the set of masters. When the agent first comes up, it resolves the zk:// string. Once all the hosts that were in the original string have failed (each one fails and is replaced by a new machine with the same DNS name), the agent just keeps spinning in an internal loop, never re-resolving the DNS names. Two solutions I see are 1. Update the list of servers / re-resolve 2. 
Have the agent detect it hasn't connected recently, and kill itself (Which will force a re-resolution when the agent starts back up)",3 MESOS-4554,"Investigate test suite crashes after ZK socket disconnections.","Showed up on ASF CI: https://builds.apache.org/job/Mesos/COMPILER=clang,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/1579/console The test crashed with the following logs: {code} [ RUN ] ContentType/ExecutorHttpApiTest.DefaultAccept/1 I0129 02:00:35.137161 31926 leveldb.cpp:174] Opened db in 118.902333ms I0129 02:00:35.187021 31926 leveldb.cpp:181] Compacted db in 49.836241ms I0129 02:00:35.187088 31926 leveldb.cpp:196] Created db iterator in 33825ns I0129 02:00:35.187109 31926 leveldb.cpp:202] Seeked to beginning of db in 7965ns I0129 02:00:35.187121 31926 leveldb.cpp:271] Iterated through 0 keys in the db in 6350ns I0129 02:00:35.187165 31926 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0129 02:00:35.188433 31950 recover.cpp:447] Starting replica recovery I0129 02:00:35.188796 31950 recover.cpp:473] Replica is in EMPTY status I0129 02:00:35.190021 31949 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (11817)@172.17.0.3:60904 I0129 02:00:35.190569 31958 recover.cpp:193] Received a recover response from a replica in EMPTY status I0129 02:00:35.190994 31959 recover.cpp:564] Updating replica status to STARTING I0129 02:00:35.191522 31953 master.cpp:374] Master 823f2212-bf28-4dd6-959d-796029d32afb (90665f991b70) started on 172.17.0.3:60904 I0129 02:00:35.191640 31953 master.cpp:376] Flags at startup: --acls="""" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate=""true"" --authenticate_http=""true"" --authenticate_slaves=""true"" --authenticators=""crammd5"" --authorizers=""local"" --credentials=""/tmp/B9O6zq/credentials"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" 
--http_authenticators=""basic"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_completed_frameworks=""50"" --max_completed_tasks_per_framework=""1000"" --max_slave_ping_timeouts=""5"" --quiet=""false"" --recovery_slave_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""25secs"" --registry_strict=""true"" --root_submissions=""true"" --slave_ping_timeout=""15secs"" --slave_reregister_timeout=""10mins"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/mesos/mesos-0.28.0/_inst/share/mesos/webui"" --work_dir=""/tmp/B9O6zq/master"" --zk_session_timeout=""10secs"" I0129 02:00:35.191926 31953 master.cpp:421] Master only allowing authenticated frameworks to register I0129 02:00:35.191936 31953 master.cpp:426] Master only allowing authenticated slaves to register I0129 02:00:35.191943 31953 credentials.hpp:35] Loading credentials for authentication from '/tmp/B9O6zq/credentials' I0129 02:00:35.192229 31953 master.cpp:466] Using default 'crammd5' authenticator I0129 02:00:35.192366 31953 master.cpp:535] Using default 'basic' HTTP authenticator I0129 02:00:35.192530 31953 master.cpp:569] Authorization enabled I0129 02:00:35.192719 31950 whitelist_watcher.cpp:77] No whitelist given I0129 02:00:35.192756 31957 hierarchical.cpp:144] Initialized hierarchical allocator process I0129 02:00:35.194291 31955 master.cpp:1710] The newly elected leader is master@172.17.0.3:60904 with id 823f2212-bf28-4dd6-959d-796029d32afb I0129 02:00:35.194335 31955 master.cpp:1723] Elected as the leading master! 
I0129 02:00:35.194350 31955 master.cpp:1468] Recovering from registrar I0129 02:00:35.194545 31958 registrar.cpp:307] Recovering registrar I0129 02:00:35.220226 31948 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 29.150097ms I0129 02:00:35.220262 31948 replica.cpp:320] Persisted replica status to STARTING I0129 02:00:35.220484 31959 recover.cpp:473] Replica is in STARTING status I0129 02:00:35.221220 31954 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (11819)@172.17.0.3:60904 I0129 02:00:35.221539 31959 recover.cpp:193] Received a recover response from a replica in STARTING status I0129 02:00:35.221871 31954 recover.cpp:564] Updating replica status to VOTING I0129 02:00:35.245329 31949 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 23.326002ms I0129 02:00:35.245367 31949 replica.cpp:320] Persisted replica status to VOTING I0129 02:00:35.245522 31955 recover.cpp:578] Successfully joined the Paxos group I0129 02:00:35.245800 31955 recover.cpp:462] Recover process terminated I0129 02:00:35.246181 31951 log.cpp:659] Attempting to start the writer I0129 02:00:35.247228 31953 replica.cpp:493] Replica received implicit promise request from (11820)@172.17.0.3:60904 with proposal 1 I0129 02:00:35.270472 31953 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 23.225846ms I0129 02:00:35.270510 31953 replica.cpp:342] Persisted promised to 1 I0129 02:00:35.271306 31957 coordinator.cpp:238] Coordinator attempting to fill missing positions I0129 02:00:35.272373 31949 replica.cpp:388] Replica received explicit promise request from (11821)@172.17.0.3:60904 for position 0 with proposal 2 I0129 02:00:35.295600 31949 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 23.181008ms I0129 02:00:35.295639 31949 replica.cpp:712] Persisted action at 0 I0129 02:00:35.296815 31950 replica.cpp:537] Replica received write request for position 0 from (11822)@172.17.0.3:60904 I0129 02:00:35.296879 
31950 leveldb.cpp:436] Reading position from leveldb took 43203ns I0129 02:00:35.320659 31950 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 23.753935ms I0129 02:00:35.320699 31950 replica.cpp:712] Persisted action at 0 I0129 02:00:35.321394 31950 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 I0129 02:00:35.345837 31950 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 24.358655ms I0129 02:00:35.345877 31950 replica.cpp:712] Persisted action at 0 I0129 02:00:35.345898 31950 replica.cpp:697] Replica learned NOP action at position 0 I0129 02:00:35.346683 31950 log.cpp:675] Writer started with ending position 0 I0129 02:00:35.347913 31957 leveldb.cpp:436] Reading position from leveldb took 55621ns I0129 02:00:35.349047 31947 registrar.cpp:340] Successfully fetched the registry (0B) in 154.395904ms I0129 02:00:35.349185 31947 registrar.cpp:439] Applied 1 operations in 46347ns; attempting to update the 'registry' I0129 02:00:35.350008 31952 log.cpp:683] Attempting to append 170 bytes to the log I0129 02:00:35.350132 31957 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 I0129 02:00:35.351042 31953 replica.cpp:537] Replica received write request for position 1 from (11823)@172.17.0.3:60904 I0129 02:00:35.370906 31953 leveldb.cpp:341] Persisting action (189 bytes) to leveldb took 19.829257ms I0129 02:00:35.370946 31953 replica.cpp:712] Persisted action at 1 I0129 02:00:35.371840 31952 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 I0129 02:00:35.396082 31952 leveldb.cpp:341] Persisting action (191 bytes) to leveldb took 24.218894ms I0129 02:00:35.396122 31952 replica.cpp:712] Persisted action at 1 I0129 02:00:35.396144 31952 replica.cpp:697] Replica learned APPEND action at position 1 I0129 02:00:35.397250 31954 registrar.cpp:484] Successfully updated the 'registry' in 47.99104ms I0129 02:00:35.397452 31954 registrar.cpp:370] Successfully recovered 
registrar I0129 02:00:35.397678 31946 log.cpp:702] Attempting to truncate the log to 1 I0129 02:00:35.397881 31956 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 I0129 02:00:35.398066 31951 master.cpp:1520] Recovered 0 slaves from the Registry (131B) ; allowing 10mins for slaves to re-register I0129 02:00:35.398111 31957 hierarchical.cpp:171] Skipping recovery of hierarchical allocator: nothing to recover I0129 02:00:35.398982 31955 replica.cpp:537] Replica received write request for position 2 from (11824)@172.17.0.3:60904 I0129 02:00:35.421293 31955 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 22.286476ms I0129 02:00:35.421339 31955 replica.cpp:712] Persisted action at 2 I0129 02:00:35.422046 31944 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 I0129 02:00:35.446316 31944 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 24.246177ms I0129 02:00:35.446406 31944 leveldb.cpp:399] Deleting ~1 keys from leveldb took 84415ns I0129 02:00:35.446466 31944 replica.cpp:712] Persisted action at 2 I0129 02:00:35.446491 31944 replica.cpp:697] Replica learned TRUNCATE action at position 2 I0129 02:00:35.452579 31957 slave.cpp:192] Slave started on 372)@172.17.0.3:60904 I0129 02:00:35.452620 31957 slave.cpp:193] Flags at startup: --appc_store_dir=""/tmp/mesos/store/appc"" --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --credential=""/tmp/ContentType_ExecutorHttpApiTest_DefaultAccept_1_r4GUhM/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_auth_server=""https://auth.docker.io"" --docker_kill_orphans=""true"" --docker_puller_timeout=""60"" --docker_registry=""https://registry-1.docker.io"" 
--docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/ContentType_ExecutorHttpApiTest_DefaultAccept_1_r4GUhM/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname_lookup=""true"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" --launcher_dir=""/mesos/mesos-0.28.0/_build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resources=""cpus:2;mem:1024;disk:1024;ports:[31000-32000]"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" --work_dir=""/tmp/ContentType_ExecutorHttpApiTest_DefaultAccept_1_r4GUhM"" I0129 02:00:35.453012 31957 credentials.hpp:83] Loading credential for authentication from '/tmp/ContentType_ExecutorHttpApiTest_DefaultAccept_1_r4GUhM/credential' I0129 02:00:35.453191 31957 slave.cpp:323] Slave using credential for: test-principal I0129 02:00:35.453368 31957 resources.cpp:564] Parsing resources as JSON failed: cpus:2;mem:1024;disk:1024;ports:[31000-32000] Trying semicolon-delimited string format instead I0129 02:00:35.453853 31957 slave.cpp:463] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0129 02:00:35.453938 31957 slave.cpp:471] Slave attributes: [ ] I0129 02:00:35.453953 31957 slave.cpp:476] Slave hostname: 90665f991b70 
I0129 02:00:35.454794 31950 state.cpp:58] Recovering state from '/tmp/ContentType_ExecutorHttpApiTest_DefaultAccept_1_r4GUhM/meta' I0129 02:00:35.455080 31948 status_update_manager.cpp:200] Recovering status update manager I0129 02:00:35.455225 31926 sched.cpp:222] Version: 0.28.0 I0129 02:00:35.455535 31956 slave.cpp:4495] Finished recovery I0129 02:00:35.455798 31945 sched.cpp:326] New master detected at master@172.17.0.3:60904 I0129 02:00:35.455879 31945 sched.cpp:382] Authenticating with master master@172.17.0.3:60904 I0129 02:00:35.455904 31945 sched.cpp:389] Using default CRAM-MD5 authenticatee I0129 02:00:35.455943 31956 slave.cpp:4667] Querying resource estimator for oversubscribable resources I0129 02:00:35.456167 31950 authenticatee.cpp:121] Creating new client SASL connection I0129 02:00:35.456218 31953 status_update_manager.cpp:174] Pausing sending status updates I0129 02:00:35.456219 31956 slave.cpp:795] New master detected at master@172.17.0.3:60904 I0129 02:00:35.456298 31956 slave.cpp:858] Authenticating with master master@172.17.0.3:60904 I0129 02:00:35.456323 31956 slave.cpp:863] Using default CRAM-MD5 authenticatee I0129 02:00:35.456490 31948 authenticatee.cpp:121] Creating new client SASL connection I0129 02:00:35.456492 31956 slave.cpp:831] Detecting new master I0129 02:00:35.456588 31946 master.cpp:5521] Authenticating scheduler-93e745f0-0e48-4a8f-b227-93569976c5e8@172.17.0.3:60904 I0129 02:00:35.456686 31956 slave.cpp:4681] Received oversubscribable resources from the resource estimator I0129 02:00:35.456805 31953 authenticator.cpp:413] Starting authentication session for crammd5_authenticatee(804)@172.17.0.3:60904 I0129 02:00:35.456878 31946 master.cpp:5521] Authenticating slave(372)@172.17.0.3:60904 I0129 02:00:35.457124 31953 authenticator.cpp:413] Starting authentication session for crammd5_authenticatee(805)@172.17.0.3:60904 I0129 02:00:35.457157 31948 authenticator.cpp:98] Creating new server SASL connection I0129 02:00:35.457373 31946 
authenticatee.cpp:212] Received SASL authentication mechanisms: CRAM-MD5 I0129 02:00:35.457381 31951 authenticator.cpp:98] Creating new server SASL connection I0129 02:00:35.457491 31946 authenticatee.cpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I0129 02:00:35.457598 31946 authenticatee.cpp:212] Received SASL authentication mechanisms: CRAM-MD5 I0129 02:00:35.457612 31951 authenticator.cpp:203] Received SASL authentication start I0129 02:00:35.457635 31946 authenticatee.cpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I0129 02:00:35.457680 31951 authenticator.cpp:325] Authentication requires more steps I0129 02:00:35.457767 31954 authenticator.cpp:203] Received SASL authentication start I0129 02:00:35.457768 31948 authenticatee.cpp:258] Received SASL authentication step I0129 02:00:35.457830 31954 authenticator.cpp:325] Authentication requires more steps I0129 02:00:35.457885 31948 authenticator.cpp:231] Received SASL authentication step I0129 02:00:35.457918 31948 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '90665f991b70' server FQDN: '90665f991b70' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0129 02:00:35.457933 31948 auxprop.cpp:179] Looking up auxiliary property '*userPassword' I0129 02:00:35.457954 31959 authenticatee.cpp:258] Received SASL authentication step I0129 02:00:35.457993 31948 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0129 02:00:35.458031 31948 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '90665f991b70' server FQDN: '90665f991b70' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0129 02:00:35.458050 31948 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0129 02:00:35.458065 31948 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since 
SASL_AUXPROP_AUTHZID == true I0129 02:00:35.458096 31948 authenticator.cpp:317] Authentication success I0129 02:00:35.458112 31944 authenticator.cpp:231] Received SASL authentication step I0129 02:00:35.458142 31944 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '90665f991b70' server FQDN: '90665f991b70' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0129 02:00:35.458173 31944 auxprop.cpp:179] Looking up auxiliary property '*userPassword' I0129 02:00:35.458206 31954 authenticatee.cpp:298] Authentication success I0129 02:00:35.458256 31957 master.cpp:5551] Successfully authenticated principal 'test-principal' at scheduler-93e745f0-0e48-4a8f-b227-93569976c5e8@172.17.0.3:60904 I0129 02:00:35.458206 31944 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0129 02:00:35.458360 31944 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '90665f991b70' server FQDN: '90665f991b70' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0129 02:00:35.458382 31944 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0129 02:00:35.458397 31944 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0129 02:00:35.458489 31944 authenticator.cpp:317] Authentication success I0129 02:00:35.458623 31953 sched.cpp:471] Successfully authenticated with master master@172.17.0.3:60904 I0129 02:00:35.458649 31953 sched.cpp:780] Sending SUBSCRIBE call to master@172.17.0.3:60904 I0129 02:00:35.458653 31956 authenticator.cpp:431] Authentication session cleanup for crammd5_authenticatee(804)@172.17.0.3:60904 I0129 02:00:35.458673 31951 authenticatee.cpp:298] Authentication success I0129 02:00:35.458709 31952 master.cpp:5551] Successfully authenticated principal 'test-principal' at slave(372)@172.17.0.3:60904 I0129 02:00:35.458906 
31955 slave.cpp:926] Successfully authenticated with master master@172.17.0.3:60904 I0129 02:00:35.458983 31956 authenticator.cpp:431] Authentication session cleanup for crammd5_authenticatee(805)@172.17.0.3:60904 I0129 02:00:35.459033 31955 slave.cpp:1320] Will retry registration in 7.075135ms if necessary I0129 02:00:35.459128 31953 sched.cpp:813] Will retry registration in 86.579738ms if necessary I0129 02:00:35.459193 31950 master.cpp:4235] Registering slave at slave(372)@172.17.0.3:60904 (90665f991b70) with id 823f2212-bf28-4dd6-959d-796029d32afb-S0 I0129 02:00:35.459489 31950 master.cpp:2278] Received SUBSCRIBE call for framework 'default' at scheduler-93e745f0-0e48-4a8f-b227-93569976c5e8@172.17.0.3:60904 I0129 02:00:35.459513 31950 master.cpp:1749] Authorizing framework principal 'test-principal' to receive offers for role '*' I0129 02:00:35.459516 31959 registrar.cpp:439] Applied 1 operations in 62499ns; attempting to update the 'registry' I0129 02:00:35.459766 31956 master.cpp:2349] Subscribing framework default with checkpointing disabled and capabilities [ ] I0129 02:00:35.460095 31955 log.cpp:683] Attempting to append 339 bytes to the log I0129 02:00:35.460192 31948 hierarchical.cpp:265] Added framework 823f2212-bf28-4dd6-959d-796029d32afb-0000 I0129 02:00:35.460247 31956 sched.cpp:707] Framework registered with 823f2212-bf28-4dd6-959d-796029d32afb-0000 I0129 02:00:35.460314 31958 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 3 I0129 02:00:35.460388 31948 hierarchical.cpp:1403] No resources available to allocate! I0129 02:00:35.460449 31948 hierarchical.cpp:1498] No inverse offers to send out! 
I0129 02:00:35.460402 31956 sched.cpp:721] Scheduler::registered took 136519ns I0129 02:00:35.460482 31948 hierarchical.cpp:1096] Performed allocation for 0 slaves in 158218ns I0129 02:00:35.461187 31944 replica.cpp:537] Replica received write request for position 3 from (11829)@172.17.0.3:60904 I0129 02:00:35.467929 31954 slave.cpp:1320] Will retry registration in 14.701381ms if necessary I0129 02:00:35.468183 31952 master.cpp:4223] Ignoring register slave message from slave(372)@172.17.0.3:60904 (90665f991b70) as admission is already in progress I0129 02:00:35.483300 31959 slave.cpp:1320] Will retry registration in 8.003223ms if necessary I0129 02:00:35.483500 31946 master.cpp:4223] Ignoring register slave message from slave(372)@172.17.0.3:60904 (90665f991b70) as admission is already in progress I0129 02:00:35.491843 31945 slave.cpp:1320] Will retry registration in 52.952447ms if necessary I0129 02:00:35.491962 31948 master.cpp:4223]...",3 MESOS-4557,"Automatically generate command-line flag documentation","To ensure that the command-line flag documentation in {{configuration.md}} stays in sync with the help strings in the various {{flags.cpp}} files, it could be beneficial to automate the generation of those docs. Such a script could be run as part of the build process, ensuring that changes to the help strings would show up in the documentation as well. In addition to parsing and formatting the help strings for display as HTML, this could also involve specifying collections of flags to be grouped together in order to provide logical structure to the {{configuration.md}} documentation.",3 MESOS-4558,"Reduce the running time of benchmark tests.","Currently benchmark tests take a long time (>5 hours). It would be nice to reduce the total time taken by the benchmark tests to enable us to run them on ASF CI. 
Command to run only benchmark tests: {code} MESOS_BENCHMARK=1 GTEST_FILTER=""*BENCHMARK*"" make check {code}",2 MESOS-4559,"Run benchmark tests in ASF CI","The build job is already created on ASF CI (https://builds.apache.org/job/Mesos-Benchmarks/) but is currently disabled due to MESOS-4558.",2 MESOS-4562,"Mesos UI shows wrong count for ""started"" tasks","The task started field shows the number of tasks in state ""TASK_STARTING"" as opposed to those in ""TASK_RUNNING"" state.",2 MESOS-4564,"Separate Appc protobuf messages to its own file.","It would be cleaner to keep the Appc protobuf messages separate from other mesos messages.",2 MESOS-4566,"Avoid unnecessary temporary `std::string` constructions and copies in `jsonify`.","A few of the critical code paths in {{jsonify}} involve unnecessary temporary string construction and copies (inherited from the {{JSON::*}}). For example, {{strings::trim}} is used to remove trailing 0s from printing {{double}}s. We print {{double}}s a lot, and therefore constructing a temporary {{std::string}} on printing of every double is extremely costly. This ticket captures the work involved in avoiding them.",1 MESOS-4567,"Deprecate TASK_STARTING state","We currently have the following task stages: * TASK_STAGING -> set by slave * TASK_STARTING -> set by the executor (?) * TASK_RUNNING -> set by the executor when the task is running * TASK_XXX -> task termination statuses The confusion here is about TASK_STARTING. This is the state between TASK_STAGING and TASK_RUNNING and is somewhat non-intuitive for the reader. Further, it looks like nowhere in the source code do we set the TASK_STARTING state. 
Why shouldn't we just deprecate/remove it?",2 MESOS-4570,"DockerFetcherPluginTest.INTERNET_CURL_FetchImage seems flaky.","{noformat} ../configure --enable-ssl --enable-libevent && make check {noformat} {noformat} --gtest_repeat=-1 --gtest_break_on_failure --gtest_filter=DockerFetcherPluginTest.INTERNET_CURL_FetchImage {noformat} Failed at the 22nd run. {noformat} [ RUN ] DockerFetcherPluginTest.INTERNET_CURL_FetchImage ../../src/tests/uri_fetcher_tests.cpp:276: Failure Failed to wait 15secs for fetcher.get()->fetch(uri, dir) *** Aborted at 1454207653 (unix time) try ""date -d @1454207653"" if you are using GNU date *** PC: @ 0x167023a testing::UnitTest::AddTestPartResult() *** SIGSEGV (@0x0) received by PID 19868 (TID 0x7f500fc877c0) from PID 0; stack trace: *** @ 0x7f5008f368d0 (unknown) @ 0x167023a testing::UnitTest::AddTestPartResult() @ 0x1664c73 testing::internal::AssertHelper::operator=() @ 0x146ac6f mesos::internal::tests::DockerFetcherPluginTest_INTERNET_CURL_FetchImage_Test::TestBody() @ 0x168dc70 testing::internal::HandleSehExceptionsInMethodIfSupported<>() @ 0x1688cc8 testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x166a013 testing::Test::Run() @ 0x166a7a1 testing::TestInfo::Run() @ 0x166addc testing::TestCase::Run() @ 0x167172b testing::internal::UnitTestImpl::RunAllTests() @ 0x168e8ff testing::internal::HandleSehExceptionsInMethodIfSupported<>() @ 0x168981e testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x167045b testing::UnitTest::Run() @ 0xe2d476 RUN_ALL_TESTS() @ 0xe2d08c main @ 0x7f5008b9fb45 (unknown) @ 0x9c6bf9 (unknown) {noformat}",1 MESOS-4573,"Design doc for scheduler HTTP Stream IDs","This ticket is for the design of HTTP stream IDs, for use with HTTP schedulers. 
These IDs allow Mesos to distinguish between different instances of HTTP framework schedulers.",5 MESOS-4575,"Fix Appc image caching to share with image fetcher","As Appc image fetcher is being developed, Image cache needs to be shared between store and the image fetcher.",3 MESOS-4576,"Introduce a stout helper for ""which""","We may want to add a helper to {{stout/os.hpp}} that will natively emulate the functionality of the Linux utility {{which}}. i.e. {code} Option<string> which(const string& command) { Option<string> path = os::getenv(""PATH""); // Loop through path and return the first one which os::exists(...). return None(); } {code} This helper may be useful: * for test filters in {{src/tests/environment.cpp}} * a few tests in {{src/tests/containerizer/port_mapping_tests.cpp}} * the {{sha512}} utility in {{src/common/command_utils.cpp}} * as runtime checks in the {{LogrotateContainerLogger}} * etc.",2 MESOS-4582,"state.json serving duplicate ""active"" fields","state.json is serving duplicate ""active"" fields in frameworks. See the framework ""47df96c2-3f85-4bc5-b781-709b2c30c752-0000"" in the attached file.",1 MESOS-4583,"Rename `examples/event_call_framework.cpp` to `examples/test_http_framework.cpp`","We already have {{examples/test_framework.cpp}} for testing {{PID}} based frameworks. We would ideally want to rename {{event_call_framework}} to correctly reflect that it's an example for HTTP based framework.",1 MESOS-4584,"Update Rakefile for mesos site generation","The stuff in site/ directory needs some updates to make it easier to generate updates for mesos.apache.org site.",2 MESOS-4590,"Add test case for reservations with same role, different principals","We don't have a test case that covers $SUBJECT; we probably should.",2 MESOS-4591,"`/reserve` and `/create-volumes` endpoints allow operations for any role","When frameworks reserve resources, the validation of the operation ensures that the {{role}} of the reservation matches the {{role}} of the framework. 
For the case of the {{/reserve}} operator endpoint, however, the operator has no role to validate, so this check isn't performed. This means that if an ACL exists which authorizes a framework's principal to reserve resources, that same principal can be used to reserve resources for _any_ role through the operator endpoint. We should restrict reservations made through the operator endpoint to specified roles. A few possibilities: * The {{object}} of the {{reserve_resources}} ACL could be changed from {{resources}} to {{roles}} * A second ACL could be added for authorization of {{reserve}} operations, with an {{object}} of {{role}} * Our conception of the {{resources}} object in the {{reserve_resources}} ACL could be expanded to include role information, i.e., {{disk(role1);mem(role1)}}",3 MESOS-4596,"Add common Appc spec utilities."," Add common utility functions such as: - validating image information against actual data in the image directory. - getting list of dependencies at depth 1 for an image. - getting image path simple image discovery. ",2 MESOS-4598,"Logrotate ContainerLogger should not remove IP from environment.","The {{LogrotateContainerLogger}} starts libprocess-using subprocesses. Libprocess initialization will attempt to resolve the IP from the hostname. If a DNS service is not available, this step will fail, which terminates the logger subprocess prematurely. Since the logger subprocesses live on the agent, they should use the same {{LIBPROCESS_IP}} supplied to the agent.",1 MESOS-4600,"Use `std::quoted` for strings in error messages","We'd like to have a consistent format for error strings through the code base. As per this comment: [MESOS-3772|https://issues.apache.org/jira/browse/MESOS-3772?focusedCommentId=14965652&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14965652] We can then overload the stream operator to make sure strings are quoted as needed. Note: We need to first require compilers that support C++14. 
For now we have to wait for MSVC to be part of that list.",3 MESOS-4604,"ROOT_DOCKER_DockerHealthyTask is flaky.","Log from Teamcity that is running {{sudo ./bin/mesos-tests.sh}} on AWS EC2 instances: {noformat} [18:27:14][Step 8/8] [----------] 8 tests from HealthCheckTest [18:27:14][Step 8/8] [ RUN ] HealthCheckTest.HealthyTask [18:27:17][Step 8/8] [ OK ] HealthCheckTest.HealthyTask (2222 ms) [18:27:17][Step 8/8] [ RUN ] HealthCheckTest.ROOT_DOCKER_DockerHealthyTask [18:27:36][Step 8/8] ../../src/tests/health_check_tests.cpp:388: Failure [18:27:36][Step 8/8] Failed to wait 15secs for termination [18:27:36][Step 8/8] F0204 18:27:35.981302 23085 logging.cpp:64] RAW: Pure virtual method called [18:27:36][Step 8/8] @ 0x7f7077055e1c google::LogMessage::Fail() [18:27:36][Step 8/8] @ 0x7f707705ba6f google::RawLog__() [18:27:36][Step 8/8] @ 0x7f70760f76c9 __cxa_pure_virtual [18:27:36][Step 8/8] @ 0xa9423c mesos::internal::tests::Cluster::Slaves::shutdown() [18:27:36][Step 8/8] @ 0x1074e45 mesos::internal::tests::MesosTest::ShutdownSlaves() [18:27:36][Step 8/8] @ 0x1074de4 mesos::internal::tests::MesosTest::Shutdown() [18:27:36][Step 8/8] @ 0x1070ec7 mesos::internal::tests::MesosTest::TearDown() [18:27:36][Step 8/8] @ 0x16eb7b2 testing::internal::HandleSehExceptionsInMethodIfSupported<>() [18:27:36][Step 8/8] @ 0x16e61a9 testing::internal::HandleExceptionsInMethodIfSupported<>() [18:27:36][Step 8/8] @ 0x16c56aa testing::Test::Run() [18:27:36][Step 8/8] @ 0x16c5e89 testing::TestInfo::Run() [18:27:36][Step 8/8] @ 0x16c650a testing::TestCase::Run() [18:27:36][Step 8/8] @ 0x16cd1f6 testing::internal::UnitTestImpl::RunAllTests() [18:27:36][Step 8/8] @ 0x16ec513 testing::internal::HandleSehExceptionsInMethodIfSupported<>() [18:27:36][Step 8/8] @ 0x16e6df1 testing::internal::HandleExceptionsInMethodIfSupported<>() [18:27:36][Step 8/8] @ 0x16cbe26 testing::UnitTest::Run() [18:27:36][Step 8/8] @ 0xe54c84 RUN_ALL_TESTS() [18:27:36][Step 8/8] @ 0xe54867 main [18:27:36][Step 8/8] @ 
0x7f7071560a40 (unknown) [18:27:36][Step 8/8] @ 0x9b52d9 _start [18:27:36][Step 8/8] Aborted (core dumped) [18:27:36][Step 8/8] Process exited with code 134 {noformat} Happens with Ubuntu 15.04, CentOS 6, CentOS 7 _quite_ often. ",2 MESOS-4609,"Subprocess should be more intelligent about setting/inheriting libprocess environment variables ","Mostly copied from [this comment|https://issues.apache.org/jira/browse/MESOS-4598?focusedCommentId=15133497&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15133497] A subprocess inheriting the environment variables {{LIBPROCESS_*}} may run into some accidental fatalities: | || Subprocess uses libprocess || Subprocess is something else || || Subprocess sets/inherits the same {{PORT}} by accident | Bind failure -> exit | Nothing happens (?) | || Subprocess sets a different {{PORT}} on purpose | Bind success (?) | Nothing happens (?) | (?) = means this is usually the case, but not 100%. A complete fix would look something like: * If the {{subprocess}} call gets {{environment = None()}}, we should automatically remove {{LIBPROCESS_PORT}} from the inherited environment. * The parts of [{{executorEnvironment}}|https://github.com/apache/mesos/blame/master/src/slave/containerizer/containerizer.cpp#L265] dealing with libprocess & libmesos should be refactored into libprocess as a helper. We would use this helper for the Containerizer, Fetcher, and ContainerLogger module. 
* If the {{subprocess}} call is given {{LIBPROCESS_PORT == os::getenv(""LIBPROCESS_PORT"")}}, we can LOG(WARN) and unset the env var locally.",2 MESOS-4611,"Passing a lambda to dispatch() always matches the template returning void","The following idiom does not currently compile: {code} Future<Nothing> initialized = dispatch(pid, [] () -> Nothing { return Nothing(); }); {code} This seems non-intuitive because the following template exists for dispatch: {code} template <typename R> Future<R> dispatch(const UPID& pid, const std::function<R()>& f) { std::shared_ptr<Promise<R>> promise(new Promise<R>()); std::shared_ptr<std::function<void(ProcessBase*)>> f_( new std::function<void(ProcessBase*)>( [=](ProcessBase*) { promise->set(f()); })); internal::dispatch(pid, f_); return promise->future(); } {code} However, a lambda is not itself a std::function, and template argument deduction does not consider the implicit conversion, so this overload is never selected for a lambda. To make this work, you have to explicitly type the lambda before passing it to dispatch. {code} std::function<Nothing()> f = []() { return Nothing(); }; Future<Nothing> initialized = dispatch(pid, f); {code} We should add template support to allow lambdas to be passed to dispatch() without explicit typing. 
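One possible shape for such template support is sketched below. This is only an illustration under stated assumptions: Nothing, Future and UPID here are simplified stand-in types written for this sketch, not the libprocess classes, and the call runs inline instead of being enqueued on a process. The point it shows is that taking the callable as a deduced type F, and computing the result type R from F, lets a lambda be passed directly.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Stand-in types: hypothetical simplifications, not the libprocess classes.
struct Nothing {};
struct UPID {};

template <typename T>
struct Future
{
  std::shared_ptr<T> value;
  const T& get() const { return *value; }
};

// Accept any callable F and deduce the result type R from F itself, so the
// caller no longer has to convert a lambda to std::function first.
// The enable_if guard keeps this overload for callables that return a value.
template <
    typename F,
    typename R = typename std::decay<decltype(std::declval<F>()())>::type,
    typename = typename std::enable_if<!std::is_void<R>::value>::type>
Future<R> dispatch(const UPID& pid, F&& f)
{
  (void) pid; // Unused here: a real version would route the call to this process.

  // Assumption of this sketch: invoke f immediately and store its result.
  return Future<R>{std::make_shared<R>(std::forward<F>(f)())};
}
```

With this overload, both dispatch(pid, []() -> Nothing { return Nothing(); }) and dispatch(pid, []() { return 42; }) compile without an intermediate std::function, yielding Future<Nothing> and Future<int> respectively.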
",5 MESOS-4612,"Update vendored ZooKeeper to 3.4.8","See: http://zookeeper.apache.org/doc/r3.4.8/releasenotes.html for improvements / bug fixes Added a new patch that solved [ZOOKEEPER-1643](https://issues.apache.org/jira/browse/ZOOKEEPER-1643) The original patch: ",3 MESOS-4614,"SlaveRecoveryTest/0.CleanupHTTPExecutor is flaky","Just saw this failure on the ASF CI: {code} [ RUN ] SlaveRecoveryTest/0.CleanupHTTPExecutor I0206 00:22:44.791671 2824 leveldb.cpp:174] Opened db in 2.539372ms I0206 00:22:44.792459 2824 leveldb.cpp:181] Compacted db in 740473ns I0206 00:22:44.792510 2824 leveldb.cpp:196] Created db iterator in 24164ns I0206 00:22:44.792532 2824 leveldb.cpp:202] Seeked to beginning of db in 1831ns I0206 00:22:44.792548 2824 leveldb.cpp:271] Iterated through 0 keys in the db in 342ns I0206 00:22:44.792605 2824 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0206 00:22:44.793256 2847 recover.cpp:447] Starting replica recovery I0206 00:22:44.793480 2847 recover.cpp:473] Replica is in EMPTY status I0206 00:22:44.794538 2847 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (9472)@172.17.0.2:43484 I0206 00:22:44.795040 2848 recover.cpp:193] Received a recover response from a replica in EMPTY status I0206 00:22:44.795644 2848 recover.cpp:564] Updating replica status to STARTING I0206 00:22:44.796519 2850 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 752810ns I0206 00:22:44.796545 2850 replica.cpp:320] Persisted replica status to STARTING I0206 00:22:44.796725 2848 recover.cpp:473] Replica is in STARTING status I0206 00:22:44.797828 2857 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (9473)@172.17.0.2:43484 I0206 00:22:44.798355 2850 recover.cpp:193] Received a recover response from a replica in STARTING status I0206 00:22:44.799193 2850 recover.cpp:564] Updating replica status to VOTING I0206 00:22:44.799583 2855 master.cpp:376] 
Master 0b206a40-a9c3-4d44-a5bd-8032d60a32ca (6632562f1ade) started on 172.17.0.2:43484 I0206 00:22:44.799609 2855 master.cpp:378] Flags at startup: --acls="""" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate=""true"" --authenticate_http=""true"" --authenticate_slaves=""true"" --authenticators=""crammd5"" --authorizers=""local"" --credentials=""/tmp/n2FxQV/credentials"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_completed_frameworks=""50"" --max_completed_tasks_per_framework=""1000"" --max_slave_ping_timeouts=""5"" --quiet=""false"" --recovery_slave_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""100secs"" --registry_strict=""true"" --root_submissions=""true"" --slave_ping_timeout=""15secs"" --slave_reregister_timeout=""10mins"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/mesos/mesos-0.28.0/_inst/share/mesos/webui"" --work_dir=""/tmp/n2FxQV/master"" --zk_session_timeout=""10secs"" I0206 00:22:44.799991 2855 master.cpp:423] Master only allowing authenticated frameworks to register I0206 00:22:44.800009 2855 master.cpp:428] Master only allowing authenticated slaves to register I0206 00:22:44.800020 2855 credentials.hpp:35] Loading credentials for authentication from '/tmp/n2FxQV/credentials' I0206 00:22:44.800245 2850 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 679345ns I0206 00:22:44.800370 2850 replica.cpp:320] Persisted replica status to VOTING I0206 00:22:44.800397 2855 master.cpp:468] Using default 'crammd5' authenticator I0206 00:22:44.800693 2855 master.cpp:537] Using default 'basic' HTTP authenticator I0206 00:22:44.800815 2855 master.cpp:571] Authorization enabled I0206 00:22:44.801216 2850 recover.cpp:578] Successfully joined the Paxos group I0206 
00:22:44.801604 2850 recover.cpp:462] Recover process terminated I0206 00:22:44.801759 2856 whitelist_watcher.cpp:77] No whitelist given I0206 00:22:44.801725 2847 hierarchical.cpp:144] Initialized hierarchical allocator process I0206 00:22:44.803982 2855 master.cpp:1712] The newly elected leader is master@172.17.0.2:43484 with id 0b206a40-a9c3-4d44-a5bd-8032d60a32ca I0206 00:22:44.804026 2855 master.cpp:1725] Elected as the leading master! I0206 00:22:44.804059 2855 master.cpp:1470] Recovering from registrar I0206 00:22:44.804424 2855 registrar.cpp:307] Recovering registrar I0206 00:22:44.805202 2855 log.cpp:659] Attempting to start the writer I0206 00:22:44.806782 2856 replica.cpp:493] Replica received implicit promise request from (9475)@172.17.0.2:43484 with proposal 1 I0206 00:22:44.807368 2856 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 547939ns I0206 00:22:44.807395 2856 replica.cpp:342] Persisted promised to 1 I0206 00:22:44.808375 2856 coordinator.cpp:238] Coordinator attempting to fill missing positions I0206 00:22:44.809460 2848 replica.cpp:388] Replica received explicit promise request from (9476)@172.17.0.2:43484 for position 0 with proposal 2 I0206 00:22:44.809929 2848 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 427561ns I0206 00:22:44.809967 2848 replica.cpp:712] Persisted action at 0 I0206 00:22:44.811035 2850 replica.cpp:537] Replica received write request for position 0 from (9477)@172.17.0.2:43484 I0206 00:22:44.811149 2850 leveldb.cpp:436] Reading position from leveldb took 36452ns I0206 00:22:44.811532 2850 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 318924ns I0206 00:22:44.811615 2850 replica.cpp:712] Persisted action at 0 I0206 00:22:44.812532 2850 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 I0206 00:22:44.813117 2850 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 476530ns I0206 00:22:44.813143 2850 replica.cpp:712] Persisted action at 0 
I0206 00:22:44.813166 2850 replica.cpp:697] Replica learned NOP action at position 0 I0206 00:22:44.813984 2848 log.cpp:675] Writer started with ending position 0 I0206 00:22:44.815549 2848 leveldb.cpp:436] Reading position from leveldb took 31800ns I0206 00:22:44.817061 2848 registrar.cpp:340] Successfully fetched the registry (0B) in 12.591104ms I0206 00:22:44.817319 2848 registrar.cpp:439] Applied 1 operations in 63480ns; attempting to update the 'registry' I0206 00:22:44.818780 2845 log.cpp:683] Attempting to append 170 bytes to the log I0206 00:22:44.818981 2845 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 I0206 00:22:44.819941 2845 replica.cpp:537] Replica received write request for position 1 from (9478)@172.17.0.2:43484 I0206 00:22:44.820582 2845 leveldb.cpp:341] Persisting action (189 bytes) to leveldb took 600949ns I0206 00:22:44.820608 2845 replica.cpp:712] Persisted action at 1 I0206 00:22:44.821552 2845 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 I0206 00:22:44.821934 2845 leveldb.cpp:341] Persisting action (191 bytes) to leveldb took 352813ns I0206 00:22:44.821960 2845 replica.cpp:712] Persisted action at 1 I0206 00:22:44.821979 2845 replica.cpp:697] Replica learned APPEND action at position 1 I0206 00:22:44.823447 2845 registrar.cpp:484] Successfully updated the 'registry' in 5.987072ms I0206 00:22:44.823580 2845 registrar.cpp:370] Successfully recovered registrar I0206 00:22:44.823833 2845 log.cpp:702] Attempting to truncate the log to 1 I0206 00:22:44.824203 2845 master.cpp:1522] Recovered 0 slaves from the Registry (131B) ; allowing 10mins for slaves to re-register I0206 00:22:44.824291 2845 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 I0206 00:22:44.824645 2845 hierarchical.cpp:171] Skipping recovery of hierarchical allocator: nothing to recover I0206 00:22:44.825222 2850 replica.cpp:537] Replica received write request for position 2 from 
(9479)@172.17.0.2:43484 I0206 00:22:44.825742 2850 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 481617ns I0206 00:22:44.825772 2850 replica.cpp:712] Persisted action at 2 I0206 00:22:44.826748 2852 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 I0206 00:22:44.827368 2852 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 588591ns I0206 00:22:44.827432 2852 leveldb.cpp:399] Deleting ~1 keys from leveldb took 33059ns I0206 00:22:44.827450 2852 replica.cpp:712] Persisted action at 2 I0206 00:22:44.827468 2852 replica.cpp:697] Replica learned TRUNCATE action at position 2 I0206 00:22:44.838011 2824 containerizer.cpp:149] Using isolation: posix/cpu,posix/mem,filesystem/posix W0206 00:22:44.838873 2824 backend.cpp:48] Failed to create 'bind' backend: BindBackend requires root privileges I0206 00:22:44.843785 2857 slave.cpp:193] Slave started on 172.17.0.2:43484 I0206 00:22:44.843819 2857 slave.cpp:194] Flags at startup: --appc_store_dir=""/tmp/mesos/store/appc"" --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --credential=""/tmp/SlaveRecoveryTest_0_CleanupHTTPExecutor_kAXwvw/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_auth_server=""https://auth.docker.io"" --docker_kill_orphans=""true"" --docker_puller_timeout=""60"" --docker_registry=""https://registry-1.docker.io"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/SlaveRecoveryTest_0_CleanupHTTPExecutor_kAXwvw/fetch"" 
--fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname_lookup=""true"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" --launcher_dir=""/mesos/mesos-0.28.0/_build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resources=""cpus:2;mem:1024;disk:1024;ports:[31000-32000]"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" --work_dir=""/tmp/SlaveRecoveryTest_0_CleanupHTTPExecutor_kAXwvw"" I0206 00:22:44.844292 2857 credentials.hpp:83] Loading credential for authentication from '/tmp/SlaveRecoveryTest_0_CleanupHTTPExecutor_kAXwvw/credential' I0206 00:22:44.844518 2857 slave.cpp:324] Slave using credential for: test-principal I0206 00:22:44.844696 2857 resources.cpp:564] Parsing resources as JSON failed: cpus:2;mem:1024;disk:1024;ports:[31000-32000] Trying semicolon-delimited string format instead I0206 00:22:44.845243 2857 slave.cpp:464] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0206 00:22:44.845326 2857 slave.cpp:472] Slave attributes: [ ] I0206 00:22:44.845342 2857 slave.cpp:477] Slave hostname: 6632562f1ade I0206 00:22:44.845953 2824 sched.cpp:222] Version: 0.28.0 I0206 00:22:44.846853 2848 sched.cpp:326] New master detected at master@172.17.0.2:43484 I0206 00:22:44.846936 2848 sched.cpp:382] Authenticating with master master@172.17.0.2:43484 I0206 00:22:44.846958 2848 sched.cpp:389] Using default CRAM-MD5 authenticatee I0206 00:22:44.847692 2858 state.cpp:58] Recovering state from 
'/tmp/SlaveRecoveryTest_0_CleanupHTTPExecutor_kAXwvw/meta' I0206 00:22:44.848108 2850 status_update_manager.cpp:200] Recovering status update manager I0206 00:22:44.848325 2852 containerizer.cpp:397] Recovering containerizer I0206 00:22:44.848603 2845 authenticatee.cpp:121] Creating new client SASL connection I0206 00:22:44.849719 2845 master.cpp:5523] Authenticating scheduler-63899759-d7fc-42b2-8371-57484f352895@172.17.0.2:43484 I0206 00:22:44.850052 2852 authenticator.cpp:413] Starting authentication session for crammd5_authenticatee(662)@172.17.0.2:43484 I0206 00:22:44.850227 2854 provisioner.cpp:245] Provisioner recovery complete I0206 00:22:44.850410 2852 authenticator.cpp:98] Creating new server SASL connection I0206 00:22:44.850692 2852 authenticatee.cpp:212] Received SASL authentication mechanisms: CRAM-MD5 I0206 00:22:44.850720 2852 authenticatee.cpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I0206 00:22:44.850805 2852 authenticator.cpp:203] Received SASL authentication start I0206 00:22:44.850862 2852 authenticator.cpp:325] Authentication requires more steps I0206 00:22:44.850939 2852 authenticatee.cpp:258] Received SASL authentication step I0206 00:22:44.851027 2852 authenticator.cpp:231] Received SASL authentication step I0206 00:22:44.851052 2852 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '6632562f1ade' server FQDN: '6632562f1ade' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0206 00:22:44.851063 2852 auxprop.cpp:179] Looking up auxiliary property '*userPassword' I0206 00:22:44.851102 2852 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0206 00:22:44.851121 2852 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '6632562f1ade' server FQDN: '6632562f1ade' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0206 00:22:44.851130 2852 auxprop.cpp:129] 
Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0206 00:22:44.851136 2852 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0206 00:22:44.851150 2852 authenticator.cpp:317] Authentication success I0206 00:22:44.851219 2850 authenticatee.cpp:298] Authentication success I0206 00:22:44.851310 2850 master.cpp:5553] Successfully authenticated principal 'test-principal' at scheduler-63899759-d7fc-42b2-8371-57484f352895@172.17.0.2:43484 I0206 00:22:44.851485 2849 slave.cpp:4496] Finished recovery I0206 00:22:44.852154 2843 sched.cpp:471] Successfully authenticated with master master@172.17.0.2:43484 I0206 00:22:44.852175 2843 sched.cpp:776] Sending SUBSCRIBE call to master@172.17.0.2:43484 I0206 00:22:44.852262 2843 sched.cpp:809] Will retry registration in 939.183679ms if necessary I0206 00:22:44.852375 2844 master.cpp:2280] Received SUBSCRIBE call for framework 'default' at scheduler-63899759-d7fc-42b2-8371-57484f352895@172.17.0.2:43484 I0206 00:22:44.852448 2844 master.cpp:1751] Authorizing framework principal 'test-principal' to receive offers for role '*' I0206 00:22:44.852699 2852 authenticator.cpp:431] Authentication session cleanup for crammd5_authenticatee(662)@172.17.0.2:43484 I0206 00:22:44.852782 2844 master.cpp:2351] Subscribing framework default with checkpointing enabled and capabilities [ ] I0206 00:22:44.853056 2849 slave.cpp:4668] Querying resource estimator for oversubscribable resources I0206 00:22:44.853421 2856 hierarchical.cpp:265] Added framework 0b206a40-a9c3-4d44-a5bd-8032d60a32ca-0000 I0206 00:22:44.853513 2856 hierarchical.cpp:1403] No resources available to allocate! 
I0206 00:22:44.853582 2844 sched.cpp:703] Framework registered with 0b206a40-a9c3-4d44-a5bd-8032d60a32ca-0000 I0206 00:22:44.853613 2852 slave.cpp:4682] Received oversubscribable resources from the resource estimator I0206 00:22:44.853663 2844 sched.cpp:717] Scheduler::registered took 53762ns I0206 00:22:44.853899 2843 slave.cpp:796] New master detected at master@172.17.0.2:43484 I0206 00:22:44.853955 2854 status_update_manager.cpp:174] Pausing sending status updates I0206 00:22:44.853997 2856 hierarchical.cpp:1498] No inverse offers to send out! I0206 00:22:44.853960 2843 slave.cpp:859] Authenticating with master master@172.17.0.2:43484 I0206 00:22:44.854035 2843 slave.cpp:864] Using default CRAM-MD5 authenticatee I0206 00:22:44.854030 2856 hierarchical.cpp:1096] Performed allocation for 0 slaves in 581355ns I0206 00:22:44.854182 2843 slave.cpp:832] Detecting new master I0206 00:22:44.854277 2854 authenticatee.cpp:121] Creating new client SASL connection I0206 00:22:44.854517 2843 master.cpp:5523] Authenticating slave@172.17.0.2:43484 I0206 00:22:44.854603 2854 authenticator.cpp:413] Starting authentication session for crammd5_authenticatee(663)@172.17.0.2:43484 I0206 00:22:44.854836 2855 authenticator.cpp:98] Creating new server SASL connection I0206 00:22:44.855013 2852 authenticatee.cpp:212] Received SASL authentication mechanisms: CRAM-MD5 I0206 00:22:44.855044 2852 authenticatee.cpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I0206 00:22:44.855139 2855 authenticator.cpp:203] Received SASL authentication start I0206 00:22:44.855186 2855 authenticator.cpp:325] Authentication requires more steps I0206 00:22:44.855263 2855 authenticatee.cpp:258] Received SASL authentication step I0206 00:22:44.855352 2855 authenticator.cpp:231] Received SASL authentication step I0206 00:22:44.855381 2855 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '6632562f1ade' server FQDN: '6632562f1ade' SASL_AUXPROP_VERIFY_AGAINST_HASH: false 
SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0206 00:22:44.855389 2855 auxprop.cpp:179] Looking up auxiliary property '*userPassword' I0206 00:22:44.855419 2855 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0206 00:22:44.855438 2855 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '6632562f1ade' server FQDN: '6632562f1ade' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0206 00:22:44.855448 2855 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0206 00:22:44.855453 2855 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0206 00:22:44.855464 2855 authenticator.cpp:317] Authentication success I0206 00:22:44.855540 2851 authenticatee.cpp:298] Authentication success I0206 00:22:44.855721 2851 authenticator.cpp:431] Authentication session cleanup for crammd5_authenticatee(663)@172.17.0.2:43484 I0206 00:22:44.855832 2852 slave.cpp:927] Successfully authenticated with master master@172.17.0.2:43484 I0206 00:22:44.855615 2855 master.cpp:5553] Successfully authenticated principal 'test-principal' at slave@172.17.0.2:43484 I0206 00:22:44.855973 2852 slave.cpp:1321] Will retry registration in 9.327708ms if necessary I0206 00:22:44.856145 2854 master.cpp:4237] Registering slave at slave@172.17.0.2:43484 (6632562f1ade) with id 0b206a40-a9c3-4d44-a5bd-8032d60a32ca-S0 I0206 00:22:44.856598 2851 registrar.cpp:439] Applied 1 operations in 59112ns; attempting to update the 'registry' I0206 00:22:44.857403 2851 log.cpp:683] Attempting to append 339 bytes to the log I0206 00:22:44.857525 2855 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 3 I0206 00:22:44.858482 2844 replica.cpp:537] Replica received write request for position 3 from (9493)@172.17.0.2:43484 I0206 00:22:44.858755 2844 leveldb.cpp:341] Persisting action (358 bytes) to 
leveldb took 228484ns I0206 00:22:44.858855 2844 replica.cpp:712] Persisted action at 3 I0206 00:22:44.859751 2852 replica.cpp:691] Replica received learned notice for position 3 from @0.0.0.0:0 I0206 00:22:44.860332 2852 leveldb.cpp:341] Persisting action (360 bytes) to leveldb took 549638ns I0206 00:22:44.860358 2852 replica.cpp:712] Persisted action at 3 I0206 00:22:44.860411 2852 replica.cpp:697] Replica learned APPEND action at position 3 I0206 00:22:44.862709 2856 registrar.cpp:484] Succe...",3 MESOS-4615,"ContainerLoggerTest.DefaultToSandbox is flaky","Just saw this failure on the ASF CI: {code} [ RUN ] ContainerLoggerTest.DefaultToSandbox I0206 01:25:03.766458 2824 leveldb.cpp:174] Opened db in 72.979786ms I0206 01:25:03.811712 2824 leveldb.cpp:181] Compacted db in 45.162067ms I0206 01:25:03.811810 2824 leveldb.cpp:196] Created db iterator in 26090ns I0206 01:25:03.811828 2824 leveldb.cpp:202] Seeked to beginning of db in 3173ns I0206 01:25:03.811839 2824 leveldb.cpp:271] Iterated through 0 keys in the db in 497ns I0206 01:25:03.811900 2824 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0206 01:25:03.812785 2849 recover.cpp:447] Starting replica recovery I0206 01:25:03.813043 2849 recover.cpp:473] Replica is in EMPTY status I0206 01:25:03.814668 2854 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (371)@172.17.0.8:37843 I0206 01:25:03.815210 2849 recover.cpp:193] Received a recover response from a replica in EMPTY status I0206 01:25:03.815732 2854 recover.cpp:564] Updating replica status to STARTING I0206 01:25:03.819664 2857 master.cpp:376] Master 914b62f9-95f6-4c57-a7e3-9b06e2c1c8de (74ef606c4063) started on 172.17.0.8:37843 I0206 01:25:03.819703 2857 master.cpp:378] Flags at startup: --acls="""" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate=""true"" --authenticate_http=""true"" --authenticate_slaves=""true"" --authenticators=""crammd5"" 
--authorizers=""local"" --credentials=""/tmp/h5vu5I/credentials"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_completed_frameworks=""50"" --max_completed_tasks_per_framework=""1000"" --max_slave_ping_timeouts=""5"" --quiet=""false"" --recovery_slave_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""100secs"" --registry_strict=""true"" --root_submissions=""true"" --slave_ping_timeout=""15secs"" --slave_reregister_timeout=""10mins"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/mesos/mesos-0.28.0/_inst/share/mesos/webui"" --work_dir=""/tmp/h5vu5I/master"" --zk_session_timeout=""10secs"" I0206 01:25:03.820241 2857 master.cpp:423] Master only allowing authenticated frameworks to register I0206 01:25:03.820257 2857 master.cpp:428] Master only allowing authenticated slaves to register I0206 01:25:03.820269 2857 credentials.hpp:35] Loading credentials for authentication from '/tmp/h5vu5I/credentials' I0206 01:25:03.821110 2857 master.cpp:468] Using default 'crammd5' authenticator I0206 01:25:03.821311 2857 master.cpp:537] Using default 'basic' HTTP authenticator I0206 01:25:03.821636 2857 master.cpp:571] Authorization enabled I0206 01:25:03.821979 2846 hierarchical.cpp:144] Initialized hierarchical allocator process I0206 01:25:03.822057 2846 whitelist_watcher.cpp:77] No whitelist given I0206 01:25:03.825460 2847 master.cpp:1712] The newly elected leader is master@172.17.0.8:37843 with id 914b62f9-95f6-4c57-a7e3-9b06e2c1c8de I0206 01:25:03.825512 2847 master.cpp:1725] Elected as the leading master! 
I0206 01:25:03.825533 2847 master.cpp:1470] Recovering from registrar I0206 01:25:03.825835 2847 registrar.cpp:307] Recovering registrar I0206 01:25:03.848212 2854 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 32.226093ms I0206 01:25:03.848299 2854 replica.cpp:320] Persisted replica status to STARTING I0206 01:25:03.848702 2854 recover.cpp:473] Replica is in STARTING status I0206 01:25:03.850728 2858 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (373)@172.17.0.8:37843 I0206 01:25:03.851230 2854 recover.cpp:193] Received a recover response from a replica in STARTING status I0206 01:25:03.852018 2854 recover.cpp:564] Updating replica status to VOTING I0206 01:25:03.881681 2854 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 29.184163ms I0206 01:25:03.881772 2854 replica.cpp:320] Persisted replica status to VOTING I0206 01:25:03.882058 2854 recover.cpp:578] Successfully joined the Paxos group I0206 01:25:03.882258 2854 recover.cpp:462] Recover process terminated I0206 01:25:03.883076 2854 log.cpp:659] Attempting to start the writer I0206 01:25:03.885040 2854 replica.cpp:493] Replica received implicit promise request from (374)@172.17.0.8:37843 with proposal 1 I0206 01:25:03.915132 2854 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 29.980589ms I0206 01:25:03.915215 2854 replica.cpp:342] Persisted promised to 1 I0206 01:25:03.916038 2856 coordinator.cpp:238] Coordinator attempting to fill missing positions I0206 01:25:03.917659 2856 replica.cpp:388] Replica received explicit promise request from (375)@172.17.0.8:37843 for position 0 with proposal 2 I0206 01:25:03.948698 2856 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 30.974607ms I0206 01:25:03.948786 2856 replica.cpp:712] Persisted action at 0 I0206 01:25:03.950920 2849 replica.cpp:537] Replica received write request for position 0 from (376)@172.17.0.8:37843 I0206 01:25:03.951011 2849 leveldb.cpp:436] Reading 
position from leveldb took 44263ns I0206 01:25:03.982026 2849 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 30.947321ms I0206 01:25:03.982225 2849 replica.cpp:712] Persisted action at 0 I0206 01:25:03.983867 2849 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 I0206 01:25:04.015499 2849 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 30.957888ms I0206 01:25:04.015591 2849 replica.cpp:712] Persisted action at 0 I0206 01:25:04.015682 2849 replica.cpp:697] Replica learned NOP action at position 0 I0206 01:25:04.016666 2849 log.cpp:675] Writer started with ending position 0 I0206 01:25:04.017881 2855 leveldb.cpp:436] Reading position from leveldb took 56779ns I0206 01:25:04.018934 2852 registrar.cpp:340] Successfully fetched the registry (0B) in 193.048064ms I0206 01:25:04.019076 2852 registrar.cpp:439] Applied 1 operations in 38180ns; attempting to update the 'registry' I0206 01:25:04.020100 2844 log.cpp:683] Attempting to append 170 bytes to the log I0206 01:25:04.020288 2855 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 I0206 01:25:04.021323 2844 replica.cpp:537] Replica received write request for position 1 from (377)@172.17.0.8:37843 I0206 01:25:04.054726 2844 leveldb.cpp:341] Persisting action (189 bytes) to leveldb took 33.309419ms I0206 01:25:04.054818 2844 replica.cpp:712] Persisted action at 1 I0206 01:25:04.055933 2844 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 I0206 01:25:04.088142 2844 leveldb.cpp:341] Persisting action (191 bytes) to leveldb took 32.116643ms I0206 01:25:04.088230 2844 replica.cpp:712] Persisted action at 1 I0206 01:25:04.088265 2844 replica.cpp:697] Replica learned APPEND action at position 1 I0206 01:25:04.090070 2856 registrar.cpp:484] Successfully updated the 'registry' in 70.90816ms I0206 01:25:04.090338 2851 log.cpp:702] Attempting to truncate the log to 1 I0206 01:25:04.090358 2856 registrar.cpp:370] 
Successfully recovered registrar I0206 01:25:04.090507 2847 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 I0206 01:25:04.090867 2858 master.cpp:1522] Recovered 0 slaves from the Registry (131B) ; allowing 10mins for slaves to re-register I0206 01:25:04.091449 2858 hierarchical.cpp:171] Skipping recovery of hierarchical allocator: nothing to recover I0206 01:25:04.092280 2857 replica.cpp:537] Replica received write request for position 2 from (378)@172.17.0.8:37843 I0206 01:25:04.125702 2857 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 33.192265ms I0206 01:25:04.125804 2857 replica.cpp:712] Persisted action at 2 I0206 01:25:04.127400 2857 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 I0206 01:25:04.157727 2857 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 30.268594ms I0206 01:25:04.157905 2857 leveldb.cpp:399] Deleting ~1 keys from leveldb took 88436ns I0206 01:25:04.157941 2857 replica.cpp:712] Persisted action at 2 I0206 01:25:04.157984 2857 replica.cpp:697] Replica learned TRUNCATE action at position 2 I0206 01:25:04.166174 2824 containerizer.cpp:149] Using isolation: posix/cpu,posix/mem,filesystem/posix W0206 01:25:04.166954 2824 backend.cpp:48] Failed to create 'bind' backend: BindBackend requires root privileges I0206 01:25:04.172008 2844 slave.cpp:193] Slave started on 9)@172.17.0.8:37843 I0206 01:25:04.172046 2844 slave.cpp:194] Flags at startup: --appc_store_dir=""/tmp/mesos/store/appc"" --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --credential=""/tmp/ContainerLoggerTest_DefaultToSandbox_FMaKSw/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_auth_server=""https://auth.docker.io"" 
--docker_kill_orphans=""true"" --docker_puller_timeout=""60"" --docker_registry=""https://registry-1.docker.io"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/ContainerLoggerTest_DefaultToSandbox_FMaKSw/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname_lookup=""true"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" --launcher_dir=""/mesos/mesos-0.28.0/_build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resources=""cpus:2;mem:1024;disk:1024;ports:[31000-32000]"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" --work_dir=""/tmp/ContainerLoggerTest_DefaultToSandbox_FMaKSw"" I0206 01:25:04.172569 2844 credentials.hpp:83] Loading credential for authentication from '/tmp/ContainerLoggerTest_DefaultToSandbox_FMaKSw/credential' I0206 01:25:04.172886 2844 slave.cpp:324] Slave using credential for: test-principal I0206 01:25:04.173141 2844 resources.cpp:564] Parsing resources as JSON failed: cpus:2;mem:1024;disk:1024;ports:[31000-32000] Trying semicolon-delimited string format instead I0206 01:25:04.173620 2844 slave.cpp:464] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0206 01:25:04.173686 2844 slave.cpp:472] Slave attributes: [ ] 
I0206 01:25:04.173702 2844 slave.cpp:477] Slave hostname: 74ef606c4063 I0206 01:25:04.174816 2847 state.cpp:58] Recovering state from '/tmp/ContainerLoggerTest_DefaultToSandbox_FMaKSw/meta' I0206 01:25:04.175441 2847 status_update_manager.cpp:200] Recovering status update manager I0206 01:25:04.175678 2858 containerizer.cpp:397] Recovering containerizer I0206 01:25:04.177573 2858 provisioner.cpp:245] Provisioner recovery complete I0206 01:25:04.178231 2847 slave.cpp:4496] Finished recovery I0206 01:25:04.178834 2847 slave.cpp:4668] Querying resource estimator for oversubscribable resources I0206 01:25:04.179405 2847 slave.cpp:796] New master detected at master@172.17.0.8:37843 I0206 01:25:04.179500 2847 slave.cpp:859] Authenticating with master master@172.17.0.8:37843 I0206 01:25:04.179525 2847 slave.cpp:864] Using default CRAM-MD5 authenticatee I0206 01:25:04.179656 2858 status_update_manager.cpp:174] Pausing sending status updates I0206 01:25:04.179798 2847 slave.cpp:832] Detecting new master I0206 01:25:04.179891 2852 authenticatee.cpp:121] Creating new client SASL connection I0206 01:25:04.179916 2847 slave.cpp:4682] Received oversubscribable resources from the resource estimator I0206 01:25:04.180286 2847 master.cpp:5523] Authenticating slave(9)@172.17.0.8:37843 I0206 01:25:04.180569 2847 authenticator.cpp:413] Starting authentication session for crammd5_authenticatee(32)@172.17.0.8:37843 I0206 01:25:04.181000 2847 authenticator.cpp:98] Creating new server SASL connection I0206 01:25:04.181315 2847 authenticatee.cpp:212] Received SASL authentication mechanisms: CRAM-MD5 I0206 01:25:04.181387 2847 authenticatee.cpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I0206 01:25:04.181562 2847 authenticator.cpp:203] Received SASL authentication start I0206 01:25:04.181648 2847 authenticator.cpp:325] Authentication requires more steps I0206 01:25:04.181843 2847 authenticatee.cpp:258] Received SASL authentication step I0206 01:25:04.182034 2853 
authenticator.cpp:231] Received SASL authentication step I0206 01:25:04.182071 2853 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '74ef606c4063' server FQDN: '74ef606c4063' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0206 01:25:04.182093 2853 auxprop.cpp:179] Looking up auxiliary property '*userPassword' I0206 01:25:04.182145 2853 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0206 01:25:04.182173 2853 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '74ef606c4063' server FQDN: '74ef606c4063' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0206 01:25:04.182185 2853 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0206 01:25:04.182193 2853 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0206 01:25:04.182211 2853 authenticator.cpp:317] Authentication success I0206 01:25:04.182333 2849 authenticatee.cpp:298] Authentication success I0206 01:25:04.182422 2853 master.cpp:5553] Successfully authenticated principal 'test-principal' at slave(9)@172.17.0.8:37843 I0206 01:25:04.182510 2853 authenticator.cpp:431] Authentication session cleanup for crammd5_authenticatee(32)@172.17.0.8:37843 I0206 01:25:04.182945 2849 slave.cpp:927] Successfully authenticated with master master@172.17.0.8:37843 I0206 01:25:04.183178 2849 slave.cpp:1321] Will retry registration in 9.87937ms if necessary I0206 01:25:04.183466 2852 master.cpp:4237] Registering slave at slave(9)@172.17.0.8:37843 (74ef606c4063) with id 914b62f9-95f6-4c57-a7e3-9b06e2c1c8de-S0 I0206 01:25:04.184039 2845 registrar.cpp:439] Applied 1 operations in 89453ns; attempting to update the 'registry' I0206 01:25:04.185288 2856 log.cpp:683] Attempting to append 339 bytes to the log I0206 01:25:04.185672 2850 coordinator.cpp:348] 
Coordinator attempting to write APPEND action at position 3 I0206 01:25:04.186674 2846 replica.cpp:537] Replica received write request for position 3 from (392)@172.17.0.8:37843 I0206 01:25:04.195863 2856 slave.cpp:1321] Will retry registration in 11.038094ms if necessary I0206 01:25:04.196233 2856 master.cpp:4225] Ignoring register slave message from slave(9)@172.17.0.8:37843 (74ef606c4063) as admission is already in progress I0206 01:25:04.208094 2856 slave.cpp:1321] Will retry registration in 27.881223ms if necessary I0206 01:25:04.208472 2856 master.cpp:4225] Ignoring register slave message from slave(9)@172.17.0.8:37843 (74ef606c4063) as admission is already in progress I0206 01:25:04.216698 2846 leveldb.cpp:341] Persisting action (358 bytes) to leveldb took 29.961291ms I0206 01:25:04.216789 2846 replica.cpp:712] Persisted action at 3 I0206 01:25:04.218246 2845 replica.cpp:691] Replica received learned notice for position 3 from @0.0.0.0:0 I0206 01:25:04.237861 2846 slave.cpp:1321] Will retry registration in 1.006941ms if necessary I0206 01:25:04.238221 2846 master.cpp:4225] Ignoring register slave message from slave(9)@172.17.0.8:37843 (74ef606c4063) as admission is already in progress I0206 01:25:04.239858 2856 slave.cpp:1321] Will retry registration in 167.305686ms if necessary I0206 01:25:04.240044 2856 master.cpp:4225] Ignoring register slave message from slave(9)@172.17.0.8:37843 (74ef606c4063) as admission is already in progress I0206 01:25:04.241482 2845 leveldb.cpp:341] Persisting action (360 bytes) to leveldb took 23.193162ms I0206 01:25:04.241524 2845 replica.cpp:712] Persisted action at 3 I0206 01:25:04.241557 2845 replica.cpp:697] Replica learned APPEND action at position 3 I0206 01:25:04.243746 2844 registrar.cpp:484] Successfully updated the 'registry' in 59.587072ms I0206 01:25:04.244210 2857 log.cpp:702] Attempting to truncate the log to 3 I0206 01:25:04.244344 2845 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at 
position 4 I0206 01:25:04.244597 2856 master.cpp:4305] Registered slave 914b62f9-95f6-4c57-a7e3-9b06e2c1c8de-S0 at slave(9)@172.17.0.8:37843 (74ef606c4063) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0206 01:25:04.244746 2843 slave.cpp:3436] Received ping from slave-observer(8)@172.17.0.8:37843 I0206 01:25:04.244976 2845 hierarchical.cpp:473] Added slave 914b62f9-95f6-4c57-a7e3-9b06e2c1c8de-S0 (74ef606c4063) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (allocated: ) I0206 01:25:04.245072 2843 slave.cpp:971] Registered with master master@172.17.0.8:37843; given slave ID 914b62f9-95f6-4c57-a7e3-9b06e2c1c8de-S0 I0206 01:25:04.245121 2843 fetcher.cpp:81] Clearing fetcher cache I0206 01:25:04.245146 2845 hierarchical.cpp:1403] No resources available to allocate! I0206 01:25:04.245178 2845 hierarchical.cpp:1116] Performed allocation for slave 914b62f9-95f6-4c57-a7e3-9b06e2c1c8de-S0 in 159744ns I0206 01:25:04.245465 2846 status_update_manager.cpp:181] Resuming sending status updates I0206 01:25:04.245776 2843 slave.cpp:994] Checkpointing SlaveInfo to '/tmp/ContainerLoggerTest_DefaultToSandbox_FMaKSw/meta/slaves/914b62f9-95f6-4c57-a7e3-9b06e2c1c8de-S0/slave.info' I0206 01:25:04.245745 2846 replica.cpp:537] Replica received write request for position 4 from (393)@172.17.0.8:37843 I0206 01:25:04.246273 2843 slave.cpp:1030] Forwarding total oversubscribed resources I0206 01:25:04.246507 2850 master.cpp:4646] Received update of slave 914b62f9-95f6-4c57-a7e3-9b06e2c1c8de-S0 at slave(9)@172.17.0.8:37843 (74ef606c4063) with total oversubscribed resources I0206 01:25:04.247180 2824 sched.cpp:222] Version: 0.28.0 I0206 01:25:04.247155 2850 hierarchical.cpp:531] Slave 914b62f9-95f6-4c57-a7e3-9b06e2c1c8de-S0 (74ef606c4063) updated with oversubscribed resources (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: ) I0206 01:25:04.247357 2850 hierarchical.cpp:1403] No resources available to allocate! 
I0206 01:25:04.247406 2850 hierarchical.cpp:1116] Performed allocation for slave 914b62f9-95f6-4c57-a7e3-9b06e2c1c8de-S0 in 183250ns I0206 01:25:04.247938 2854 sched.cpp:326] New master detected at master@172.17.0.8:37843 I0206 01:25:04.248157 2854 sched.cpp:382] Authenticating with master master@172.17.0.8:37843 I0206 01:25:04.248265 2854 sched.cpp:389] Using default CRAM-MD5 authenticatee I0206 01:25:04.248769 2854 authenticatee.cpp:121] Creating new client SASL connection I0206 01:25:04.249311 2854 master.cpp:5523] Authenticating scheduler-f50aad75-78d0-4d9f-b1a4-488d5ab932d6@172.17.0.8:37843 I0206 01:25:04.249646 2854 authenticator.cpp:413] Starting authentication sess...",1 MESOS-4619,"Remove markdown files from doxygen pages","The doxygen html pages corresponding to doc/* markdown files are redundant and have broken links. They don't serve any reasonable purpose in doxygen site.",1 MESOS-4622,"Update configuration.md with `--cgroups_net_cls_primary_handle` agent flag.","As part of the net_cls epic, we introduce an agent flag called `--cgroup_net_cls_primary_handle` . We need to update configuration.md with the corresponding help string. ",1 MESOS-4623,"Add a stub Nvidia GPU isolator.","We'll first wire up a skeleton Nvidia GPU isolator, which needs to be guarded by a configure flag due to the dependency on NVML.",3 MESOS-4624,"Add allocation metrics for ""gpus"" resources.","Allocation metrics are currently hard-coded to include only {{\[""cpus"", ""mem"", ""disk""\]}} resources. We'll need to add ""gpus"" to the list to start, possibly following up on the TODO to remove the hard-coding. 
See: https://github.com/apache/mesos/blob/0.27.0/src/master/metrics.cpp#L266-L269 https://github.com/apache/mesos/blob/0.27.0/src/slave/metrics.cpp#L123-L126 ",1 MESOS-4625,"Implement Nvidia GPU isolation w/o filesystem isolation enabled.","The Nvidia GPU isolator will need to use the device cgroup to restrict access to GPU resources, and will need to recover this information after agent failover. For now this will require that the operator specifies the GPU devices via a flag. Handling filesystem isolation requires that we provide mechanisms for operators to inject volumes with the necessary libraries into all containers using GPU resources; we'll tackle this in a separate ticket.",5 MESOS-4626,"Support Nvidia GPUs with filesystem isolation enabled in mesos containerizer.","When filesystem isolation is enabled in the mesos containerizer, containers that use Nvidia GPU resources need access to GPU libraries residing on the host. We'll need to provide a means for operators to inject the necessary volumes into *all* containers that use ""gpus"" resources. See the nvidia-docker project for more details: [nvidia-docker/tools/src/nvidia/volumes.go|https://github.com/NVIDIA/nvidia-docker/blob/fda10b2d27bf5578cc5337c23877f827e4d1ed77/tools/src/nvidia/volumes.go#L50-L103]",13 MESOS-4629,"Implement fault tolerance tests for the HTTP Scheduler API.","Currently, the HTTP V1 API does not have fault tolerance tests similar to those in {{src/tests/fault_tolerance_tests.cpp}}. For more information see MESOS-3355.",5 MESOS-4630,"Implement partition tests for the HTTP Scheduler API.","Currently, the HTTP V1 API does not have partition tests similar to those in src/tests/partition_tests.cpp. For more information see MESOS-3355.",5 MESOS-4633,"Tests will dereference stack allocated agent objects upon assertion/expectation failure.","Tests that use the {{StartSlave}} test helper are generally fragile when the test fails an assert/expect in the middle of the test. 
This is because the {{StartSlave}} helper takes raw pointer arguments, which may be stack-allocated. In case of an assert failure, the test immediately exits (destroying stack allocated objects) and proceeds onto test cleanup. The test cleanup may dereference some of these destroyed objects, leading to a test crash like: {code} [18:27:36][Step 8/8] F0204 18:27:35.981302 23085 logging.cpp:64] RAW: Pure virtual method called [18:27:36][Step 8/8] @ 0x7f7077055e1c google::LogMessage::Fail() [18:27:36][Step 8/8] @ 0x7f707705ba6f google::RawLog__() [18:27:36][Step 8/8] @ 0x7f70760f76c9 __cxa_pure_virtual [18:27:36][Step 8/8] @ 0xa9423c mesos::internal::tests::Cluster::Slaves::shutdown() [18:27:36][Step 8/8] @ 0x1074e45 mesos::internal::tests::MesosTest::ShutdownSlaves() [18:27:36][Step 8/8] @ 0x1074de4 mesos::internal::tests::MesosTest::Shutdown() [18:27:36][Step 8/8] @ 0x1070ec7 mesos::internal::tests::MesosTest::TearDown() {code} The {{StartSlave}} helper should take {{shared_ptr}} arguments instead. This also means that we can remove the {{Shutdown}} helper from most of these tests.",5 MESOS-4634,"Tests will dereference stack allocated master objects upon assertion/expectation failure.","Tests that use the {{StartMaster}} test helper are generally fragile when the test fails an assert/expect in the middle of the test. This is because the {{StartMaster}} helper takes raw pointer arguments, which may be stack-allocated. In case of an assert failure, the test immediately exits (destroying stack allocated objects) and proceeds onto test cleanup. 
The test cleanup may dereference some of these destroyed objects, leading to a test crash like: {code} [18:27:36][Step 8/8] F0204 18:27:35.981302 23085 logging.cpp:64] RAW: Pure virtual method called [18:27:36][Step 8/8] @ 0x7f7077055e1c google::LogMessage::Fail() [18:27:36][Step 8/8] @ 0x7f707705ba6f google::RawLog__() [18:27:36][Step 8/8] @ 0x7f70760f76c9 __cxa_pure_virtual [18:27:36][Step 8/8] @ 0xa9423c mesos::internal::tests::Cluster::Slaves::shutdown() [18:27:36][Step 8/8] @ 0x1074e45 mesos::internal::tests::MesosTest::ShutdownSlaves() [18:27:36][Step 8/8] @ 0x1074de4 mesos::internal::tests::MesosTest::Shutdown() [18:27:36][Step 8/8] @ 0x1070ec7 mesos::internal::tests::MesosTest::TearDown() {code} The {{StartMaster}} helper should take {{shared_ptr}} arguments instead. This also means that we can remove the {{Shutdown}} helper from most of these tests.",5 MESOS-4636,"Add parent hook to subprocess.",NULL,3 MESOS-4637,"Docker process executor can die with agent unit on systemd.",NULL,1 MESOS-4639,"Posix process executor can die with agent unit on systemd.",NULL,1 MESOS-4640,"Logrotate container logger can die with agent unit on systemd.",NULL,1 MESOS-4657,"Add LOG(INFO) in `cgroups/net_cls` for debugging allocation of net_cls handles.","We need to add LOG(INFO) during the prepare phase of `cgroups/net_cls` for debugging management of `net_cls` handles within the isolator. ",1 MESOS-4660,"Document net_cls isolator in docs/mesos-containerizer.md.","We need to add a section in the doc to describe how to use cgroups/net_cls isolator.",1 MESOS-4667,"Expose persistent volume information in HTTP endpoints","The per-slave {{reserved_resources}} information returned by {{/state}} does not seem to include information about persistent volumes. This makes it hard for operators to use the {{/destroy-volumes}} endpoint.",3 MESOS-4669,"Add common compression utility","We need GZIP uncompress utility for Appc image fetching functionality. 
The images are tar + gzip'ed and they need to be uncompressed first so that we can compute the SHA-512 checksum on them.",2 MESOS-4670,"`cgroup_info` not being exposed in state.json when ComposingContainerizer is used.","The ComposingContainerizer currently does not have a `status` method. This results in no `ContainerStatus` being updated in the agent when it uses `ComposingContainerizer` to launch containers. This would specifically happen when the agent is launched with `--containerizers=docker,mesos`",1 MESOS-4671,"Status updates from executor can be forwarded out of order by the Agent.","Previously, all status update messages from the executor were forwarded by the agent to the master in the order that they had been received. However, that seems to be no longer valid due to a recently introduced change in the agent: {code} // Before sending update, we need to retrieve the container status. containerizer->status(executor->containerId) .onAny(defer(self(), &Slave::_statusUpdate, update, pid, executor->id, lambda::_1)); {code} This can sometimes lead to status updates being sent out of order, depending on the order in which the {{Future}}s returned by the calls to {{status(...)}} are fulfilled.",1 MESOS-4674,"Linux filesystem isolator tests are flaky.","LinuxFilesystemIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem sometimes fails on CentOS 7 with this kind of output: {noformat} ../../src/tests/containerizer/filesystem_isolator_tests.cpp:1054: Failure Failed to wait 2mins for launch {noformat} LinuxFilesystemIsolatorTest.ROOT_MultipleContainers often has this output: {noformat} ../../src/tests/containerizer/filesystem_isolator_tests.cpp:1138: Failure Failed to wait 1mins for launch1 {noformat} Whether SSL is configured makes no difference. This test may also fail on other platforms, but more rarely. ",3 MESOS-4675,"Cannot disable systemd support","On certain platforms the systemd init system is available, but not used. 
Not being able to disable the mesos systemd integration on these platforms makes it hard to operate using a different init / monit system.",1 MESOS-4676,"ROOT_DOCKER_Logs is flaky.","{noformat} [18:06:25][Step 8/8] [ RUN ] DockerContainerizerTest.ROOT_DOCKER_Logs [18:06:25][Step 8/8] I0215 17:06:25.256103 1740 leveldb.cpp:174] Opened db in 6.548327ms [18:06:25][Step 8/8] I0215 17:06:25.258002 1740 leveldb.cpp:181] Compacted db in 1.837816ms [18:06:25][Step 8/8] I0215 17:06:25.258059 1740 leveldb.cpp:196] Created db iterator in 22044ns [18:06:25][Step 8/8] I0215 17:06:25.258076 1740 leveldb.cpp:202] Seeked to beginning of db in 2347ns [18:06:25][Step 8/8] I0215 17:06:25.258091 1740 leveldb.cpp:271] Iterated through 0 keys in the db in 571ns [18:06:25][Step 8/8] I0215 17:06:25.258152 1740 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned [18:06:25][Step 8/8] I0215 17:06:25.258936 1758 recover.cpp:447] Starting replica recovery [18:06:25][Step 8/8] I0215 17:06:25.259177 1758 recover.cpp:473] Replica is in EMPTY status [18:06:25][Step 8/8] I0215 17:06:25.260327 1757 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (13608)@172.30.2.239:39785 [18:06:25][Step 8/8] I0215 17:06:25.260545 1758 recover.cpp:193] Received a recover response from a replica in EMPTY status [18:06:25][Step 8/8] I0215 17:06:25.261065 1757 master.cpp:376] Master 112363e2-c680-4946-8fee-d0626ed8b21e (ip-172-30-2-239.mesosphere.io) started on 172.30.2.239:39785 [18:06:25][Step 8/8] I0215 17:06:25.261209 1761 recover.cpp:564] Updating replica status to STARTING [18:06:25][Step 8/8] I0215 17:06:25.261086 1757 master.cpp:378] Flags at startup: --acls="""" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate=""true"" --authenticate_http=""true"" --authenticate_slaves=""true"" --authenticators=""crammd5"" --authorizers=""local"" --credentials=""/tmp/HncLLj/credentials"" --framework_sorter=""drf"" 
--help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_completed_frameworks=""50"" --max_completed_tasks_per_framework=""1000"" --max_slave_ping_timeouts=""5"" --quiet=""false"" --recovery_slave_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""100secs"" --registry_strict=""true"" --root_submissions=""true"" --slave_ping_timeout=""15secs"" --slave_reregister_timeout=""10mins"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/usr/local/share/mesos/webui"" --work_dir=""/tmp/HncLLj/master"" --zk_session_timeout=""10secs"" [18:06:25][Step 8/8] I0215 17:06:25.261446 1757 master.cpp:423] Master only allowing authenticated frameworks to register [18:06:25][Step 8/8] I0215 17:06:25.261456 1757 master.cpp:428] Master only allowing authenticated slaves to register [18:06:25][Step 8/8] I0215 17:06:25.261462 1757 credentials.hpp:35] Loading credentials for authentication from '/tmp/HncLLj/credentials' [18:06:25][Step 8/8] I0215 17:06:25.261723 1757 master.cpp:468] Using default 'crammd5' authenticator [18:06:25][Step 8/8] I0215 17:06:25.261855 1757 master.cpp:537] Using default 'basic' HTTP authenticator [18:06:25][Step 8/8] I0215 17:06:25.262022 1757 master.cpp:571] Authorization enabled [18:06:25][Step 8/8] I0215 17:06:25.262177 1755 hierarchical.cpp:144] Initialized hierarchical allocator process [18:06:25][Step 8/8] I0215 17:06:25.262177 1758 whitelist_watcher.cpp:77] No whitelist given [18:06:25][Step 8/8] I0215 17:06:25.262899 1760 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 1.517992ms [18:06:25][Step 8/8] I0215 17:06:25.262924 1760 replica.cpp:320] Persisted replica status to STARTING [18:06:25][Step 8/8] I0215 17:06:25.263144 1754 recover.cpp:473] Replica is in STARTING status [18:06:25][Step 8/8] I0215 17:06:25.264010 1757 master.cpp:1712] 
The newly elected leader is master@172.30.2.239:39785 with id 112363e2-c680-4946-8fee-d0626ed8b21e [18:06:25][Step 8/8] I0215 17:06:25.264044 1757 master.cpp:1725] Elected as the leading master! [18:06:25][Step 8/8] I0215 17:06:25.264061 1757 master.cpp:1470] Recovering from registrar [18:06:25][Step 8/8] I0215 17:06:25.264117 1760 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (13610)@172.30.2.239:39785 [18:06:25][Step 8/8] I0215 17:06:25.264197 1758 registrar.cpp:307] Recovering registrar [18:06:25][Step 8/8] I0215 17:06:25.264827 1756 recover.cpp:193] Received a recover response from a replica in STARTING status [18:06:25][Step 8/8] I0215 17:06:25.265219 1757 recover.cpp:564] Updating replica status to VOTING [18:06:25][Step 8/8] I0215 17:06:25.267302 1754 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 1.887739ms [18:06:25][Step 8/8] I0215 17:06:25.267326 1754 replica.cpp:320] Persisted replica status to VOTING [18:06:25][Step 8/8] I0215 17:06:25.267453 1759 recover.cpp:578] Successfully joined the Paxos group [18:06:25][Step 8/8] I0215 17:06:25.267632 1759 recover.cpp:462] Recover process terminated [18:06:25][Step 8/8] I0215 17:06:25.268007 1757 log.cpp:659] Attempting to start the writer [18:06:25][Step 8/8] I0215 17:06:25.269055 1759 replica.cpp:493] Replica received implicit promise request from (13611)@172.30.2.239:39785 with proposal 1 [18:06:25][Step 8/8] I0215 17:06:25.270488 1759 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 1.406068ms [18:06:25][Step 8/8] I0215 17:06:25.270511 1759 replica.cpp:342] Persisted promised to 1 [18:06:25][Step 8/8] I0215 17:06:25.271078 1761 coordinator.cpp:238] Coordinator attempting to fill missing positions [18:06:25][Step 8/8] I0215 17:06:25.272146 1756 replica.cpp:388] Replica received explicit promise request from (13612)@172.30.2.239:39785 for position 0 with proposal 2 [18:06:25][Step 8/8] I0215 17:06:25.273478 1756 leveldb.cpp:341] Persisting 
action (8 bytes) to leveldb took 1.297217ms [18:06:25][Step 8/8] I0215 17:06:25.273500 1756 replica.cpp:712] Persisted action at 0 [18:06:25][Step 8/8] I0215 17:06:25.274355 1757 replica.cpp:537] Replica received write request for position 0 from (13613)@172.30.2.239:39785 [18:06:25][Step 8/8] I0215 17:06:25.274405 1757 leveldb.cpp:436] Reading position from leveldb took 25294ns [18:06:25][Step 8/8] I0215 17:06:25.275800 1757 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 1.362978ms [18:06:25][Step 8/8] I0215 17:06:25.275823 1757 replica.cpp:712] Persisted action at 0 [18:06:25][Step 8/8] I0215 17:06:25.276348 1755 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 [18:06:25][Step 8/8] I0215 17:06:25.277765 1755 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 1.391531ms [18:06:25][Step 8/8] I0215 17:06:25.277788 1755 replica.cpp:712] Persisted action at 0 [18:06:25][Step 8/8] I0215 17:06:25.277802 1755 replica.cpp:697] Replica learned NOP action at position 0 [18:06:25][Step 8/8] I0215 17:06:25.278336 1754 log.cpp:675] Writer started with ending position 0 [18:06:25][Step 8/8] I0215 17:06:25.279371 1755 leveldb.cpp:436] Reading position from leveldb took 29214ns [18:06:25][Step 8/8] I0215 17:06:25.280272 1758 registrar.cpp:340] Successfully fetched the registry (0B) in 16.02688ms [18:06:25][Step 8/8] I0215 17:06:25.280385 1758 registrar.cpp:439] Applied 1 operations in 31040ns; attempting to update the 'registry' [18:06:25][Step 8/8] I0215 17:06:25.281054 1755 log.cpp:683] Attempting to append 210 bytes to the log [18:06:25][Step 8/8] I0215 17:06:25.281165 1757 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 [18:06:25][Step 8/8] I0215 17:06:25.281780 1757 replica.cpp:537] Replica received write request for position 1 from (13614)@172.30.2.239:39785 [18:06:25][Step 8/8] I0215 17:06:25.283159 1757 leveldb.cpp:341] Persisting action (229 bytes) to leveldb took 1.348041ms 
[18:06:25][Step 8/8] I0215 17:06:25.283184 1757 replica.cpp:712] Persisted action at 1 [18:06:25][Step 8/8] I0215 17:06:25.283695 1759 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 [18:06:25][Step 8/8] I0215 17:06:25.285059 1759 leveldb.cpp:341] Persisting action (231 bytes) to leveldb took 1.334577ms [18:06:25][Step 8/8] I0215 17:06:25.285084 1759 replica.cpp:712] Persisted action at 1 [18:06:25][Step 8/8] I0215 17:06:25.285099 1759 replica.cpp:697] Replica learned APPEND action at position 1 [18:06:25][Step 8/8] I0215 17:06:25.285910 1758 registrar.cpp:484] Successfully updated the 'registry' in 5.46816ms [18:06:25][Step 8/8] I0215 17:06:25.286043 1758 registrar.cpp:370] Successfully recovered registrar [18:06:25][Step 8/8] I0215 17:06:25.286121 1755 log.cpp:702] Attempting to truncate the log to 1 [18:06:25][Step 8/8] I0215 17:06:25.286301 1756 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 [18:06:25][Step 8/8] I0215 17:06:25.286478 1759 hierarchical.cpp:171] Skipping recovery of hierarchical allocator: nothing to recover [18:06:25][Step 8/8] I0215 17:06:25.286476 1754 master.cpp:1522] Recovered 0 slaves from the Registry (171B) ; allowing 10mins for slaves to re-register [18:06:25][Step 8/8] I0215 17:06:25.287137 1755 replica.cpp:537] Replica received write request for position 2 from (13615)@172.30.2.239:39785 [18:06:25][Step 8/8] I0215 17:06:25.289104 1755 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 1.938609ms [18:06:25][Step 8/8] I0215 17:06:25.289127 1755 replica.cpp:712] Persisted action at 2 [18:06:25][Step 8/8] I0215 17:06:25.289667 1759 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 [18:06:25][Step 8/8] I0215 17:06:25.290956 1759 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 1.256421ms [18:06:25][Step 8/8] I0215 17:06:25.291007 1759 leveldb.cpp:399] Deleting ~1 keys from leveldb took 28064ns [18:06:25][Step 8/8] I0215 
17:06:25.291021 1759 replica.cpp:712] Persisted action at 2 [18:06:25][Step 8/8] I0215 17:06:25.291038 1759 replica.cpp:697] Replica learned TRUNCATE action at position 2 [18:06:25][Step 8/8] I0215 17:06:25.300550 1760 slave.cpp:193] Slave started on 393)@172.30.2.239:39785 [18:06:25][Step 8/8] I0215 17:06:25.300573 1760 slave.cpp:194] Flags at startup: --appc_store_dir=""/tmp/mesos/store/appc"" --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --credential=""/tmp/DockerContainerizerTest_ROOT_DOCKER_Logs_a4NS2N/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_auth_server=""https://auth.docker.io"" --docker_kill_orphans=""true"" --docker_puller_timeout=""60"" --docker_registry=""https://registry-1.docker.io"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/DockerContainerizerTest_ROOT_DOCKER_Logs_a4NS2N/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname_lookup=""true"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" --launcher_dir=""/mnt/teamcity/work/4240ba9ddd0997c3/build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" 
--resources=""cpus:2;mem:1024;disk:1024;ports:[31000-32000]"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" --work_dir=""/tmp/DockerContainerizerTest_ROOT_DOCKER_Logs_a4NS2N"" [18:06:25][Step 8/8] I0215 17:06:25.300868 1760 credentials.hpp:83] Loading credential for authentication from '/tmp/DockerContainerizerTest_ROOT_DOCKER_Logs_a4NS2N/credential' [18:06:25][Step 8/8] I0215 17:06:25.301030 1760 slave.cpp:324] Slave using credential for: test-principal [18:06:25][Step 8/8] I0215 17:06:25.301180 1760 resources.cpp:576] Parsing resources as JSON failed: cpus:2;mem:1024;disk:1024;ports:[31000-32000] [18:06:25][Step 8/8] Trying semicolon-delimited string format instead [18:06:25][Step 8/8] I0215 17:06:25.301553 1760 slave.cpp:464] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] [18:06:25][Step 8/8] I0215 17:06:25.301609 1760 slave.cpp:472] Slave attributes: [ ] [18:06:25][Step 8/8] I0215 17:06:25.301620 1760 slave.cpp:477] Slave hostname: ip-172-30-2-239.mesosphere.io [18:06:25][Step 8/8] I0215 17:06:25.302417 1757 state.cpp:58] Recovering state from '/tmp/DockerContainerizerTest_ROOT_DOCKER_Logs_a4NS2N/meta' [18:06:25][Step 8/8] I0215 17:06:25.302515 1740 sched.cpp:222] Version: 0.28.0 [18:06:25][Step 8/8] I0215 17:06:25.302772 1755 status_update_manager.cpp:200] Recovering status update manager [18:06:25][Step 8/8] I0215 17:06:25.302956 1758 docker.cpp:559] Recovering Docker containers [18:06:25][Step 8/8] I0215 17:06:25.303050 1761 sched.cpp:326] New master detected at master@172.30.2.239:39785 [18:06:25][Step 8/8] I0215 17:06:25.303133 1754 slave.cpp:4565] Finished recovery [18:06:25][Step 8/8] I0215 17:06:25.303154 1761 sched.cpp:382] Authenticating with master master@172.30.2.239:39785 [18:06:25][Step 8/8] I0215 17:06:25.303169 1761 sched.cpp:389] Using default CRAM-MD5 authenticatee 
[18:06:25][Step 8/8] I0215 17:06:25.303364 1759 authenticatee.cpp:121] Creating new client SASL connection [18:06:25][Step 8/8] I0215 17:06:25.303467 1754 slave.cpp:4737] Querying resource estimator for oversubscribable resources [18:06:25][Step 8/8] I0215 17:06:25.303668 1756 master.cpp:5523] Authenticating scheduler-806c70e3-1cf6-418f-aa30-6bb26db42d18@172.30.2.239:39785 [18:06:25][Step 8/8] I0215 17:06:25.303707 1760 status_update_manager.cpp:174] Pausing sending status updates [18:06:25][Step 8/8] I0215 17:06:25.303707 1754 slave.cpp:796] New master detected at master@172.30.2.239:39785 [18:06:25][Step 8/8] I0215 17:06:25.303767 1755 authenticator.cpp:413] Starting authentication session for crammd5_authenticatee(829)@172.30.2.239:39785 [18:06:25][Step 8/8] I0215 17:06:25.303791 1754 slave.cpp:859] Authenticating with master master@172.30.2.239:39785 [18:06:25][Step 8/8] I0215 17:06:25.303805 1754 slave.cpp:864] Using default CRAM-MD5 authenticatee [18:06:25][Step 8/8] I0215 17:06:25.303956 1754 slave.cpp:832] Detecting new master [18:06:25][Step 8/8] I0215 17:06:25.303971 1761 authenticatee.cpp:121] Creating new client SASL connection [18:06:25][Step 8/8] I0215 17:06:25.303984 1760 authenticator.cpp:98] Creating new server SASL connection [18:06:25][Step 8/8] I0215 17:06:25.304131 1754 slave.cpp:4751] Received oversubscribable resources from the resource estimator [18:06:25][Step 8/8] I0215 17:06:25.304275 1757 master.cpp:5523] Authenticating slave(393)@172.30.2.239:39785 [18:06:25][Step 8/8] I0215 17:06:25.304344 1754 authenticatee.cpp:212] Received SASL authentication mechanisms: CRAM-MD5 [18:06:25][Step 8/8] I0215 17:06:25.304369 1754 authenticatee.cpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' [18:06:25][Step 8/8] I0215 17:06:25.304373 1761 authenticator.cpp:413] Starting authentication session for crammd5_authenticatee(830)@172.30.2.239:39785 [18:06:25][Step 8/8] I0215 17:06:25.304440 1757 authenticator.cpp:203] Received SASL authentication 
start [18:06:25][Step 8/8] I0215 17:06:25.304491 1757 authenticator.cpp:325] Authentication requires more steps [18:06:25][Step 8/8] I0215 17:06:25.304548 1754 authenticator.cpp:98] Creating new server SASL connection [18:06:25][Step 8/8] I0215 17:06:25.304582 1761 authenticatee.cpp:258] Received SASL authentication step [18:06:25][Step 8/8] I0215 17:06:25.304688 1761 authenticator.cpp:231] Received SASL authentication step [18:06:25][Step 8/8] I0215 17:06:25.304714 1761 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: 'ip-172-30-2-239.mesosphere.io' server FQDN: 'ip-172-30-2-239.mesosphere.io' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false [18:06:25][Step 8/8] I0215 17:06:25.304723 1761 auxprop.cpp:179] Looking up auxiliary property '*userPassword' [18:06:25][Step 8/8] I0215 17:06:25.304767 1761 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' [18:06:25][Step 8/8] I0215 17:06:25.304805 1761 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: 'ip-172-30-2-239.mesosphere.io' server FQDN: 'ip-172-30-2-239.mesosphere.io' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true [18:06:25][Step 8/8] I0215 17:06:25.304817 1761 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true [18:06:25][Step 8/8] I0215 17:06:25.304824 1761 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true [18:06:25][Step 8/8] I0215 17:06:25.304836 1761 authenticator.cpp:317] Authentication success [18:06:25][Step 8/8] I0215 17:06:25.304841 1758 authenticatee.cpp:212] Received SASL authentication mechanisms: CRAM-MD5 [18:06:25][Step 8/8] I0215 17:06:25.304870 1758 authenticatee.cpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' [18:06:25][Step 8/8] I0215 17:06:25.304909 1757 authenticatee.cpp:298] Authentication success 
[18:06:25][Step 8/8] I0215 17:06:25.304983 1756 authenticator.cpp:203] Received SASL authentication start [18:06:25][Step 8/8] I0215 17:06:25.305033 1756 authenticator.cpp:325] Authentication requires more steps [18:06:25][Step 8/8] I0215 17:06:25.305042 1759 master.cpp:5553] Successfully authenticated principal 'test-principal' at scheduler-806c70e3-1cf6-418f-aa30-6bb26db42d18@172.30.2.239:39785 [18:06:25][Step 8/8] I0215 17:06:25.305071 1755 authenticator.cpp:431] Authentication session cleanup for crammd5_authenticatee(829)@172.30.2.239:39785 [18:06:25][Step 8/8] I0215 17:06:25.305124 1756 authenticatee.cpp:258] Received SASL authentication step [18:06:25][Step 8/8] I0215 17:06:25.305222 1758 sched.cpp:471] Successfully authenticated with master master@172.30.2.239:39785 [18:06:25][Step 8/8] I0215 17:06:25.305246 1758 sched.cpp:776] Sending SUBSCRIBE call to master@172.30.2.239:39785 [18:06:25][Step 8/8] I0215 17:06:25.305286 1760 authenticator.cpp:231] Received SASL authentication step [18:06:25][Step 8/8] I0215 17:06:25.305310 1760 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: 'ip-172-30-2-239.mesosphere.io' server FQDN: 'ip-172-30-2-239.mesosphere.io' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false [18:06:25][Step 8/8] I0215 17:06:25.305318 1760 auxprop.cpp:179] Looking up auxiliary property '*userPassword' [18:06:25][Step 8/8] I0215 17:06:25.305344 1760 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' [18:06:25][Step 8/8] I0215 17:06:25.305363 1758 sched.cpp:809] Will retry registration in 1.888777185secs if necessary [18:06:25][Step 8/8] I0215 17:06:25.305379 1760 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: 'ip-172-30-2-239.mesosphere.io' server FQDN: 'ip-172-30-2-239.mesosphere.io' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true [18:06:25][Step 8/8] I0215 
17:06:25.305397 1760 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true [18:06:25][Step 8/...",2 MESOS-4678,"Upgrade vendored Protobuf to 2.6.1","We currently vendor Protobuf 2.5.0. We should upgrade to Protobuf 2.6.1. This introduces various bugfixes, performance improvements, and at least one new feature we might want to eventually take advantage of ({{map}} data type). AFAIK there should be no backward compatibility concerns.",3 MESOS-4683,"Document docker runtime isolator.","Should include the following information: *What features are currently supported in docker runtime isolator. *How to use the docker runtime isolator (user manual). *Compare the different semantics v.s. docker containerizer, and explain why.",2 MESOS-4684,"Create base docker image for test suite.","This should be widely used for unified containerizer testing. Should basically include: *at least one layer. *repositories. For each layer: *root file system as a layer tar ball. *docker image json (manifest). *docker version.",3 MESOS-4686,"Implement master failover tests for the scheduler library.","Currently, the scheduler library creates its own {{MasterDetector}} object internally. 
We would need to create a standalone detector and create new tests for testing that callbacks are invoked correctly in the event of a master failover.",3 MESOS-4687,"Implement reliable floating point for scalar resources","Design doc: https://docs.google.com/document/d/14qLxjZsfIpfynbx0USLJR0GELSq8hdZJUWw6kaY_DXc/edit?usp=sharing",5 MESOS-4689,"Design doc for v1 Operator API","We need to design how the v1 operator API (all the HTTP endpoints exposed by master/agent that are not for scheduler/executor interactions) looks and works.",8 MESOS-4690,"Reorganize 3rdparty directory","This issue is currently being discussed on the dev mailing list: http://www.mail-archive.com/dev@mesos.apache.org/msg34349.html",5 MESOS-4691,"Add a HierarchicalAllocator benchmark with reservation labels.","With {{Labels}} being part of the {{ReservationInfo}}, we should ensure that we don't observe a significant performance degradation in the allocator.",3 MESOS-4695,"SlaveTest.StateEndpoint is flaky","{code} [ RUN ] SlaveTest.StateEndpoint ../../src/tests/slave_tests.cpp:1220: Failure Value of: state.values[""start_time""].as().as() Actual: 1458159086 Expected: static_cast(Clock::now().secs()) Which is: 1458159085 [ FAILED ] SlaveTest.StateEndpoint (193 ms) {code} Even though this test does {{Clock::pause()}} before starting the agent, it is possible for a double that has been stringified and then parsed back to not equal itself, even after rounding to the nearest int.",1 MESOS-4696,"Allow Reserve operations by a principal without `ReservationInfo.principal`","Currently, we require a framework or operator to specify `ReservationInfo.principal` when they reserve resources. 
This isn't necessary, however; we already know the principal and can fill in the field if it isn't set already.",2 MESOS-4702,"Document default value of ""offer_timeout""","There isn't a default value (i.e., offers do not time out by default), but we should clarify this in {{flags.cpp}} and {{configuration.md}}.",1 MESOS-4703,"Make Stout configuration modular and consumable by downstream (e.g., libprocess and agent)","Stout configuration is replicated in at least 3 configuration files -- stout itself, libprocess, and agent. More will follow in the future. We should make a StoutConfigure.cmake that can be included by any package downstream.",1 MESOS-4704,"Enable zlib on Windows.",NULL,1 MESOS-4712,"Remove 'force' field from the Subscribe Call in v1 Scheduler API","We/I introduced the `force` field in the SUBSCRIBE call to deal with scheduler partition cases. Having thought about it a bit more and discussed it with a few other folks ([~anandmazumdar], [~greggomann]), I think we can get away with not having that field in the v1 API. The obvious advantage of removing the field is that framework devs don't have to think about how/when to set the field (the current semantics are a bit confusing). The new workflow when a master receives a SUBSCRIBE call is that the master always accepts this call and closes any existing connection (after sending an ERROR event) from the same scheduler (identified by framework id). The expectation from schedulers is that they must close the old subscribe connection before sending a new SUBSCRIBE call. Let's look at some tricky scenarios and see how this works and why it is safe. 1) Connection disconnection @ the scheduler but not @ the master Scheduler sees the disconnection and sends a new SUBSCRIBE call. Master sends ERROR on the old connection (won't be received by the scheduler because the connection is already closed) and closes it. 2) Connection disconnection @ master but not @ scheduler Scheduler realizes this from lack of HEARTBEAT events. 
It then closes its existing connection and sends a new SUBSCRIBE call. Master accepts the new SUBSCRIBE call. There is no old connection to close on the master as it is already closed. 3) Scheduler failover but no disconnection @ master Newly elected scheduler sends a SUBSCRIBE call. Master sends ERROR event and closes the old connection (won't be received because the old scheduler failed over). 4) Scheduler A got partitioned (but is alive and connected with master) and Scheduler B got elected as new leader. When Scheduler B sends SUBSCRIBE, master sends ERROR and closes the connection from Scheduler A. Master accepts Scheduler B's connection. Typically Scheduler A aborts after receiving ERROR and gets restarted. After restart it won't become the leader because Scheduler B is already elected. 5) Scheduler sends SUBSCRIBE, times out, closes the SUBSCRIBE connection (A) and sends a new SUBSCRIBE (B). Master receives SUBSCRIBE (B) and then receives SUBSCRIBE (A) but doesn't see A's disconnection yet. Master first accepts SUBSCRIBE (B). After it receives SUBSCRIBE (A), it sends ERROR to SUBSCRIBE (B) and closes that connection. When it accepts SUBSCRIBE (A) and tries to send the SUBSCRIBED event, the connection closure is detected. Scheduler retries the SUBSCRIBE connection after a backoff. I think this race is rare enough that it is unlikely to happen continuously in a loop. 
",5 MESOS-4713,"ReviewBot should not fail hard if there are circular dependencies in a review chain","Instead of failing hard, ReviewBot should post an error to the review that a circular dependency is detected.",2 MESOS-4714,"""make DESTDIR= install"" broken","There is a missing '$(DESTDIR)' prefix in the install-data-hook that causes DESTDIR builds to be broken.",2 MESOS-4718,"Add allocator metric for number of completed allocation runs",NULL,1 MESOS-4719,"Add allocator metric for number of offers each framework received","A counter for the number of allocations to a framework can be used to monitor allocation progress, e.g., when agents are added to a cluster, and as other frameworks are added or removed. Currently, an offer by the hierarchical allocator to a framework consists of a list of resources on possibly many agents. Resources might be offered in order to satisfy outstanding quota or for fairness. To capture allocations on fine granularity we should not count the number of offers, but instead the pieces making up that offer, as such a metric would better resolve the effect of changes (e.g., adding/removing a framework). ",2 MESOS-4720,"Add allocator metrics for total vs offered/allocated resources.","Exposing the current allocation breakdown as seen by the allocator will allow us to correlated the corresponding metrics in the master with what the allocator sees. We should expose at least allocated or available, and total.",2 MESOS-4721,"Expose allocation algorithm latency via a metric.","The allocation algorithm has grown to become fairly expensive, gaining visibility into its latency enables monitoring and alerting. 
Similar allocator timing-related information is already exposed in the log, but should also be exposed via an endpoint.",1 MESOS-4722,"Add allocator metric for number of active offer filters","To diagnose scenarios where frameworks unexpectedly do not receive offers, information on currently active filters is needed.",1 MESOS-4723,"Add allocator metric for currently satisfied quotas","We currently expose information on set quotas via dedicated quota endpoints. To diagnose allocator problems one additionally needs information about used quotas.",2 MESOS-4724,"Add allocator metric for current dominant shares of frameworks and roles",NULL,5 MESOS-4726,"Document scheduler driver calls in framework development guide.","The interface examples are slightly out of sync with scheduler.hpp, most notably missing the new acceptOffers call.",2 MESOS-4731,"Update /frameworks to use jsonify","This should let us remove the duplicated code in {{http.cpp}} between {{model(Framework)}} and {{json(Full)}}.",3 MESOS-4736,"DockerContainerizerTest.ROOT_DOCKER_LaunchWithPersistentVolumes fails on CentOS 6","This test passes consistently on other OSes, but fails consistently on CentOS 6. 
Verbose logs from test failure: {code} [ RUN ] DockerContainerizerTest.ROOT_DOCKER_LaunchWithPersistentVolumes I0222 18:16:12.327957 26681 leveldb.cpp:174] Opened db in 7.466102ms I0222 18:16:12.330528 26681 leveldb.cpp:181] Compacted db in 2.540139ms I0222 18:16:12.330580 26681 leveldb.cpp:196] Created db iterator in 16908ns I0222 18:16:12.330592 26681 leveldb.cpp:202] Seeked to beginning of db in 1403ns I0222 18:16:12.330600 26681 leveldb.cpp:271] Iterated through 0 keys in the db in 315ns I0222 18:16:12.330634 26681 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0222 18:16:12.331082 26698 recover.cpp:447] Starting replica recovery I0222 18:16:12.331289 26698 recover.cpp:473] Replica is in EMPTY status I0222 18:16:12.332162 26703 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (13761)@172.30.2.148:35274 I0222 18:16:12.332701 26701 recover.cpp:193] Received a recover response from a replica in EMPTY status I0222 18:16:12.333230 26699 recover.cpp:564] Updating replica status to STARTING I0222 18:16:12.334102 26698 master.cpp:376] Master 652149b4-3932-4d8b-ba6f-8c9d9045be70 (ip-172-30-2-148.mesosphere.io) started on 172.30.2.148:35274 I0222 18:16:12.334116 26698 master.cpp:378] Flags at startup: --acls="""" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate=""true"" --authenticate_http=""true"" --authenticate_slaves=""true"" --authenticators=""crammd5"" --authorizers=""local"" --credentials=""/tmp/QEhLBS/credentials"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_completed_frameworks=""50"" --max_completed_tasks_per_framework=""1000"" --max_slave_ping_timeouts=""5"" --quiet=""false"" --recovery_slave_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" 
--registry_store_timeout=""100secs"" --registry_strict=""true"" --root_submissions=""true"" --slave_ping_timeout=""15secs"" --slave_reregister_timeout=""10mins"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/usr/local/share/mesos/webui"" --work_dir=""/tmp/QEhLBS/master"" --zk_session_timeout=""10secs"" I0222 18:16:12.334354 26698 master.cpp:423] Master only allowing authenticated frameworks to register I0222 18:16:12.334363 26698 master.cpp:428] Master only allowing authenticated slaves to register I0222 18:16:12.334369 26698 credentials.hpp:35] Loading credentials for authentication from '/tmp/QEhLBS/credentials' I0222 18:16:12.335366 26698 master.cpp:468] Using default 'crammd5' authenticator I0222 18:16:12.335492 26698 master.cpp:537] Using default 'basic' HTTP authenticator I0222 18:16:12.335623 26698 master.cpp:571] Authorization enabled I0222 18:16:12.335752 26703 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 2.314693ms I0222 18:16:12.335769 26700 whitelist_watcher.cpp:77] No whitelist given I0222 18:16:12.335778 26703 replica.cpp:320] Persisted replica status to STARTING I0222 18:16:12.335821 26697 hierarchical.cpp:144] Initialized hierarchical allocator process I0222 18:16:12.335965 26701 recover.cpp:473] Replica is in STARTING status I0222 18:16:12.336771 26703 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (13763)@172.30.2.148:35274 I0222 18:16:12.337191 26696 recover.cpp:193] Received a recover response from a replica in STARTING status I0222 18:16:12.337635 26700 recover.cpp:564] Updating replica status to VOTING I0222 18:16:12.337671 26703 master.cpp:1712] The newly elected leader is master@172.30.2.148:35274 with id 652149b4-3932-4d8b-ba6f-8c9d9045be70 I0222 18:16:12.337698 26703 master.cpp:1725] Elected as the leading master! 
I0222 18:16:12.337713 26703 master.cpp:1470] Recovering from registrar I0222 18:16:12.337828 26696 registrar.cpp:307] Recovering registrar I0222 18:16:12.339972 26702 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 2.06039ms I0222 18:16:12.339994 26702 replica.cpp:320] Persisted replica status to VOTING I0222 18:16:12.340082 26700 recover.cpp:578] Successfully joined the Paxos group I0222 18:16:12.340267 26700 recover.cpp:462] Recover process terminated I0222 18:16:12.340591 26699 log.cpp:659] Attempting to start the writer I0222 18:16:12.341594 26698 replica.cpp:493] Replica received implicit promise request from (13764)@172.30.2.148:35274 with proposal 1 I0222 18:16:12.343598 26698 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 1.97941ms I0222 18:16:12.343619 26698 replica.cpp:342] Persisted promised to 1 I0222 18:16:12.344182 26698 coordinator.cpp:238] Coordinator attempting to fill missing positions I0222 18:16:12.345285 26702 replica.cpp:388] Replica received explicit promise request from (13765)@172.30.2.148:35274 for position 0 with proposal 2 I0222 18:16:12.347275 26702 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 1.960198ms I0222 18:16:12.347296 26702 replica.cpp:712] Persisted action at 0 I0222 18:16:12.348201 26703 replica.cpp:537] Replica received write request for position 0 from (13766)@172.30.2.148:35274 I0222 18:16:12.348247 26703 leveldb.cpp:436] Reading position from leveldb took 21399ns I0222 18:16:12.350667 26703 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 2.39166ms I0222 18:16:12.350690 26703 replica.cpp:712] Persisted action at 0 I0222 18:16:12.351191 26696 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 I0222 18:16:12.353152 26696 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 1.935798ms I0222 18:16:12.353173 26696 replica.cpp:712] Persisted action at 0 I0222 18:16:12.353188 26696 replica.cpp:697] Replica learned NOP action at 
position 0 I0222 18:16:12.353639 26696 log.cpp:675] Writer started with ending position 0 I0222 18:16:12.354508 26697 leveldb.cpp:436] Reading position from leveldb took 25625ns I0222 18:16:12.355274 26696 registrar.cpp:340] Successfully fetched the registry (0B) in 17.406976ms I0222 18:16:12.355357 26696 registrar.cpp:439] Applied 1 operations in 20977ns; attempting to update the 'registry' I0222 18:16:12.355929 26697 log.cpp:683] Attempting to append 210 bytes to the log I0222 18:16:12.356032 26703 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 I0222 18:16:12.356657 26698 replica.cpp:537] Replica received write request for position 1 from (13767)@172.30.2.148:35274 I0222 18:16:12.358566 26698 leveldb.cpp:341] Persisting action (229 bytes) to leveldb took 1.881945ms I0222 18:16:12.358588 26698 replica.cpp:712] Persisted action at 1 I0222 18:16:12.359081 26697 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 I0222 18:16:12.361002 26697 leveldb.cpp:341] Persisting action (231 bytes) to leveldb took 1.894331ms I0222 18:16:12.361023 26697 replica.cpp:712] Persisted action at 1 I0222 18:16:12.361038 26697 replica.cpp:697] Replica learned APPEND action at position 1 I0222 18:16:12.361883 26697 registrar.cpp:484] Successfully updated the 'registry' in 6.482944ms I0222 18:16:12.361981 26697 registrar.cpp:370] Successfully recovered registrar I0222 18:16:12.362052 26701 log.cpp:702] Attempting to truncate the log to 1 I0222 18:16:12.362167 26703 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 I0222 18:16:12.362421 26696 master.cpp:1522] Recovered 0 slaves from the Registry (171B) ; allowing 10mins for slaves to re-register I0222 18:16:12.362447 26698 hierarchical.cpp:171] Skipping recovery of hierarchical allocator: nothing to recover I0222 18:16:12.362911 26701 replica.cpp:537] Replica received write request for position 2 from (13768)@172.30.2.148:35274 I0222 
18:16:12.364760 26701 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 1.819954ms I0222 18:16:12.364783 26701 replica.cpp:712] Persisted action at 2 I0222 18:16:12.365384 26697 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 I0222 18:16:12.367961 26697 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 2.55143ms I0222 18:16:12.368015 26697 leveldb.cpp:399] Deleting ~1 keys from leveldb took 28196ns I0222 18:16:12.368028 26697 replica.cpp:712] Persisted action at 2 I0222 18:16:12.368044 26697 replica.cpp:697] Replica learned TRUNCATE action at position 2 I0222 18:16:12.376824 26703 slave.cpp:193] Slave started on 396)@172.30.2.148:35274 I0222 18:16:12.376838 26703 slave.cpp:194] Flags at startup: --appc_simple_discovery_uri_prefix=""http://"" --appc_store_dir=""/tmp/mesos/store/appc"" --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --credential=""/tmp/DockerContainerizerTest_ROOT_DOCKER_LaunchWithPersistentVolumes_U5vZX1/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_auth_server=""https://auth.docker.io"" --docker_kill_orphans=""true"" --docker_puller_timeout=""60"" --docker_registry=""https://registry-1.docker.io"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/DockerContainerizerTest_ROOT_DOCKER_LaunchWithPersistentVolumes_U5vZX1/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" 
--hostname_lookup=""true"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" --launcher_dir=""/mnt/teamcity/work/4240ba9ddd0997c3/build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resources=""cpu:2;mem:2048;disk(role1):2048"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_enable_support=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" --work_dir=""/tmp/DockerContainerizerTest_ROOT_DOCKER_LaunchWithPersistentVolumes_U5vZX1"" I0222 18:16:12.377109 26703 credentials.hpp:83] Loading credential for authentication from '/tmp/DockerContainerizerTest_ROOT_DOCKER_LaunchWithPersistentVolumes_U5vZX1/credential' I0222 18:16:12.377300 26703 slave.cpp:324] Slave using credential for: test-principal I0222 18:16:12.377439 26703 resources.cpp:576] Parsing resources as JSON failed: cpu:2;mem:2048;disk(role1):2048 Trying semicolon-delimited string format instead I0222 18:16:12.377804 26703 slave.cpp:464] Slave resources: cpu(*):2; mem(*):2048; disk(role1):2048; cpus(*):8; ports(*):[31000-32000] I0222 18:16:12.377881 26703 slave.cpp:472] Slave attributes: [ ] I0222 18:16:12.377889 26703 slave.cpp:477] Slave hostname: ip-172-30-2-148.mesosphere.io I0222 18:16:12.378779 26701 state.cpp:58] Recovering state from '/tmp/DockerContainerizerTest_ROOT_DOCKER_LaunchWithPersistentVolumes_U5vZX1/meta' I0222 18:16:12.379092 26697 status_update_manager.cpp:200] Recovering status update manager I0222 18:16:12.379156 26681 sched.cpp:222] Version: 0.28.0 I0222 18:16:12.379250 26697 docker.cpp:722] Recovering Docker containers I0222 18:16:12.379421 26703 slave.cpp:4565] 
Finished recovery I0222 18:16:12.379627 26700 sched.cpp:326] New master detected at master@172.30.2.148:35274 I0222 18:16:12.379735 26703 slave.cpp:4737] Querying resource estimator for oversubscribable resources I0222 18:16:12.379765 26700 sched.cpp:382] Authenticating with master master@172.30.2.148:35274 I0222 18:16:12.379781 26700 sched.cpp:389] Using default CRAM-MD5 authenticatee I0222 18:16:12.379964 26696 status_update_manager.cpp:174] Pausing sending status updates I0222 18:16:12.379992 26702 authenticatee.cpp:121] Creating new client SASL connection I0222 18:16:12.380030 26697 slave.cpp:796] New master detected at master@172.30.2.148:35274 I0222 18:16:12.380106 26697 slave.cpp:859] Authenticating with master master@172.30.2.148:35274 I0222 18:16:12.380127 26697 slave.cpp:864] Using default CRAM-MD5 authenticatee I0222 18:16:12.380188 26699 master.cpp:5526] Authenticating scheduler-1850b1cd-3396-4479-b2f3-47ee6c3fa270@172.30.2.148:35274 I0222 18:16:12.380269 26700 authenticator.cpp:413] Starting authentication session for crammd5_authenticatee(832)@172.30.2.148:35274 I0222 18:16:12.380280 26698 authenticatee.cpp:121] Creating new client SASL connection I0222 18:16:12.380307 26697 slave.cpp:832] Detecting new master I0222 18:16:12.380450 26697 slave.cpp:4751] Received oversubscribable resources from the resource estimator I0222 18:16:12.380452 26699 master.cpp:5526] Authenticating slave(396)@172.30.2.148:35274 I0222 18:16:12.380506 26698 authenticator.cpp:98] Creating new server SASL connection I0222 18:16:12.380540 26697 authenticator.cpp:413] Starting authentication session for crammd5_authenticatee(833)@172.30.2.148:35274 I0222 18:16:12.380635 26700 authenticatee.cpp:212] Received SASL authentication mechanisms: CRAM-MD5 I0222 18:16:12.380659 26700 authenticatee.cpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I0222 18:16:12.380762 26700 authenticator.cpp:203] Received SASL authentication start I0222 18:16:12.380765 26701 
authenticator.cpp:98] Creating new server SASL connection I0222 18:16:12.380843 26700 authenticator.cpp:325] Authentication requires more steps I0222 18:16:12.380911 26698 authenticatee.cpp:212] Received SASL authentication mechanisms: CRAM-MD5 I0222 18:16:12.380931 26702 authenticatee.cpp:258] Received SASL authentication step I0222 18:16:12.380936 26698 authenticatee.cpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I0222 18:16:12.381036 26702 authenticator.cpp:231] Received SASL authentication step I0222 18:16:12.381052 26698 authenticator.cpp:203] Received SASL authentication start I0222 18:16:12.381062 26702 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: 'ip-172-30-2-148' server FQDN: 'ip-172-30-2-148' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0222 18:16:12.381072 26702 auxprop.cpp:179] Looking up auxiliary property '*userPassword' I0222 18:16:12.381104 26702 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0222 18:16:12.381104 26698 authenticator.cpp:325] Authentication requires more steps I0222 18:16:12.381134 26702 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: 'ip-172-30-2-148' server FQDN: 'ip-172-30-2-148' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0222 18:16:12.381142 26702 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0222 18:16:12.381147 26702 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0222 18:16:12.381162 26702 authenticator.cpp:317] Authentication success I0222 18:16:12.381184 26698 authenticatee.cpp:258] Received SASL authentication step I0222 18:16:12.381247 26699 authenticatee.cpp:298] Authentication success I0222 18:16:12.381283 26696 authenticator.cpp:231] Received SASL authentication step I0222 18:16:12.381311 26696 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: 
'ip-172-30-2-148' server FQDN: 'ip-172-30-2-148' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0222 18:16:12.381325 26696 auxprop.cpp:179] Looking up auxiliary property '*userPassword' I0222 18:16:12.381319 26701 master.cpp:5556] Successfully authenticated principal 'test-principal' at scheduler-1850b1cd-3396-4479-b2f3-47ee6c3fa270@172.30.2.148:35274 I0222 18:16:12.381345 26700 authenticator.cpp:431] Authentication session cleanup for crammd5_authenticatee(832)@172.30.2.148:35274 I0222 18:16:12.381361 26696 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0222 18:16:12.381397 26696 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: 'ip-172-30-2-148' server FQDN: 'ip-172-30-2-148' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0222 18:16:12.381413 26696 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0222 18:16:12.381422 26696 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0222 18:16:12.381441 26696 authenticator.cpp:317] Authentication success I0222 18:16:12.381548 26698 sched.cpp:471] Successfully authenticated with master master@172.30.2.148:35274 I0222 18:16:12.381563 26698 sched.cpp:776] Sending SUBSCRIBE call to master@172.30.2.148:35274 I0222 18:16:12.381634 26700 authenticatee.cpp:298] Authentication success I0222 18:16:12.381660 26698 sched.cpp:809] Will retry registration in 770.60771ms if necessary I0222 18:16:12.381675 26697 master.cpp:5556] Successfully authenticated principal 'test-principal' at slave(396)@172.30.2.148:35274 I0222 18:16:12.381734 26702 authenticator.cpp:431] Authentication session cleanup for crammd5_authenticatee(833)@172.30.2.148:35274 I0222 18:16:12.381811 26697 master.cpp:2280] Received SUBSCRIBE call for framework 'default' at scheduler-1850b1cd-3396-4479-b2f3-47ee6c3fa270@172.30.2.148:35274 I0222 18:16:12.381882 26697 master.cpp:1751] Authorizing 
framework principal 'test-principal' to receive offers for role 'role1' I0222 18:16:12.382004 26698 slave.cpp:927] Successfully authenticated with master master@172.30.2.148:35274 I0222 18:16:12.382123 26698 slave.cpp:1321] Will retry registration in 8.1941ms if necessary I0222 18:16:12.382282 26701 master.cpp:4240] Registering slave at slave(396)@172.30.2.148:35274 (ip-172-30-2-148.mesosphere.io) with id 652149b4-3932-4d8b-ba6f-8c9d9045be70-S0 I0222 18:16:12.382482 26701 master.cpp:2351] Subscribing framework default with checkpointing disabled and capabilities [ ] I0222 18:16:12.382612 26703 registrar.cpp:439] Applied 1 operations in 46327ns; attempting to update the 'registry' I0222 18:16:12.382829 26699 hierarchical.cpp:265] Added framework 652149b4-3932-4d8b-ba6f-8c9d9045be70-0000 I0222 18:16:12.382910 26699 hierarchical.cpp:1434] No resources available to allocate! I0222 18:16:12.382915 26701 sched.cpp:703] Framework registered with 652149b4-3932-4d8b-ba6f-8c9d9045be70-0000 I0222 18:16:12.382936 26699 hierarchical.cpp:1529] No inverse offers to send out! 
I0222 18:16:12.382953 26699 hierarchical.cpp:1127] Performed allocation for 0 slaves in 89949ns I0222 18:16:12.382982 26701 sched.cpp:717] Scheduler::registered took 46498ns I0222 18:16:12.383536 26698 log.cpp:683] Attempting to append 423 bytes to the log I0222 18:16:12.383628 26699 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 3 I0222 18:16:12.384196 26700 replica.cpp:537] Replica received write request for position 3 from (13775)@172.30.2.148:35274 I0222 18:16:12.386602 26700 leveldb.cpp:341] Persisting action (442 bytes) to leveldb took 2.377119ms I0222 18:16:12.386625 26700 replica.cpp:712] Persisted action at 3 I0222 18:16:12.387104 26698 replica.cpp:691] Replica received learned notice for position 3 from @0.0.0.0:0 I0222 18:16:12.389159 26698 leveldb.cpp:341] Persisting action (444 bytes) to leveldb took 2.032301ms I0222 18:16:12.389181 26698 replica.cpp:712] Persisted action at 3 I0222 18:16:12.389196 26698 replica.cpp:697] Replica learned APPEND action at position 3 I0222 18:16:12.390281 26698 registrar.cpp:484] Suc...",2 MESOS-4746,"CMake: Add leveldb library to 3rdparty external builds.",NULL,3 MESOS-4747,"ContainerLoggerTest.MesosContainerizerRecover cannot be executed in isolation","Some cleanup of spawned processes is missing in {{ContainerLoggerTest.MesosContainerizerRecover}} so that when the test is run in isolation the global teardown might find lingering processes. {code} [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. 
[----------] 1 test from ContainerLoggerTest [ RUN ] ContainerLoggerTest.MesosContainerizerRecover [ OK ] ContainerLoggerTest.MesosContainerizerRecover (13 ms) [----------] 1 test from ContainerLoggerTest (13 ms total) [----------] Global test environment tear-down ../../src/tests/environment.cpp:728: Failure Failed Tests completed with child processes remaining: -+- 7112 /SOME/PATH/src/mesos/build/src/.libs/mesos-tests --gtest_filter=ContainerLoggerTest.MesosContainerizerRecover \--- 7130 (sh) [==========] 1 test from 1 test case ran. (23 ms total) [ PASSED ] 1 test. [ FAILED ] 0 tests, listed below: 0 FAILED TESTS {code} Observed on OS X with clang-trunk and an unoptimized build. ",1 MESOS-4748,"Add Appc image fetcher tests.","Mesos now has support for fetching Appc images. Add tests that verify the new component.",3 MESOS-4749,"Move HTB out of containers","Currently we set a fixed HTB bandwidth in each container, which makes it impossible to share the link if idle. As the first step, we should move it out of the containers, into the qdisc hierarchy of the physical interface.",3 MESOS-4750,"Document: Mesos Executor expects all SSL_* environment variables to be set","I was trying to run Docker containers in a fully SSL-ized Mesos cluster but ran into problems because the executor was failing with a ""Failed to shutdown socket with fd 10: Transport endpoint is not connected"". My understanding of why this is happening is because the executor was trying to report its status to Mesos slave over HTTPS, but doesn't have the appropriate certs/env set up inside the executor. (Thanks to mslackbot/joseph for helping me figure this out on #mesos) It turns out, the executor expects all SSL_* variables to be set inside `CommandInfo.environment` which gets picked up by the executor to successfully report its status to the slave. This part of __executor needing all the SSL_* variables to be set in its environment__ is missing in the Mesos SSL transitioning guide. 
I request you to please add this vital information to the doc.",2 MESOS-4754,"The ""executors"" field is exposed under a backwards incompatible schema.","In 0.26.0, the master's {{/state}} endpoint generated the following: {code} { /* ... */ ""frameworks"": [ { /* ... */ ""executors"": [ { ""command"": { ""argv"": [], ""uris"": [], ""value"": ""/Users/mpark/Projects/mesos/build/opt/src/long-lived-executor"" }, ""executor_id"": ""default"", ""framework_id"": ""0ea528a9-64ba-417f-98ea-9c4b8d418db6-0000"", ""name"": ""Long Lived Executor (C++)"", ""resources"": { ""cpus"": 0, ""disk"": 0, ""mem"": 0 }, ""slave_id"": ""8a513678-03a1-4cb5-9279-c3c0c591f1d8-S0"" } ], /* ... */ } ] /* ... */ } {code} In 0.27.1, the {{ExecutorInfo}} is mistakenly exposed in the raw protobuf schema: {code} { /* ... */ ""frameworks"": [ { /* ... */ ""executors"": [ { ""command"": { ""shell"": true, ""value"": ""/Users/mpark/Projects/mesos/build/opt/src/long-lived-executor"" }, ""executor_id"": { ""value"": ""default"" }, ""framework_id"": { ""value"": ""368a5a49-480b-41f6-a13b-24a69c92a72e-0000"" }, ""name"": ""Long Lived Executor (C++)"", ""slave_id"": ""8a513678-03a1-4cb5-9279-c3c0c591f1d8-S0"", ""source"": ""cpp_long_lived_framework"" } ], /* ... */ } ] /* ... */ } {code} This is a backwards incompatible API change.",2 MESOS-4757,"Mesos containerizer should get uid/gids before pivot_root.","Currently, we call os::su(user) after pivot_root. This is problematic because /etc/passwd and /etc/group might be missing in container's root filesystem. Instead, we should get the uid/gids before pivot_root, and call setuid/setgroups after pivot_root.",3 MESOS-4758,"Add a 'name' field into NetworkInfo.","This allows the framework writer to specify the name of the network they want their container to join. Why not use 'groups'? That's because there might be multiple groups under a single network (e.g., admin vs. user, public vs. 
private, etc.).",1 MESOS-4759,"Add network/cni isolator for Mesos containerizer.","See the design doc for more context (MESOS-4742). The isolator will interact with CNI plugins to create the network for the container to join.",8 MESOS-4760,"Expose metrics and gauges for fetcher cache usage and hit rate","To evaluate the fetcher cache and calibrate the value of the fetcher_cache_size flag, it would be useful to have metrics and gauges on agents that expose operational statistics like cache hit rate, occupied cache size, and time spent downloading resources that were not present.",2 MESOS-4761,"Add agent flags to allow operators to specify CNI plugin and config directories.","According to the design doc, we plan to add the following flags: “--network_cni_plugins_dir” Location of the CNI plugin binaries. The “network/cni” isolator will find CNI plugins under this directory so that it can execute the plugins to add/delete containers from the CNI networks. It is the operator’s responsibility to install the CNI plugin binaries in the specified directory. “--network_cni_config_dir” Location of the CNI network configuration files. For each network that containers launched in Mesos agent can connect to, the operator should install a network configuration file in JSON format in the specified directory.",2 MESOS-4762,"Setup proper DNS resolver for containers in network/cni isolator.","Please get more context from the design doc (MESOS-4742). The CNI plugin will return the DNS information about the network. The network/cni isolator needs to properly set up /etc/resolv.conf for the container. We should consider the following cases: 1) container is using host filesystem 2) container is using a different filesystem 3) custom executor and command executor",5 MESOS-4763,"Add test mock for CNI plugins.","In order to test the network/cni isolator, we need to mock the behavior of a CNI plugin. One option is to write a mock script which acts as a CNI plugin. 
The isolator will talk to the mock script the same way it talks to an actual CNI plugin. The mock script can just join the host network?",5 MESOS-4764,"The network/cni isolator should report assigned IP address. ","In order for service discovery to work in some cases, the network/cni isolator needs to report the assigned IP address through the isolator->status() interface.",3 MESOS-4768,"MasterMaintenanceTest.InverseOffers is flaky","[MESOS-4169] significantly sped up this test, but also surfaced some more flakiness. This can be fixed in the same way as [MESOS-4059]. Verbose logs from ASF Centos7 build: {code} [ RUN ] MasterMaintenanceTest.InverseOffers I0224 22:35:53.714018 1948 leveldb.cpp:174] Opened db in 2.034387ms I0224 22:35:53.714663 1948 leveldb.cpp:181] Compacted db in 608839ns I0224 22:35:53.714709 1948 leveldb.cpp:196] Created db iterator in 19043ns I0224 22:35:53.714844 1948 leveldb.cpp:202] Seeked to beginning of db in 2330ns I0224 22:35:53.714956 1948 leveldb.cpp:271] Iterated through 0 keys in the db in 518ns I0224 22:35:53.715092 1948 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0224 22:35:53.715646 1968 recover.cpp:447] Starting replica recovery I0224 22:35:53.715915 1981 recover.cpp:473] Replica is in EMPTY status I0224 22:35:53.717067 1972 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (4533)@172.17.0.1:36678 I0224 22:35:53.717445 1981 recover.cpp:193] Received a recover response from a replica in EMPTY status I0224 22:35:53.717888 1978 recover.cpp:564] Updating replica status to STARTING I0224 22:35:53.718585 1979 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 525061ns I0224 22:35:53.718618 1979 replica.cpp:320] Persisted replica status to STARTING I0224 22:35:53.718827 1982 recover.cpp:473] Replica is in STARTING status I0224 22:35:53.719728 1969 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from 
(4534)@172.17.0.1:36678 I0224 22:35:53.719974 1971 recover.cpp:193] Received a recover response from a replica in STARTING status I0224 22:35:53.720369 1970 recover.cpp:564] Updating replica status to VOTING I0224 22:35:53.720789 1982 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 322308ns I0224 22:35:53.720823 1982 replica.cpp:320] Persisted replica status to VOTING I0224 22:35:53.720968 1982 recover.cpp:578] Successfully joined the Paxos group I0224 22:35:53.721101 1982 recover.cpp:462] Recover process terminated I0224 22:35:53.721698 1982 master.cpp:376] Master aab18b61-7811-4c43-a672-d1a63818c880 (4db5fa128d2d) started on 172.17.0.1:36678 I0224 22:35:53.721719 1982 master.cpp:378] Flags at startup: --acls="""" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate=""false"" --authenticate_http=""true"" --authenticate_slaves=""true"" --authenticators=""crammd5"" --authorizers=""local"" --credentials=""/tmp/MjbcWP/credentials"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_completed_frameworks=""50"" --max_completed_tasks_per_framework=""1000"" --max_slave_ping_timeouts=""5"" --quiet=""false"" --recovery_slave_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""100secs"" --registry_strict=""true"" --root_submissions=""true"" --slave_ping_timeout=""15secs"" --slave_reregister_timeout=""10mins"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/mesos/mesos-0.28.0/_inst/share/mesos/webui"" --work_dir=""/tmp/MjbcWP/master"" --zk_session_timeout=""10secs"" I0224 22:35:53.722039 1982 master.cpp:425] Master allowing unauthenticated frameworks to register I0224 22:35:53.722053 1982 master.cpp:428] Master only allowing authenticated slaves to register I0224 22:35:53.722061 1982 
credentials.hpp:35] Loading credentials for authentication from '/tmp/MjbcWP/credentials' I0224 22:35:53.722394 1982 master.cpp:468] Using default 'crammd5' authenticator I0224 22:35:53.722525 1982 master.cpp:537] Using default 'basic' HTTP authenticator I0224 22:35:53.722661 1982 master.cpp:571] Authorization enabled I0224 22:35:53.722813 1968 hierarchical.cpp:144] Initialized hierarchical allocator process I0224 22:35:53.722846 1980 whitelist_watcher.cpp:77] No whitelist given I0224 22:35:53.724957 1977 master.cpp:1712] The newly elected leader is master@172.17.0.1:36678 with id aab18b61-7811-4c43-a672-d1a63818c880 I0224 22:35:53.725000 1977 master.cpp:1725] Elected as the leading master! I0224 22:35:53.725023 1977 master.cpp:1470] Recovering from registrar I0224 22:35:53.725306 1967 registrar.cpp:307] Recovering registrar I0224 22:35:53.725808 1977 log.cpp:659] Attempting to start the writer I0224 22:35:53.727145 1973 replica.cpp:493] Replica received implicit promise request from (4536)@172.17.0.1:36678 with proposal 1 I0224 22:35:53.727728 1973 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 424560ns I0224 22:35:53.727828 1973 replica.cpp:342] Persisted promised to 1 I0224 22:35:53.729080 1973 coordinator.cpp:238] Coordinator attempting to fill missing positions I0224 22:35:53.731009 1979 replica.cpp:388] Replica received explicit promise request from (4537)@172.17.0.1:36678 for position 0 with proposal 2 I0224 22:35:53.731580 1979 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 478479ns I0224 22:35:53.731613 1979 replica.cpp:712] Persisted action at 0 I0224 22:35:53.734354 1979 replica.cpp:537] Replica received write request for position 0 from (4538)@172.17.0.1:36678 I0224 22:35:53.734485 1979 leveldb.cpp:436] Reading position from leveldb took 60879ns I0224 22:35:53.735877 1979 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 1.324061ms I0224 22:35:53.735930 1979 replica.cpp:712] Persisted action at 0 I0224 
22:35:53.737061 1970 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 I0224 22:35:53.738881 1970 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 1.772814ms I0224 22:35:53.738939 1970 replica.cpp:712] Persisted action at 0 I0224 22:35:53.738975 1970 replica.cpp:697] Replica learned NOP action at position 0 I0224 22:35:53.740136 1976 log.cpp:675] Writer started with ending position 0 I0224 22:35:53.741750 1976 leveldb.cpp:436] Reading position from leveldb took 74863ns I0224 22:35:53.743479 1976 registrar.cpp:340] Successfully fetched the registry (0B) in 18.11968ms I0224 22:35:53.743755 1976 registrar.cpp:439] Applied 1 operations in 56670ns; attempting to update the 'registry' I0224 22:35:53.745604 1978 log.cpp:683] Attempting to append 170 bytes to the log I0224 22:35:53.745905 1977 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 I0224 22:35:53.746968 1981 replica.cpp:537] Replica received write request for position 1 from (4539)@172.17.0.1:36678 I0224 22:35:53.747480 1981 leveldb.cpp:341] Persisting action (189 bytes) to leveldb took 456947ns I0224 22:35:53.747609 1981 replica.cpp:712] Persisted action at 1 I0224 22:35:53.750448 1981 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 I0224 22:35:53.751158 1981 leveldb.cpp:341] Persisting action (191 bytes) to leveldb took 535163ns I0224 22:35:53.751258 1981 replica.cpp:712] Persisted action at 1 I0224 22:35:53.751389 1981 replica.cpp:697] Replica learned APPEND action at position 1 I0224 22:35:53.753149 1979 registrar.cpp:484] Successfully updated the 'registry' in 9.228032ms I0224 22:35:53.753324 1979 registrar.cpp:370] Successfully recovered registrar I0224 22:35:53.753593 1979 log.cpp:702] Attempting to truncate the log to 1 I0224 22:35:53.753805 1979 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 I0224 22:35:53.754055 1981 master.cpp:1522] Recovered 0 slaves from the 
Registry (131B) ; allowing 10mins for slaves to re-register I0224 22:35:53.754349 1979 hierarchical.cpp:171] Skipping recovery of hierarchical allocator: nothing to recover I0224 22:35:53.755764 1977 replica.cpp:537] Replica received write request for position 2 from (4540)@172.17.0.1:36678 I0224 22:35:53.756459 1977 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 488559ns I0224 22:35:53.756561 1977 replica.cpp:712] Persisted action at 2 I0224 22:35:53.757932 1972 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 I0224 22:35:53.758400 1972 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 343827ns I0224 22:35:53.758539 1972 leveldb.cpp:399] Deleting ~1 keys from leveldb took 34231ns I0224 22:35:53.758658 1972 replica.cpp:712] Persisted action at 2 I0224 22:35:53.758782 1972 replica.cpp:697] Replica learned TRUNCATE action at position 2 I0224 22:35:53.778059 1978 slave.cpp:193] Slave started on 115)@172.17.0.1:36678 I0224 22:35:53.778105 1978 slave.cpp:194] Flags at startup: --appc_simple_discovery_uri_prefix=""http://"" --appc_store_dir=""/tmp/mesos/store/appc"" --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --credential=""/tmp/MasterMaintenanceTest_InverseOffers_ywqvFF/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_kill_orphans=""true"" --docker_registry=""https://registry-1.docker.io"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/MasterMaintenanceTest_InverseOffers_ywqvFF/fetch"" 
--fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname=""maintenance-host"" --hostname_lookup=""true"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" --launcher_dir=""/mesos/mesos-0.28.0/_build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resources=""cpus:2;mem:1024;disk:1024;ports:[31000-32000]"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_enable_support=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" --work_dir=""/tmp/MasterMaintenanceTest_InverseOffers_ywqvFF"" I0224 22:35:53.778609 1978 credentials.hpp:83] Loading credential for authentication from '/tmp/MasterMaintenanceTest_InverseOffers_ywqvFF/credential' I0224 22:35:53.779175 1978 slave.cpp:324] Slave using credential for: test-principal I0224 22:35:53.779520 1978 resources.cpp:576] Parsing resources as JSON failed: cpus:2;mem:1024;disk:1024;ports:[31000-32000] Trying semicolon-delimited string format instead I0224 22:35:53.780192 1978 slave.cpp:464] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0224 22:35:53.780362 1978 slave.cpp:472] Slave attributes: [ ] I0224 22:35:53.780483 1978 slave.cpp:477] Slave hostname: maintenance-host I0224 22:35:53.782126 1967 state.cpp:58] Recovering state from '/tmp/MasterMaintenanceTest_InverseOffers_ywqvFF/meta' I0224 22:35:53.782892 1969 status_update_manager.cpp:200] Recovering status update manager I0224 22:35:53.783242 1969 slave.cpp:4565] Finished recovery I0224 22:35:53.784001 1969 slave.cpp:4737] Querying 
resource estimator for oversubscribable resources I0224 22:35:53.784678 1969 slave.cpp:796] New master detected at master@172.17.0.1:36678 I0224 22:35:53.784874 1967 status_update_manager.cpp:174] Pausing sending status updates I0224 22:35:53.784808 1969 slave.cpp:859] Authenticating with master master@172.17.0.1:36678 I0224 22:35:53.784945 1969 slave.cpp:864] Using default CRAM-MD5 authenticatee I0224 22:35:53.785181 1969 slave.cpp:832] Detecting new master I0224 22:35:53.785326 1969 slave.cpp:4751] Received oversubscribable resources from the resource estimator I0224 22:35:53.785557 1969 authenticatee.cpp:121] Creating new client SASL connection I0224 22:35:53.786227 1969 master.cpp:5526] Authenticating slave(115)@172.17.0.1:36678 I0224 22:35:53.786492 1969 authenticator.cpp:413] Starting authentication session for crammd5_authenticatee(298)@172.17.0.1:36678 I0224 22:35:53.786962 1969 authenticator.cpp:98] Creating new server SASL connection I0224 22:35:53.787274 1969 authenticatee.cpp:212] Received SASL authentication mechanisms: CRAM-MD5 I0224 22:35:53.787308 1969 authenticatee.cpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I0224 22:35:53.787400 1969 authenticator.cpp:203] Received SASL authentication start I0224 22:35:53.787470 1969 authenticator.cpp:325] Authentication requires more steps I0224 22:35:53.787884 1972 authenticatee.cpp:258] Received SASL authentication step I0224 22:35:53.787992 1972 authenticator.cpp:231] Received SASL authentication step I0224 22:35:53.788027 1972 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '4db5fa128d2d' server FQDN: '4db5fa128d2d' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0224 22:35:53.788040 1972 auxprop.cpp:179] Looking up auxiliary property '*userPassword' I0224 22:35:53.788090 1972 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0224 22:35:53.788122 1972 auxprop.cpp:107] Request to lookup 
properties for user: 'test-principal' realm: '4db5fa128d2d' server FQDN: '4db5fa128d2d' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0224 22:35:53.788136 1972 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0224 22:35:53.788146 1972 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0224 22:35:53.788164 1972 authenticator.cpp:317] Authentication success I0224 22:35:53.788331 1972 authenticatee.cpp:298] Authentication success I0224 22:35:53.788439 1972 master.cpp:5556] Successfully authenticated principal 'test-principal' at slave(115)@172.17.0.1:36678 I0224 22:35:53.788529 1972 authenticator.cpp:431] Authentication session cleanup for crammd5_authenticatee(298)@172.17.0.1:36678 I0224 22:35:53.788988 1972 slave.cpp:927] Successfully authenticated with master master@172.17.0.1:36678 I0224 22:35:53.789139 1972 slave.cpp:1321] Will retry registration in 1.535786ms if necessary I0224 22:35:53.789515 1972 master.cpp:4240] Registering slave at slave(115)@172.17.0.1:36678 (maintenance-host) with id aab18b61-7811-4c43-a672-d1a63818c880-S0 I0224 22:35:53.790577 1972 registrar.cpp:439] Applied 1 operations in 78745ns; attempting to update the 'registry' I0224 22:35:53.791128 1971 process.cpp:3141] Handling HTTP event for process 'master' with path: '/master/maintenance/schedule' I0224 22:35:53.791877 1971 http.cpp:501] HTTP POST for /master/maintenance/schedule from 172.17.0.1:45095 I0224 22:35:53.793313 1972 log.cpp:683] Attempting to append 343 bytes to the log I0224 22:35:53.793586 1972 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 3 I0224 22:35:53.794533 1971 replica.cpp:537] Replica received write request for position 3 from (4547)@172.17.0.1:36678 I0224 22:35:53.794862 1971 leveldb.cpp:341] Persisting action (362 bytes) to leveldb took 283614ns I0224 22:35:53.794893 1971 
replica.cpp:712] Persisted action at 3 I0224 22:35:53.796646 1979 replica.cpp:691] Replica received learned notice for position 3 from @0.0.0.0:0 I0224 22:35:53.797102 1972 slave.cpp:1321] Will retry registration in 17.198963ms if necessary I0224 22:35:53.797186 1979 leveldb.cpp:341] Persisting action (364 bytes) to leveldb took 498502ns I0224 22:35:53.797230 1979 replica.cpp:712] Persisted action at 3 I0224 22:35:53.797260 1979 replica.cpp:697] Replica learned APPEND action at position 3 I0224 22:35:53.797417 1972 master.cpp:4228] Ignoring register slave message from slave(115)@172.17.0.1:36678 (maintenance-host) as admission is already in progress I0224 22:35:53.799119 1978 registrar.cpp:484] Successfully updated the 'registry' in 8.45824ms I0224 22:35:53.799613 1978 registrar.cpp:439] Applied 1 operations in 176193ns; attempting to update the 'registry' I0224 22:35:53.800472 1972 master.cpp:4308] Registered slave aab18b61-7811-4c43-a672-d1a63818c880-S0 at slave(115)@172.17.0.1:36678 (maintenance-host) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0224 22:35:53.800623 1978 log.cpp:702] Attempting to truncate the log to 3 I0224 22:35:53.801255 1969 hierarchical.cpp:473] Added slave aab18b61-7811-4c43-a672-d1a63818c880-S0 (maintenance-host) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (allocated: ) I0224 22:35:53.801301 1978 slave.cpp:971] Registered with master master@172.17.0.1:36678; given slave ID aab18b61-7811-4c43-a672-d1a63818c880-S0 I0224 22:35:53.801331 1978 fetcher.cpp:81] Clearing fetcher cache I0224 22:35:53.801431 1969 hierarchical.cpp:1434] No resources available to allocate! 
I0224 22:35:53.801466 1969 hierarchical.cpp:1147] Performed allocation for slave aab18b61-7811-4c43-a672-d1a63818c880-S0 in 162751ns I0224 22:35:53.801532 1969 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 4 I0224 22:35:53.801867 1978 slave.cpp:994] Checkpointing SlaveInfo to '/tmp/MasterMaintenanceTest_InverseOffers_ywqvFF/meta/slaves/aab18b61-7811-4c43-a672-d1a63818c880-S0/slave.info' I0224 22:35:53.801877 1969 status_update_manager.cpp:181] Resuming sending status updates I0224 22:35:53.802898 1977 replica.cpp:537] Replica received write request for position 4 from (4548)@172.17.0.1:36678 I0224 22:35:53.803252 1978 slave.cpp:1030] Forwarding total oversubscribed resources I0224 22:35:53.803640 1970 master.cpp:4649] Received update of slave aab18b61-7811-4c43-a672-d1a63818c880-S0 at slave(115)@172.17.0.1:36678 (maintenance-host) with total oversubscribed resources I0224 22:35:53.803858 1977 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 912626ns I0224 22:35:53.803889 1977 replica.cpp:712] Persisted action at 4 I0224 22:35:53.804144 1978 slave.cpp:3482] Received ping from slave-observer(117)@172.17.0.1:36678 I0224 22:35:53.804535 1971 hierarchical.cpp:531] Slave aab18b61-7811-4c43-a672-d1a63818c880-S0 (maintenance-host) updated with oversubscribed resources (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: ) I0224 22:35:53.804684 1971 hierarchical.cpp:1434] No resources available to allocate! 
I0224 22:35:53.804714 1971 hierarchical.cpp:1147] Performed allocation for slave aab18b61-7811-4c43-a672-d1a63818c880-S0 in 131453ns I0224 22:35:53.805541 1967 replica.cpp:691] Replica received learned notice for position 4 from @0.0.0.0:0 I0224 22:35:53.805941 1967 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 366444ns I0224 22:35:53.806015 1967 leveldb.cpp:399] Deleting ~2 keys from leveldb took 42808ns I0224 22:35:53.806041 1967 replica.cpp:712] Persisted action at 4 I0224 22:35:53.806066 1967 replica.cpp:697] Replica learned TRUNCATE action at position 4 I0224 22:35:53.807355 1978 log.cpp:683] Attempting to append 465 bytes to the log I0224 22:35:53.807551 1978 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 5 I0224 22:35:53.809638 1979 replica.cpp:537] Replica received write request for position 5 from (4549)@172.17.0.1:36678 I0224 22:35:53.810858 1979 leveldb.cpp:341] Persisting action (484 bytes) to leveldb took 1.167663ms I0224 22:35:53.810904 1979 replica.cpp:712] Persisted action at 5 I0224 22:35:53.811997 1979 replica.cpp:691] Replica received learned notice for position 5 from @0.0.0.0:0 I0224 22:35:53...",1 MESOS-4771,"Document the network/cni isolator.","We need to document this isolator in mesos-containerizer.md (e.g., how to configure it, what's the pre-requisite, etc.)",3 MESOS-4772,"TaskInfo/ExecutorInfo should include fine-grained ownership/namespacing","We need a way to assign fine-grained ownership to tasks/executors so that multi-user frameworks can tell Mesos to associate the task with a user identity (rather than just the framework principal+role). Then, when an HTTP user requests to view the task's sandbox contents, or kill the task, or list all tasks, the authorizer can determine whether to allow/deny/filter the request based on finer-grained, user-level ownership. Some systems may want TaskInfo.owner to represent a group rather than an individual user. 
That's fine as long as the framework sets the field to the group ID in such a way that a group-aware authorizer can interpret it.",2 MESOS-4776,"Libprocess metrics/snapshot endpoint rate limiting should be configurable.","Currently the {{/metrics/snapshot}} endpoint in libprocess has a [hard-coded|https://github.com/apache/mesos/blob/0.27.1/3rdparty/libprocess/include/process/metrics/metrics.hpp#L52] rate limit of 2 requests per second: {code} MetricsProcess() : ProcessBase(""metrics""), limiter(2, Seconds(1)) {} {code} This should be configurable via a libprocess environment variable so that users can control this when initializing libprocess.",2 MESOS-4778,"Add appc/runtime isolator for runtime isolation for appc images.","Appc image also contains runtime information like 'exec', 'env', 'workingDirectory' etc. https://github.com/appc/spec/blob/master/spec/aci.md Similar to docker images, we need to support a subset of them (mainly 'exec', 'env' and 'workingDirectory').",13 MESOS-4780,"Remove `user` and `rootfs` flags in Windows launcher.",NULL,2 MESOS-4781,"Executor env variables should not be leaked to the command task.","Currently, command task inherits the env variables of the command executor. This is less ideal because the command executor environment variables include some Mesos internal env variables like MESOS_XXX and LIBPROCESS_XXX. Also, this behavior does not match what Docker containerizer does. We should construct the env variables from scratch for the command task, rather than relying on inheriting the env variables from the command executor.",3 MESOS-4783,"Disable rate limiting of the global metrics endpoint for mesos-tests execution","Once we can optionally disable rate limiting in the global metrics endpoint with MESOS-4776 we should disable the rate limiting during the execution of mesos-tests. 
* rate limiting makes it cumbersome to repeatedly hit the endpoint since one would not want to interfere with the rate limiting * rate limiting might incur additional wait time which might slow down tests",3 MESOS-4784,"SlaveTest.MetricsSlaveLaunchErrors test relies on implicit blocking behavior hitting the global metrics endpoint","The test attempts to observe a change in the {{slave/container_launch_errors}} metric, but does not wait for the triggering action to take place. Currently the test passes since hitting the endpoint blocks for some rate limit-related time which, under many circumstances, provides enough wait time for the action to take place. ",1 MESOS-4785,"Reorganize ACL subject/object descriptions.","The authorization documentation would benefit from a reorganization of the ACL subject/object descriptions. Instead of simple lists of the available subjects and objects, it would be nice to see a table showing which subject and object is used with each action.",5 MESOS-4787,"HTTP endpoint docs should use shorter paths","My understanding is that the recommended path for the v1 scheduler API is {{/api/v1/scheduler}}, but the HTTP endpoint [docs|http://mesos.apache.org/documentation/latest/endpoints/] for this endpoint list the path as {{/master/api/v1/scheduler}}; the filename of the doc page is also in the {{master}} subdirectory. Similarly, we document the master state endpoint as {{/master/state}}, whereas the preferred name is now just {{/state}}, and so on for most of the other endpoints. Unlike with the v1 API, we might want to consider backward compatibility and document both forms -- not sure.
But certainly it seems like we should encourage people to use the shorter paths, not the longer ones.",2 MESOS-4790,"Revert external linkage of symbols in master/constants.hpp","src/master/constants.hpp contains: {code} // TODO(bmahler): It appears there may be a bug with gcc-4.1.2 in which the // duration constants were not being initialized when having static linkage. // This issue did not manifest in newer gcc's. Specifically, 4.2.1 was ok. // So we've moved these to have external linkage but perhaps in the future // we can revert this. {code} From commit 232a23b2a2e11f4e905b834aa2a11afe5bf6438a. We should investigate whether this is still necessary on supported compilers; it likely is not.",1 MESOS-4794,"Add documentation around using the docker containerizer on CentOS 6.","Support for persistent volumes was added to the docker containerizer in [MESOS-3413]. However, this does not work on CentOS 6. On CentOS 6, the same {{docker run -v ...}} operation does not perform a recursive bind, whereas on every other OS supported by Mesos, docker does a recursive bind. Docker has already [dropped support for CentOS 6|https://github.com/docker/docker/issues/14365], so we should add precautionary documentation in case anyone tries to use the docker containerizer on CentOS 6.",1 MESOS-4797,"Add a couple of registrar tests for /weights endpoint",NULL,2 MESOS-4798,"Make existing scheduler library tests use the callback interface.","We need to migrate the existing tests in {{src/tests/scheduler_tests.cpp}} to use the new callback interface introduced in {{MESOS-3339}}. The changes to {{src/tests/master_maintenance_tests.cpp}} would be done when MESOS-4831 is resolved. 
For an example see {{SchedulerTest.SchedulerFailover}} which already uses this new interface.",5 MESOS-4801,"Updated `createFrameworkInfo` for hierarchical_allocator_tests.cpp.","The {{createFrameworkInfo}} function in hierarchical_allocator_tests.cpp should be updated so that the caller can set a framework capability, making it possible to create a framework which can use revocable resources.",1 MESOS-4802,"Update leveldb patch file to support PowerPC LE","See: https://github.com/google/leveldb/releases/tag/v1.18 for improvements / bug fixes. The motivation is that leveldb 1.18 has officially supported IBM Power (ppc64le), so this is needed by [MESOS-4312|https://issues.apache.org/jira/browse/MESOS-4312]. Update: Since someone updated leveldb to 1.4, I only updated the patch file to support PowerPC LE, because I don't think upgrading a 3rdparty library frequently is a good thing.",3 MESOS-4803,"Update vendored libev to 4.22","The motivation is that libev 4.22 has officially supported IBM Power (ppc64le), so this is needed by [MESOS-4312|https://issues.apache.org/jira/browse/MESOS-4312].",3 MESOS-4805,"Update ry-http-parser-1c3624a to nodejs/http-parser 2.6.1","See https://github.com/nodejs/http-parser/releases/tag/v2.6.1. The motivation is that nodejs/http-parser 2.6.1 has officially supported IBM Power (ppc64le), so this is needed by [MESOS-4312|https://issues.apache.org/jira/browse/MESOS-4312].",3 MESOS-4807,"IOTest.BufferedRead writes to the current directory","libprocess's {{IOTest.BufferedRead}} writes to the current directory. This is bad for a number of reasons, e.g., * should the test fail, data might be leaked to random locations, * the test cannot be executed from a read-only directory, or * executing the same test in parallel would race on the existence of the created file, and show bogus behavior.
The test should probably be executed from a temporary directory, e.g., via stout's {{TemporaryDirectoryTest}} fixture.",1 MESOS-4810,"ProvisionerDockerPullerTest.ROOT_INTERNET_CURL_ShellCommand fails.","{noformat} [09:46:46] : [Step 11/11] [ RUN ] ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand [09:46:46]W: [Step 11/11] I0229 09:46:46.628413 1166 leveldb.cpp:174] Opened db in 4.242882ms [09:46:46]W: [Step 11/11] I0229 09:46:46.629926 1166 leveldb.cpp:181] Compacted db in 1.483621ms [09:46:46]W: [Step 11/11] I0229 09:46:46.629966 1166 leveldb.cpp:196] Created db iterator in 15498ns [09:46:46]W: [Step 11/11] I0229 09:46:46.629977 1166 leveldb.cpp:202] Seeked to beginning of db in 1405ns [09:46:46]W: [Step 11/11] I0229 09:46:46.629984 1166 leveldb.cpp:271] Iterated through 0 keys in the db in 239ns [09:46:46]W: [Step 11/11] I0229 09:46:46.630015 1166 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned [09:46:46]W: [Step 11/11] I0229 09:46:46.630470 1183 recover.cpp:447] Starting replica recovery [09:46:46]W: [Step 11/11] I0229 09:46:46.630702 1180 recover.cpp:473] Replica is in EMPTY status [09:46:46]W: [Step 11/11] I0229 09:46:46.631767 1182 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (14567)@172.30.2.124:37431 [09:46:46]W: [Step 11/11] I0229 09:46:46.632115 1183 recover.cpp:193] Received a recover response from a replica in EMPTY status [09:46:46]W: [Step 11/11] I0229 09:46:46.632450 1186 recover.cpp:564] Updating replica status to STARTING [09:46:46]W: [Step 11/11] I0229 09:46:46.633476 1186 master.cpp:375] Master 3fbb2fb0-4f18-498b-a440-9acbf6923a13 (ip-172-30-2-124.mesosphere.io) started on 172.30.2.124:37431 [09:46:46]W: [Step 11/11] I0229 09:46:46.633491 1186 master.cpp:377] Flags at startup: --acls="""" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate=""true"" --authenticate_http=""true"" --authenticate_slaves=""true"" 
--authenticators=""crammd5"" --authorizers=""local"" --credentials=""/tmp/4UxXoW/credentials"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_completed_frameworks=""50"" --max_completed_tasks_per_framework=""1000"" --max_slave_ping_timeouts=""5"" --quiet=""false"" --recovery_slave_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""100secs"" --registry_strict=""true"" --root_submissions=""true"" --slave_ping_timeout=""15secs"" --slave_reregister_timeout=""10mins"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/usr/local/share/mesos/webui"" --work_dir=""/tmp/4UxXoW/master"" --zk_session_timeout=""10secs"" [09:46:46]W: [Step 11/11] I0229 09:46:46.633677 1186 master.cpp:422] Master only allowing authenticated frameworks to register [09:46:46]W: [Step 11/11] I0229 09:46:46.633685 1186 master.cpp:427] Master only allowing authenticated slaves to register [09:46:46]W: [Step 11/11] I0229 09:46:46.633692 1186 credentials.hpp:35] Loading credentials for authentication from '/tmp/4UxXoW/credentials' [09:46:46]W: [Step 11/11] I0229 09:46:46.633851 1183 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 1.191043ms [09:46:46]W: [Step 11/11] I0229 09:46:46.633873 1183 replica.cpp:320] Persisted replica status to STARTING [09:46:46]W: [Step 11/11] I0229 09:46:46.633894 1186 master.cpp:467] Using default 'crammd5' authenticator [09:46:46]W: [Step 11/11] I0229 09:46:46.634003 1186 master.cpp:536] Using default 'basic' HTTP authenticator [09:46:46]W: [Step 11/11] I0229 09:46:46.634062 1184 recover.cpp:473] Replica is in STARTING status [09:46:46]W: [Step 11/11] I0229 09:46:46.634109 1186 master.cpp:570] Authorization enabled [09:46:46]W: [Step 11/11] I0229 09:46:46.634249 1187 whitelist_watcher.cpp:77] No whitelist given 
[09:46:46]W: [Step 11/11] I0229 09:46:46.634255 1184 hierarchical.cpp:144] Initialized hierarchical allocator process [09:46:46]W: [Step 11/11] I0229 09:46:46.634884 1187 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (14569)@172.30.2.124:37431 [09:46:46]W: [Step 11/11] I0229 09:46:46.635278 1181 recover.cpp:193] Received a recover response from a replica in STARTING status [09:46:46]W: [Step 11/11] I0229 09:46:46.635742 1187 recover.cpp:564] Updating replica status to VOTING [09:46:46]W: [Step 11/11] I0229 09:46:46.636391 1180 master.cpp:1711] The newly elected leader is master@172.30.2.124:37431 with id 3fbb2fb0-4f18-498b-a440-9acbf6923a13 [09:46:46]W: [Step 11/11] I0229 09:46:46.636415 1180 master.cpp:1724] Elected as the leading master! [09:46:46]W: [Step 11/11] I0229 09:46:46.636430 1180 master.cpp:1469] Recovering from registrar [09:46:46]W: [Step 11/11] I0229 09:46:46.636554 1187 registrar.cpp:307] Recovering registrar [09:46:46]W: [Step 11/11] I0229 09:46:46.637111 1181 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 1.120322ms [09:46:46]W: [Step 11/11] I0229 09:46:46.637133 1181 replica.cpp:320] Persisted replica status to VOTING [09:46:46]W: [Step 11/11] I0229 09:46:46.637218 1186 recover.cpp:578] Successfully joined the Paxos group [09:46:46]W: [Step 11/11] I0229 09:46:46.637354 1186 recover.cpp:462] Recover process terminated [09:46:46]W: [Step 11/11] I0229 09:46:46.637715 1182 log.cpp:659] Attempting to start the writer [09:46:46]W: [Step 11/11] I0229 09:46:46.638617 1184 replica.cpp:493] Replica received implicit promise request from (14570)@172.30.2.124:37431 with proposal 1 [09:46:46]W: [Step 11/11] I0229 09:46:46.639700 1184 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 1.057386ms [09:46:46]W: [Step 11/11] I0229 09:46:46.639722 1184 replica.cpp:342] Persisted promised to 1 [09:46:46]W: [Step 11/11] I0229 09:46:46.640251 1184 coordinator.cpp:238] Coordinator attempting to fill 
missing positions [09:46:46]W: [Step 11/11] I0229 09:46:46.641274 1185 replica.cpp:388] Replica received explicit promise request from (14571)@172.30.2.124:37431 for position 0 with proposal 2 [09:46:46]W: [Step 11/11] I0229 09:46:46.642371 1185 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 1.061574ms [09:46:46]W: [Step 11/11] I0229 09:46:46.642396 1185 replica.cpp:712] Persisted action at 0 [09:46:46]W: [Step 11/11] I0229 09:46:46.643299 1186 replica.cpp:537] Replica received write request for position 0 from (14572)@172.30.2.124:37431 [09:46:46]W: [Step 11/11] I0229 09:46:46.643349 1186 leveldb.cpp:436] Reading position from leveldb took 21735ns [09:46:46]W: [Step 11/11] I0229 09:46:46.644448 1186 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 1.06671ms [09:46:46]W: [Step 11/11] I0229 09:46:46.644469 1186 replica.cpp:712] Persisted action at 0 [09:46:46]W: [Step 11/11] I0229 09:46:46.645077 1181 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 [09:46:46]W: [Step 11/11] I0229 09:46:46.646174 1181 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 1.069097ms [09:46:46]W: [Step 11/11] I0229 09:46:46.646198 1181 replica.cpp:712] Persisted action at 0 [09:46:46]W: [Step 11/11] I0229 09:46:46.646211 1181 replica.cpp:697] Replica learned NOP action at position 0 [09:46:46]W: [Step 11/11] I0229 09:46:46.646716 1182 log.cpp:675] Writer started with ending position 0 [09:46:46]W: [Step 11/11] I0229 09:46:46.647538 1183 leveldb.cpp:436] Reading position from leveldb took 21456ns [09:46:46]W: [Step 11/11] I0229 09:46:46.648298 1186 registrar.cpp:340] Successfully fetched the registry (0B) in 11.71072ms [09:46:46]W: [Step 11/11] I0229 09:46:46.648388 1186 registrar.cpp:439] Applied 1 operations in 21138ns; attempting to update the 'registry' [09:46:46]W: [Step 11/11] I0229 09:46:46.648947 1187 log.cpp:683] Attempting to append 210 bytes to the log [09:46:46]W: [Step 11/11] I0229 09:46:46.649050 1183 
coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 [09:46:46]W: [Step 11/11] I0229 09:46:46.649655 1187 replica.cpp:537] Replica received write request for position 1 from (14573)@172.30.2.124:37431 [09:46:46]W: [Step 11/11] I0229 09:46:46.650725 1187 leveldb.cpp:341] Persisting action (229 bytes) to leveldb took 1.041938ms [09:46:46]W: [Step 11/11] I0229 09:46:46.650748 1187 replica.cpp:712] Persisted action at 1 [09:46:46]W: [Step 11/11] I0229 09:46:46.651198 1181 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 [09:46:46]W: [Step 11/11] I0229 09:46:46.652312 1181 leveldb.cpp:341] Persisting action (231 bytes) to leveldb took 1.092268ms [09:46:46]W: [Step 11/11] I0229 09:46:46.652335 1181 replica.cpp:712] Persisted action at 1 [09:46:46]W: [Step 11/11] I0229 09:46:46.652349 1181 replica.cpp:697] Replica learned APPEND action at position 1 [09:46:46]W: [Step 11/11] I0229 09:46:46.653095 1187 registrar.cpp:484] Successfully updated the 'registry' in 4.664064ms [09:46:46]W: [Step 11/11] I0229 09:46:46.653236 1187 registrar.cpp:370] Successfully recovered registrar [09:46:46]W: [Step 11/11] I0229 09:46:46.653306 1181 log.cpp:702] Attempting to truncate the log to 1 [09:46:46]W: [Step 11/11] I0229 09:46:46.653476 1184 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 [09:46:46]W: [Step 11/11] I0229 09:46:46.653642 1183 master.cpp:1521] Recovered 0 slaves from the Registry (171B) ; allowing 10mins for slaves to re-register [09:46:46]W: [Step 11/11] I0229 09:46:46.653659 1181 hierarchical.cpp:171] Skipping recovery of hierarchical allocator: nothing to recover [09:46:46]W: [Step 11/11] I0229 09:46:46.654270 1181 replica.cpp:537] Replica received write request for position 2 from (14574)@172.30.2.124:37431 [09:46:46]W: [Step 11/11] I0229 09:46:46.655357 1181 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 1.055267ms [09:46:46]W: [Step 11/11] I0229 09:46:46.655378 
1181 replica.cpp:712] Persisted action at 2 [09:46:46]W: [Step 11/11] I0229 09:46:46.655850 1184 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 [09:46:46]W: [Step 11/11] I0229 09:46:46.657009 1184 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 1.137223ms [09:46:46]W: [Step 11/11] I0229 09:46:46.657059 1184 leveldb.cpp:399] Deleting ~1 keys from leveldb took 26459ns [09:46:46]W: [Step 11/11] I0229 09:46:46.657074 1184 replica.cpp:712] Persisted action at 2 [09:46:46]W: [Step 11/11] I0229 09:46:46.657089 1184 replica.cpp:697] Replica learned TRUNCATE action at position 2 [09:46:46]W: [Step 11/11] I0229 09:46:46.665710 1166 containerizer.cpp:149] Using isolation: docker/runtime,filesystem/linux [09:46:46]W: [Step 11/11] I0229 09:46:46.672399 1166 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher [09:46:46]W: [Step 11/11] E0229 09:46:46.676822 1166 shell.hpp:93] Command 'hadoop version 2>&1' failed; this is the output: [09:46:46]W: [Step 11/11] sh: hadoop: command not found [09:46:46]W: [Step 11/11] E0229 09:46:46.676851 1166 fetcher.cpp:58] Failed to create URI fetcher plugin 'hadoop': Failed to create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was either not found or exited with a non-zero exit status: 127 [09:46:46]W: [Step 11/11] I0229 09:46:46.678383 1166 linux.cpp:81] Making '/tmp/ProvisionerDockerRegistryPullerTest_ROOT_INTERNET_CURL_ShellCommand_5BWCfv' a shared mount [09:46:46]W: [Step 11/11] I0229 09:46:46.687223 1180 slave.cpp:193] Slave started on 422)@172.30.2.124:37431 [09:46:46]W: [Step 11/11] I0229 09:46:46.687248 1180 slave.cpp:194] Flags at startup: --appc_simple_discovery_uri_prefix=""http://"" --appc_store_dir=""/tmp/mesos/store/appc"" --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" 
--cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --credential=""/tmp/ProvisionerDockerRegistryPullerTest_ROOT_INTERNET_CURL_ShellCommand_5BWCfv/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_kill_orphans=""true"" --docker_registry=""https://registry-1.docker.io"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/ProvisionerDockerRegistryPullerTest_ROOT_INTERNET_CURL_ShellCommand_5BWCfv/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname_lookup=""true"" --image_providers=""docker"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""docker/runtime,filesystem/linux"" --launcher_dir=""/mnt/teamcity/work/4240ba9ddd0997c3/build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resources=""cpus:2;mem:1024;disk:1024;ports:[31000-32000]"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_enable_support=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" --work_dir=""/tmp/ProvisionerDockerRegistryPullerTest_ROOT_INTERNET_CURL_ShellCommand_5BWCfv"" [09:46:46]W: [Step 11/11] I0229 09:46:46.687531 1180 credentials.hpp:83] Loading credential for authentication from 
'/tmp/ProvisionerDockerRegistryPullerTest_ROOT_INTERNET_CURL_ShellCommand_5BWCfv/credential' [09:46:46]W: [Step 11/11] I0229 09:46:46.687666 1180 slave.cpp:324] Slave using credential for: test-principal [09:46:46]W: [Step 11/11] I0229 09:46:46.687798 1180 resources.cpp:572] Parsing resources as JSON failed: cpus:2;mem:1024;disk:1024;ports:[31000-32000] [09:46:46]W: [Step 11/11] Trying semicolon-delimited string format instead [09:46:46]W: [Step 11/11] I0229 09:46:46.688151 1180 slave.cpp:464] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] [09:46:46]W: [Step 11/11] I0229 09:46:46.688207 1180 slave.cpp:472] Slave attributes: [ ] [09:46:46]W: [Step 11/11] I0229 09:46:46.688217 1180 slave.cpp:477] Slave hostname: ip-172-30-2-124.mesosphere.io [09:46:46]W: [Step 11/11] I0229 09:46:46.689259 1187 state.cpp:58] Recovering state from '/tmp/ProvisionerDockerRegistryPullerTest_ROOT_INTERNET_CURL_ShellCommand_5BWCfv/meta' [09:46:46]W: [Step 11/11] I0229 09:46:46.689394 1166 sched.cpp:222] Version: 0.28.0 [09:46:46]W: [Step 11/11] I0229 09:46:46.689497 1180 status_update_manager.cpp:200] Recovering status update manager [09:46:46]W: [Step 11/11] I0229 09:46:46.689798 1182 containerizer.cpp:407] Recovering containerizer [09:46:46]W: [Step 11/11] I0229 09:46:46.690021 1186 sched.cpp:326] New master detected at master@172.30.2.124:37431 [09:46:46]W: [Step 11/11] I0229 09:46:46.690146 1186 sched.cpp:382] Authenticating with master master@172.30.2.124:37431 [09:46:46]W: [Step 11/11] I0229 09:46:46.690162 1186 sched.cpp:389] Using default CRAM-MD5 authenticatee [09:46:46]W: [Step 11/11] I0229 09:46:46.690378 1181 authenticatee.cpp:121] Creating new client SASL connection [09:46:46]W: [Step 11/11] I0229 09:46:46.690688 1186 master.cpp:5540] Authenticating scheduler-52603476-875a-49a8-85d4-c98d102cdfab@172.30.2.124:37431 [09:46:46]W: [Step 11/11] I0229 09:46:46.690801 1184 authenticator.cpp:413] Starting authentication session for 
crammd5_authenticatee(877)@172.30.2.124:37431 [09:46:46]W: [Step 11/11] I0229 09:46:46.691025 1181 authenticator.cpp:98] Creating new server SASL connection [09:46:46]W: [Step 11/11] I0229 09:46:46.691314 1180 authenticatee.cpp:212] Received SASL authentication mechanisms: CRAM-MD5 [09:46:46]W: [Step 11/11] I0229 09:46:46.691339 1180 authenticatee.cpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' [09:46:46]W: [Step 11/11] I0229 09:46:46.691437 1180 authenticator.cpp:203] Received SASL authentication start [09:46:46]W: [Step 11/11] I0229 09:46:46.691490 1180 authenticator.cpp:325] Authentication requires more steps [09:46:46]W: [Step 11/11] I0229 09:46:46.691581 1180 authenticatee.cpp:258] Received SASL authentication step [09:46:46]W: [Step 11/11] I0229 09:46:46.691684 1180 authenticator.cpp:231] Received SASL authentication step [09:46:46]W: [Step 11/11] I0229 09:46:46.691712 1180 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: 'ip-172-30-2-124.mesosphere.io' server FQDN: 'ip-172-30-2-124.mesosphere.io' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false [09:46:46]W: [Step 11/11] I0229 09:46:46.691726 1180 auxprop.cpp:179] Looking up auxiliary property '*userPassword' [09:46:46]W: [Step 11/11] I0229 09:46:46.691768 1180 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' [09:46:46]W: [Step 11/11] I0229 09:46:46.691802 1180 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: 'ip-172-30-2-124.mesosphere.io' server FQDN: 'ip-172-30-2-124.mesosphere.io' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true [09:46:46]W: [Step 11/11] I0229 09:46:46.691817 1180 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true [09:46:46]W: [Step 11/11] I0229 09:46:46.691829 1180 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since 
SASL_AUXPROP_AUTHZID == true [09:46:46]W: [Step 11/11] I0229 09:46:46.691848 1180 authenticator.cpp:317] Authentication success [09:46:46]W: [Step 11/11] I0229 09:46:46.691944 1186 authenticatee.cpp:298] Authentication success [09:46:46]W: [Step 11/11] I0229 09:46:46.692011 1185 master.cpp:5570] Successfully authenticated principal 'test-principal' at scheduler-52603476-875a-49a8-85d4-c98d102cdfab@172.30.2.124:37431 [09:46:46]W: [Step 11/11] I0229 09:46:46.692056 1187 authenticator.cpp:431] Authentication session cleanup for crammd5_authenticatee(877)@172.30.2.124:37431 [09:46:46]W: [Step 11/11] I0229 09:46:46.692308 1184 sched.cpp:471] Successfully authenticated with master master@172.30.2.124:37431 [09:46:46]W: [Step 11/11] I0229 09:46:46.692325 1184 sched.cpp:776] Sending SUBSCRIBE call to master@172.30.2.124:37431 [09:46:46]W: [Step 11/11] I0229 09:46:46.692399 1184 sched.cpp:809] Will retry registration in 954.231367ms if necessary [09:46:46]W: [Step 11/11] I0229 09:46:46.692505 1183 master.cpp:2279] Received SUBSCRIBE call for framework 'default' at scheduler-52603476-875a-49a8-85d4-c98d102cdfab@172.30.2.124:37431 [09:46:46]W: [Step 11/11] I0229 09:46:46.692553 1183 master.cpp:1750] Authorizing framework principal 'test-principal' to receive offers for role '*' [09:46:46]W: [Step 11/11] I0229 09:46:46.692836 1184 master.cpp:2350] Subscribing framework default with checkpointing disabled and capabilities [ ] [09:46:46]W: [Step 11/11] I0229 09:46:46.692942 1183 metadata_manager.cpp:188] No images to load from disk. Docker provisioner image storage path '/tmp/mesos/store/docker/storedImages' does not exist [09:46:46]W: [Step 11/11] I0229 09:46:46.693208 1180 provisioner.cpp:245] Provisioner recovery complete [09:46:46]W: [Step 11/11] I0229 09:46:46.693295 1186 hierarchical.cpp:265] Added framework 3fbb2fb0-4f18-498b-a440-9acbf6923a13-0000 [09:46:46]W: [Step 11/11] I0229 09:46:46.693357 1186 hierarchical.cpp:1437] No resources available to allocate! 
[09:46:46]W: [Step 11/11] I0229 09:46:46.693397 1186 hierarchical.cpp:1532] No inverse offers to send out! [09:46:46]W: [Step 11/11] I0229 09:46:46.693424 1186 ...",3 MESOS-4813,"Implement base tests for unified container using local puller.","Using command line executor to test shell commands with local docker images.",2 MESOS-4817,"Remove internal usage of deprecated *.json endpoints.","We still use the deprecated *.json internally (UI, tests, documentation). ",3 MESOS-4818,"Add end to end testing for Appc images.","Add tests that cover integration of the Appc provisioner feature with the Mesos containerizer. ",3 MESOS-4819,"Add documentation for Appc image discovery.","Add documentation for the Appc image discovery feature that covers: - Use case - Implementation detail (Simple discovery).",3 MESOS-4820,"Need to set `EXPOSED` ports from docker images into `ContainerConfig`","Most docker images have an `EXPOSE` command associated with them. This tells the container run-time the TCP ports that the micro-service ""wishes"" to expose to the outside world. With the `Unified containerizer` project, since `MesosContainerizer` is going to natively support docker images, it is imperative that the Mesos container run time have a mechanism to expose ports listed in a Docker image. The first step to achieve this is to extract this information from the `Docker` image and set it in the `ContainerConfig`. The `ContainerConfig` can then be used to pass this information to any isolator (e.g. the `network/cni` isolator) that will install port forwarding rules to expose the desired ports.",1 MESOS-4821,"Introduce a port field in `ImageManifest` in order to set exposed ports for a container.","Networking isolators such as `network/cni` need to learn about ports that a container wishes to be exposed to the outside world. 
This can be achieved by adding a field to the `ImageManifest` protobuf and allowing the `ImageProvisioner` to set these fields to inform the isolator of the ports that the container wishes to be exposed. ",1 MESOS-4822,"Add support for local image fetching in Appc provisioner.","Currently, the Appc image provisioner supports http(s) fetching. It would be valuable to add support for local file path (URI) based fetching.",2 MESOS-4823,"Implement port forwarding in `network/cni` isolator","Most docker and appc images wish to expose ports that micro-services are listening on, to the outside world. When containers are running on bridged (or ptp) networking this can be achieved by installing port forwarding rules on the agent (using iptables). This can be done in the `network/cni` isolator. The reason we would like this functionality to be implemented in the `network/cni` isolator, and not a CNI plugin, is that the specifications currently do not support specifying port forwarding rules. Further, to install these rules the isolator needs two pieces of information, the exposed ports and the IP address associated with the container. Both are available to the isolator.",2 MESOS-4824,"""filesystem/linux"" isolator does not unmount orphaned persistent volumes","A persistent volume can be orphaned when: # A framework registers with checkpointing enabled. # The framework starts a task + a persistent volume. # The agent exits. The task continues to run. # Something wipes the agent's {{meta}} directory. This removes the checkpointed framework info from the agent. # The agent comes back and recovers. The framework for the task is not found, so the task is considered orphaned now. 
The agent currently does not unmount the persistent volume, saying (with {{GLOG_v=1}}) {code} I0229 23:55:42.078940 5635 linux.cpp:711] Ignoring cleanup request for unknown container: a35189d3-85d5-4d02-b568-67f675b6dc97 {code} Test implemented here: https://reviews.apache.org/r/44122/",2 MESOS-4825,"Master's slave reregister logic does not update version field","The master's logic for reregistering a slave does not update the version field if the slave re-registers with a new version.",1 MESOS-4829,"Remove `grace_period_seconds` field from Shutdown event v1 protobuf.","There are two ways in which a shutdown of the executor can be triggered: 1. If it receives an explicit `Shutdown` message from the agent. 2. If the recovery timeout period has elapsed, and the executor still hasn’t been able to (re-)connect with the agent. Currently, the executor library relies on the field `grace_period_seconds` having a default value of 5 seconds to handle the second scenario. https://github.com/apache/mesos/blob/master/src/executor/executor.cpp#L608 The driver used to trigger the grace period via a constant defined in src/slave/constants.cpp. https://github.com/apache/mesos/blob/master/src/exec/exec.cpp#L92 The agent may want to force a shorter shutdown grace period (e.g. oversubscription eviction may have a shorter deadline) in the future. For now, we can just read the value via an environment variable.",3 MESOS-4830,"Bind docker runtime isolator with docker image provider.","If the image provider is specified as `docker` but docker/runtime is not set, the configuration is not meaningful, because there are no executables. 
A check should be added to make sure docker runtime isolator is on if using docker as image provider.",1 MESOS-4832,"DockerContainerizerTest.ROOT_DOCKER_RecoverOrphanedPersistentVolumes exits when the /tmp directory is bind-mounted","If the {{/tmp}} directory (where Mesos tests create temporary directories) is a bind mount, the test suite will exit here: {code} [ RUN ] DockerContainerizerTest.ROOT_DOCKER_RecoverOrphanedPersistentVolumes I0226 03:17:26.722806 1097 leveldb.cpp:174] Opened db in 12.587676ms I0226 03:17:26.723496 1097 leveldb.cpp:181] Compacted db in 636999ns I0226 03:17:26.723536 1097 leveldb.cpp:196] Created db iterator in 18271ns I0226 03:17:26.723547 1097 leveldb.cpp:202] Seeked to beginning of db in 1555ns I0226 03:17:26.723554 1097 leveldb.cpp:271] Iterated through 0 keys in the db in 363ns I0226 03:17:26.723593 1097 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0226 03:17:26.724128 1117 recover.cpp:447] Starting replica recovery I0226 03:17:26.724367 1117 recover.cpp:473] Replica is in EMPTY status I0226 03:17:26.725237 1117 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (13810)@172.30.2.151:51934 I0226 03:17:26.725744 1114 recover.cpp:193] Received a recover response from a replica in EMPTY status I0226 03:17:26.726356 1111 master.cpp:376] Master 5cc57c0e-f1ad-4107-893f-420ed1a1db1a (ip-172-30-2-151.mesosphere.io) started on 172.30.2.151:51934 I0226 03:17:26.726369 1118 recover.cpp:564] Updating replica status to STARTING I0226 03:17:26.726378 1111 master.cpp:378] Flags at startup: --acls="""" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate=""true"" --authenticate_http=""true"" --authenticate_slaves=""true"" --authenticators=""crammd5"" --authorizers=""local"" --credentials=""/tmp/djHTVQ/credentials"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" 
--initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_completed_frameworks=""50"" --max_completed_tasks_per_framework=""1000"" --max_slave_ping_timeouts=""5"" --quiet=""false"" --recovery_slave_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""100secs"" --registry_strict=""true"" --root_submissions=""true"" --slave_ping_timeout=""15secs"" --slave_reregister_timeout=""10mins"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/usr/local/share/mesos/webui"" --work_dir=""/tmp/djHTVQ/master"" --zk_session_timeout=""10secs"" I0226 03:17:26.726605 1111 master.cpp:423] Master only allowing authenticated frameworks to register I0226 03:17:26.726616 1111 master.cpp:428] Master only allowing authenticated slaves to register I0226 03:17:26.726632 1111 credentials.hpp:35] Loading credentials for authentication from '/tmp/djHTVQ/credentials' I0226 03:17:26.726860 1111 master.cpp:468] Using default 'crammd5' authenticator I0226 03:17:26.726977 1111 master.cpp:537] Using default 'basic' HTTP authenticator I0226 03:17:26.727092 1111 master.cpp:571] Authorization enabled I0226 03:17:26.727243 1118 hierarchical.cpp:144] Initialized hierarchical allocator process I0226 03:17:26.727285 1116 whitelist_watcher.cpp:77] No whitelist given I0226 03:17:26.728852 1114 master.cpp:1712] The newly elected leader is master@172.30.2.151:51934 with id 5cc57c0e-f1ad-4107-893f-420ed1a1db1a I0226 03:17:26.728876 1114 master.cpp:1725] Elected as the leading master! 
I0226 03:17:26.728891 1114 master.cpp:1470] Recovering from registrar I0226 03:17:26.728977 1117 registrar.cpp:307] Recovering registrar I0226 03:17:26.731503 1112 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 4.977811ms I0226 03:17:26.731539 1112 replica.cpp:320] Persisted replica status to STARTING I0226 03:17:26.731711 1111 recover.cpp:473] Replica is in STARTING status I0226 03:17:26.732501 1114 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (13812)@172.30.2.151:51934 I0226 03:17:26.732862 1111 recover.cpp:193] Received a recover response from a replica in STARTING status I0226 03:17:26.733264 1117 recover.cpp:564] Updating replica status to VOTING I0226 03:17:26.733836 1118 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 388246ns I0226 03:17:26.733855 1118 replica.cpp:320] Persisted replica status to VOTING I0226 03:17:26.733979 1113 recover.cpp:578] Successfully joined the Paxos group I0226 03:17:26.734149 1113 recover.cpp:462] Recover process terminated I0226 03:17:26.734478 1111 log.cpp:659] Attempting to start the writer I0226 03:17:26.735523 1114 replica.cpp:493] Replica received implicit promise request from (13813)@172.30.2.151:51934 with proposal 1 I0226 03:17:26.736130 1114 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 576451ns I0226 03:17:26.736150 1114 replica.cpp:342] Persisted promised to 1 I0226 03:17:26.736709 1115 coordinator.cpp:238] Coordinator attempting to fill missing positions I0226 03:17:26.737771 1114 replica.cpp:388] Replica received explicit promise request from (13814)@172.30.2.151:51934 for position 0 with proposal 2 I0226 03:17:26.738386 1114 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 583184ns I0226 03:17:26.738404 1114 replica.cpp:712] Persisted action at 0 I0226 03:17:26.739312 1118 replica.cpp:537] Replica received write request for position 0 from (13815)@172.30.2.151:51934 I0226 03:17:26.739367 1118 leveldb.cpp:436] 
Reading position from leveldb took 26157ns I0226 03:17:26.740638 1118 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 1.238477ms I0226 03:17:26.740669 1118 replica.cpp:712] Persisted action at 0 I0226 03:17:26.741158 1118 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 I0226 03:17:26.742878 1118 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 1.697254ms I0226 03:17:26.742902 1118 replica.cpp:712] Persisted action at 0 I0226 03:17:26.742916 1118 replica.cpp:697] Replica learned NOP action at position 0 I0226 03:17:26.743393 1117 log.cpp:675] Writer started with ending position 0 I0226 03:17:26.744370 1112 leveldb.cpp:436] Reading position from leveldb took 34329ns I0226 03:17:26.745240 1117 registrar.cpp:340] Successfully fetched the registry (0B) in 16.21888ms I0226 03:17:26.745350 1117 registrar.cpp:439] Applied 1 operations in 30460ns; attempting to update the 'registry' I0226 03:17:26.746016 1111 log.cpp:683] Attempting to append 210 bytes to the log I0226 03:17:26.746119 1116 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 I0226 03:17:26.746798 1114 replica.cpp:537] Replica received write request for position 1 from (13816)@172.30.2.151:51934 I0226 03:17:26.747251 1114 leveldb.cpp:341] Persisting action (229 bytes) to leveldb took 411333ns I0226 03:17:26.747269 1114 replica.cpp:712] Persisted action at 1 I0226 03:17:26.747808 1113 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 I0226 03:17:26.749511 1113 leveldb.cpp:341] Persisting action (231 bytes) to leveldb took 1.673488ms I0226 03:17:26.749534 1113 replica.cpp:712] Persisted action at 1 I0226 03:17:26.749550 1113 replica.cpp:697] Replica learned APPEND action at position 1 I0226 03:17:26.750422 1111 registrar.cpp:484] Successfully updated the 'registry' in 5.021952ms I0226 03:17:26.750560 1111 registrar.cpp:370] Successfully recovered registrar I0226 03:17:26.750635 1112 log.cpp:702] 
Attempting to truncate the log to 1 I0226 03:17:26.750751 1113 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 I0226 03:17:26.751096 1116 master.cpp:1522] Recovered 0 slaves from the Registry (171B) ; allowing 10mins for slaves to re-register I0226 03:17:26.751126 1111 hierarchical.cpp:171] Skipping recovery of hierarchical allocator: nothing to recover I0226 03:17:26.751561 1118 replica.cpp:537] Replica received write request for position 2 from (13817)@172.30.2.151:51934 I0226 03:17:26.751999 1118 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 406823ns I0226 03:17:26.752018 1118 replica.cpp:712] Persisted action at 2 I0226 03:17:26.752521 1113 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 I0226 03:17:26.754161 1113 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 1.614888ms I0226 03:17:26.754210 1113 leveldb.cpp:399] Deleting ~1 keys from leveldb took 26384ns I0226 03:17:26.754225 1113 replica.cpp:712] Persisted action at 2 I0226 03:17:26.754240 1113 replica.cpp:697] Replica learned TRUNCATE action at position 2 I0226 03:17:26.765103 1115 slave.cpp:193] Slave started on 399)@172.30.2.151:51934 I0226 03:17:26.765130 1115 slave.cpp:194] Flags at startup: --appc_simple_discovery_uri_prefix=""http://"" --appc_store_dir=""/tmp/mesos/store/appc"" --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --credential=""/tmp/DockerContainerizerTest_ROOT_DOCKER_RecoverOrphanedPersistentVolumes_aJOesP/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_kill_orphans=""true"" --docker_registry=""https://registry-1.docker.io"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" 
--docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/DockerContainerizerTest_ROOT_DOCKER_RecoverOrphanedPersistentVolumes_aJOesP/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname_lookup=""true"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" --launcher_dir=""/mnt/teamcity/work/4240ba9ddd0997c3/build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resources=""cpu:2;mem:2048;disk(role1):2048"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_enable_support=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" --work_dir=""/tmp/DockerContainerizerTest_ROOT_DOCKER_RecoverOrphanedPersistentVolumes_aJOesP"" I0226 03:17:26.765403 1115 credentials.hpp:83] Loading credential for authentication from '/tmp/DockerContainerizerTest_ROOT_DOCKER_RecoverOrphanedPersistentVolumes_aJOesP/credential' I0226 03:17:26.765573 1115 slave.cpp:324] Slave using credential for: test-principal I0226 03:17:26.765733 1115 resources.cpp:576] Parsing resources as JSON failed: cpu:2;mem:2048;disk(role1):2048 Trying semicolon-delimited string format instead I0226 03:17:26.766185 1115 slave.cpp:464] Slave resources: cpu(*):2; mem(*):2048; disk(role1):2048; cpus(*):8; ports(*):[31000-32000] I0226 03:17:26.766242 1115 slave.cpp:472] Slave attributes: [ ] I0226 03:17:26.766250 1115 slave.cpp:477] Slave 
hostname: ip-172-30-2-151.mesosphere.io I0226 03:17:26.767325 1097 sched.cpp:222] Version: 0.28.0 I0226 03:17:26.767390 1111 state.cpp:58] Recovering state from '/tmp/DockerContainerizerTest_ROOT_DOCKER_RecoverOrphanedPersistentVolumes_aJOesP/meta' I0226 03:17:26.767603 1115 status_update_manager.cpp:200] Recovering status update manager I0226 03:17:26.767865 1113 docker.cpp:726] Recovering Docker containers I0226 03:17:26.767971 1111 sched.cpp:326] New master detected at master@172.30.2.151:51934 I0226 03:17:26.768045 1111 sched.cpp:382] Authenticating with master master@172.30.2.151:51934 I0226 03:17:26.768059 1111 sched.cpp:389] Using default CRAM-MD5 authenticatee I0226 03:17:26.768070 1118 slave.cpp:4565] Finished recovery I0226 03:17:26.768273 1112 authenticatee.cpp:121] Creating new client SASL connection I0226 03:17:26.768435 1118 slave.cpp:4737] Querying resource estimator for oversubscribable resources I0226 03:17:26.768565 1111 master.cpp:5526] Authenticating scheduler-c59020d6-385e-48a3-8a10-9e5c3f1dbd92@172.30.2.151:51934 I0226 03:17:26.768661 1118 slave.cpp:796] New master detected at master@172.30.2.151:51934 I0226 03:17:26.768659 1115 authenticator.cpp:413] Starting authentication session for crammd5_authenticatee(839)@172.30.2.151:51934 I0226 03:17:26.768679 1113 status_update_manager.cpp:174] Pausing sending status updates I0226 03:17:26.768728 1118 slave.cpp:859] Authenticating with master master@172.30.2.151:51934 I0226 03:17:26.768743 1118 slave.cpp:864] Using default CRAM-MD5 authenticatee I0226 03:17:26.768865 1118 slave.cpp:832] Detecting new master I0226 03:17:26.768868 1112 authenticator.cpp:98] Creating new server SASL connection I0226 03:17:26.768908 1114 authenticatee.cpp:121] Creating new client SASL connection I0226 03:17:26.769003 1118 slave.cpp:4751] Received oversubscribable resources from the resource estimator I0226 03:17:26.769103 1115 authenticatee.cpp:212] Received SASL authentication mechanisms: CRAM-MD5 I0226 03:17:26.769131 
1115 authenticatee.cpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I0226 03:17:26.769209 1116 master.cpp:5526] Authenticating slave(399)@172.30.2.151:51934 I0226 03:17:26.769253 1114 authenticator.cpp:203] Received SASL authentication start I0226 03:17:26.769295 1115 authenticator.cpp:413] Starting authentication session for crammd5_authenticatee(840)@172.30.2.151:51934 I0226 03:17:26.769307 1114 authenticator.cpp:325] Authentication requires more steps I0226 03:17:26.769403 1117 authenticatee.cpp:258] Received SASL authentication step I0226 03:17:26.769495 1114 authenticator.cpp:98] Creating new server SASL connection I0226 03:17:26.769531 1115 authenticator.cpp:231] Received SASL authentication step I0226 03:17:26.769554 1115 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: 'ip-172-30-2-151.mesosphere.io' server FQDN: 'ip-172-30-2-151.mesosphere.io' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0226 03:17:26.769562 1115 auxprop.cpp:179] Looking up auxiliary property '*userPassword' I0226 03:17:26.769608 1115 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0226 03:17:26.769629 1115 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: 'ip-172-30-2-151.mesosphere.io' server FQDN: 'ip-172-30-2-151.mesosphere.io' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0226 03:17:26.769637 1115 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0226 03:17:26.769642 1115 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0226 03:17:26.769654 1115 authenticator.cpp:317] Authentication success I0226 03:17:26.769728 1117 authenticatee.cpp:298] Authentication success I0226 03:17:26.769769 1112 authenticatee.cpp:212] Received SASL authentication mechanisms: CRAM-MD5 I0226 03:17:26.769767 
1118 master.cpp:5556] Successfully authenticated principal 'test-principal' at scheduler-c59020d6-385e-48a3-8a10-9e5c3f1dbd92@172.30.2.151:51934 I0226 03:17:26.769803 1112 authenticatee.cpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I0226 03:17:26.769798 1114 authenticator.cpp:431] Authentication session cleanup for crammd5_authenticatee(839)@172.30.2.151:51934 I0226 03:17:26.769881 1112 authenticator.cpp:203] Received SASL authentication start I0226 03:17:26.769932 1112 authenticator.cpp:325] Authentication requires more steps I0226 03:17:26.769981 1117 sched.cpp:471] Successfully authenticated with master master@172.30.2.151:51934 I0226 03:17:26.770004 1117 sched.cpp:776] Sending SUBSCRIBE call to master@172.30.2.151:51934 I0226 03:17:26.770064 1118 authenticatee.cpp:258] Received SASL authentication step I0226 03:17:26.770102 1117 sched.cpp:809] Will retry registration in 1.937819802secs if necessary I0226 03:17:26.770165 1115 authenticator.cpp:231] Received SASL authentication step I0226 03:17:26.770193 1115 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: 'ip-172-30-2-151.mesosphere.io' server FQDN: 'ip-172-30-2-151.mesosphere.io' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0226 03:17:26.770207 1115 auxprop.cpp:179] Looking up auxiliary property '*userPassword' I0226 03:17:26.770213 1116 master.cpp:2280] Received SUBSCRIBE call for framework 'default' at scheduler-c59020d6-385e-48a3-8a10-9e5c3f1dbd92@172.30.2.151:51934 I0226 03:17:26.770241 1115 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0226 03:17:26.770274 1115 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: 'ip-172-30-2-151.mesosphere.io' server FQDN: 'ip-172-30-2-151.mesosphere.io' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0226 03:17:26.770277 1116 master.cpp:1751] Authorizing framework 
principal 'test-principal' to receive offers for role 'role1' I0226 03:17:26.770298 1115 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0226 03:17:26.770331 1115 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0226 03:17:26.770349 1115 authenticator.cpp:317] Authentication success I0226 03:17:26.770428 1118 authenticatee.cpp:298] Authentication success I0226 03:17:26.770442 1116 master.cpp:5556] Successfully authenticated principal 'test-principal' at slave(399)@172.30.2.151:51934 I0226 03:17:26.770547 1116 authenticator.cpp:431] Authentication session cleanup for crammd5_authenticatee(840)@172.30.2.151:51934 I0226 03:17:26.770846 1116 master.cpp:2351] Subscribing framework default with checkpointing enabled and capabilities [ ] I0226 03:17:26.770866 1118 slave.cpp:927] Successfully authenticated with master master@172.30.2.151:51934 I0226 03:17:26.770966 1118 slave.cpp:1321] Will retry registration in 1.453415ms if necessary I0226 03:17:26.771225 1115 hierarchical.cpp:265] Added framework 5cc57c0e-f1ad-4107-893f-420ed1a1db1a-0000 I0226 03:17:26.771275 1118 sched.cpp:703] Framework registered with 5cc57c0e-f1ad-4107-893f-420ed1a1db1a-0000 I0226 03:17:26.771299 1115 hierarchical.cpp:1434] No resources available to allocate! I0226 03:17:26.771328 1115 hierarchical.cpp:1529] No inverse offers to send out! 
I0226 03:17:26.771344 1118 sched.cpp:717] Scheduler::registered took 50146ns I0226 03:17:26.771356 1116 master.cpp:4240] Registering slave at slave(399)@172.30.2.151:51934 (ip-172-30-2-151.mesosphere.io) with id 5cc57c0e-f1ad-4107-893f-420ed1a1db1a-S0 I0226 03:17:26.771348 1115 hierarchical.cpp:1127] Performed allocation for 0 slaves in 101438ns I0226 03:17:26.771860 1114 registrar.cpp:439] Applied 1 operations in 59672ns; attempting to update the 'registry' I0226 03:17:26.772645 1117 log.cpp:683] Attempting to append 423 bytes to the log I0226 03:17:26.772758 1112 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 3 I0226 03:17:26.773435 1117 replica.cpp:537] Replica received write request for position 3 from (13824)@172.30.2.151:51934 I0226 03:17:26.773586 1111 slave.cpp:1321] Will retry registration in 2.74261ms if necessary I0226 03:17:26.773682 1115 master.cpp:4228] Ignoring register slave message from slave(399)@172.30.2.151:51934 (ip-172-30-2-151.mesosphere.io) as admission is already in progress I0226 03:17:26.773937 1117 leveldb.cpp:341] Persisting action (442 bytes) to level...",2 MESOS-4833,"Poor allocator performance with labeled resources and/or persistent volumes","Modifying the {{HierarchicalAllocator_BENCHMARK_Test.ResourceLabels}} benchmark from https://reviews.apache.org/r/43686/ to use distinct labels between different slaves, performance regresses from ~2 seconds to ~3 minutes. The culprit seems to be the way in which the allocator merges together resources; reserved resource labels (or persistent volume IDs) inhibit merging, which causes performance to be much worse.",5 MESOS-4834,"Add 'file' fetcher plugin.","Add support for ""file"" based URI fetcher. 
This could be useful for container image provisioning from local file system.",2 MESOS-4835,"CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess is flaky","Verbose logs: {code} [ RUN ] CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess I0302 00:43:14.127846 11755 cgroups.cpp:2427] Freezing cgroup /sys/fs/cgroup/freezer/mesos_test I0302 00:43:14.267411 11758 cgroups.cpp:1409] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos_test after 139.46496ms I0302 00:43:14.409395 11751 cgroups.cpp:2445] Thawing cgroup /sys/fs/cgroup/freezer/mesos_test I0302 00:43:14.551304 11751 cgroups.cpp:1438] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos_test after 141.811968ms ../../src/tests/containerizer/cgroups_tests.cpp:949: Failure Value of: ::waitpid(pid, &status, 0) Actual: 23809 Expected: -1 ../../src/tests/containerizer/cgroups_tests.cpp:950: Failure Value of: (*__errno_location ()) Actual: 0 Expected: 10 [ FAILED ] CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess (1055 ms) {code}",2 MESOS-4836,"Fix rmdir for windows","This is due to a bug in MESOS-4415 that landed for 0.27.0.",1 MESOS-4839,"Move placement new processes into the freezer cgroup into a parent hook.","The Linux Launcher places new processes into the freezer cgroup. This is currently done by a combination of childSetup function (blocking the new process until parent is done) and the parent (placing child process into the cgroup and then signaling child to continue). ParentHooks support this behavior (blocking child until some work is done in the parent) in a much cleaner way. ",3 MESOS-4840,"Remove internal usage of deprecated ShutdownFramework ACL","{{ShutdownFramework}} acl was deprecated a couple of versions ago in favor of the {{TeardownFramework}} message. Its deprecation cycle came with 0.27. 
That means we should remove the message and its references in the code base.",2 MESOS-4844,"Add authentication to master endpoints","Before we can add authorization around operator endpoints, we need to add authentication support, so that unauthenticated requests are denied when --authenticate_http is enabled, and so that the principal is passed into `route()`.",2 MESOS-4848,"Agent Authn Research Spike","Research the master authentication flags to see what changes will be necessary for agent http authentication. Write up a 1-2 page summary/design doc.",2 MESOS-4849,"Add agent flags for HTTP authentication","Flags should be added to the agent to: 1. Enable HTTP authentication ({{--authenticate_http}}) 2. Specify credentials ({{--http_credentials}}) 3. Specify HTTP authenticators ({{--authenticators}})",2 MESOS-4850,"Add authentication to agent endpoints /state and /flags","The {{/state}} and {{/flags}} endpoints are installed in {{src/slave/slave.cpp}}, and thus are straightforward to make authenticated. Other agent endpoints require a bit more consideration, and are tracked in MESOS-4902. 
For more information on agent endpoints, see http://mesos.apache.org/documentation/latest/endpoints/ or search for `route(` in the source code: {code} $ grep -rn ""route("" src/ |grep -v master |grep -v tests |grep -v json src/version/version.cpp:75: route(""/"", VERSION_HELP(), &VersionProcess::version); src/files/files.cpp:150: route(""/browse"", src/files/files.cpp:153: route(""/read"", src/files/files.cpp:156: route(""/download"", src/files/files.cpp:159: route(""/debug"", src/slave/slave.cpp:580: route(""/api/v1/executor"", src/slave/slave.cpp:595: route(""/state"", src/slave/slave.cpp:601: route(""/flags"", src/slave/slave.cpp:607: route(""/health"", src/slave/monitor.cpp:100: route(""/statistics"", $ grep -rn ""route("" 3rdparty/ |grep -v tests |grep -v README |grep -v examples |grep -v help |grep -v ""process..pp"" 3rdparty/libprocess/include/process/profiler.hpp:34: route(""/start"", START_HELP(), &Profiler::start); 3rdparty/libprocess/include/process/profiler.hpp:35: route(""/stop"", STOP_HELP(), &Profiler::stop); 3rdparty/libprocess/include/process/system.hpp:70: route(""/stats.json"", statsHelp(), &System::stats); 3rdparty/libprocess/include/process/logging.hpp:44: route(""/toggle"", TOGGLE_HELP(), &This::toggle); {code}",3 MESOS-4854,"Update CHANGELOG with net_cls isolator","Need to update the CHANGELOG for 0.28 release.",1 MESOS-4858,"Make changes to executor v1 library around managing connections.","While implementing pipelining changes for the scheduler library (MESOS-3570), we noticed a couple of small bugs that we would like to fix in the executor library: - Don't pass {{Connection}} objects to {{defer}} callbacks as they can sometimes lead to deadlocks. - Minor cleanups around not accepting {{SUBSCRIBE}} call if one is currently in progress. 
- Create a random UUID (connectionId) before we initiate a connection to the agent, as in some scenarios, we can accept connection attempts from stale connections.",3 MESOS-4859,"Add explicit upgrade instructions to the docs","The documentation currently contains per-version upgrade guidelines, which for recent releases only outlines the upgrade concerns for that version, without detailing explicit upgrade instructions. We should add explicit upgrade instructions to the top of the upgrades documentation, which can be supplemented by the per-version concerns. This is done within the upgrade docs for some early versions, with text like: {code} In order to upgrade a running cluster: Install the new master binaries and restart the masters. Upgrade the schedulers by linking the latest native library and mesos jar (if necessary). Restart the schedulers. Install the new slave binaries and restart the slaves. Upgrade the executors by linking the latest native library and mesos jar (if necessary). {code} Instructions to this effect should be featured prominently in the doc.",1 MESOS-4860,"Add a script to install the Nvidia GDK on a host.","This script can be used to install the Nvidia GDK for Cuda 7.5 on a mesos development machine. The purpose of the Nvidia GDK is to provide all the necessary header files (nvml.h) and library files (libnvidia-ml.so) necessary to build mesos with Nvidia GPU support. If the machine on which Mesos is being compiled doesn't have any GPUs, then libnvidia-ml.so consists only of stubs, allowing Mesos to build and run, but not actually do anything useful under the hood. 
This enables us to build a GPU-enabled mesos on a development machine without GPUs and then deploy it to a production machine with GPUs and be reasonably sure it will work.",2 MESOS-4861,"Add configure flags to build with Nvidia GPU support.","The configure flags can be used to enable Nvidia GPU support, as well as specify the installation directories of the nvml header and library files if not already installed in standard include/library paths on the system. They will also be used to conditionally build support for Nvidia GPUs into Mesos.",2 MESOS-4863,"Add Nvidia GPU isolator tests.","We need to be able to run unit tests that verify GPU isolation, as well as run full blown tests that actually exercise the GPUs. These tests should only build when the proper configure flags are set for enabling nvidia GPU support.",2 MESOS-4864,"Add flag to specify available Nvidia GPUs on an agent's command line.","In the initial GPU support we will not do auto-discovery of GPUs on an agent. As such, an operator will need to specify a flag on the command line, listing all of the GPUs available on the system.",3 MESOS-4865,"Add GPUs as an explicit resource.","We will add ""gpus"" as an explicitly recognized resource in Mesos, akin to cpus, memory, ports, and disk. In the containerizer, we will verify that the number of GPU resources passed in via the --resources flag matches the list of GPUs passed in via the --nvidia_gpus flag. In the future we will add autodiscovery so this matching is unnecessary. 
However, we will always have to pass ""gpus"" as a resource to make any GPU available on the system (unlike for cpus and memory, where the default is probed).",3 MESOS-4868,"PersistentVolumeTests do not need to set up ACLs.","The {{PersistentVolumeTest}} s have a custom helper for setting up ACLs in the {{master::Flags}}: {code} ACLs acls; hashset roles; foreach (const FrameworkInfo& framework, frameworks) { mesos::ACL::RegisterFramework* acl = acls.add_register_frameworks(); acl->mutable_principals()->add_values(framework.principal()); acl->mutable_roles()->add_values(framework.role()); roles.insert(framework.role()); } flags.acls = acls; flags.roles = strings::join("","", roles); {code} This is no longer necessary with implicit roles.",1 MESOS-4872,"Dump the contents of the sandbox when a test fails","[~bernd-mesos] added this logic for extra info about a rare flaky test: https://github.com/apache/mesos/blob/d26baee1f377aedb148ad04cc004bb38b85ee4f6/src/tests/fetcher_cache_tests.cpp#L249-L259 This information is useful regardless of the test type and should be generalized for {{cluster::Slave}}. i.e. # When a {{cluster::Slave}} is destructed, it can detect if the test has failed. # If so, navigate through its own {{work_dir}} and print sandboxes and/or other useful debugging info. 
Also see the refactor in [MESOS-4634].",3 MESOS-4873,"Add documentation about container image support.",NULL,5 MESOS-4877,"Mesos containerizer can't handle top level docker image like ""alpine"" (must use ""library/alpine"")","This can be demonstrated with the {{mesos-execute}} command: # Docker containerizer with image {{alpine}}: success {code} sudo ./build/src/mesos-execute --docker_image=alpine --containerizer=docker --name=just-a-test --command=""sleep 1000"" --master=localhost:5050 {code} # Mesos containerizer with image {{alpine}}: failure {code} sudo ./build/src/mesos-execute --docker_image=alpine --containerizer=mesos --name=just-a-test --command=""sleep 1000"" --master=localhost:5050 {code} # Mesos containerizer with image {{library/alpine}}: success {code} sudo ./build/src/mesos-execute --docker_image=library/alpine --containerizer=mesos --name=just-a-test --command=""sleep 1000"" --master=localhost:5050 {code} In the slave logs: {code} ea-4460-83 9c-838da86af34c-0007' I0306 16:32:41.418269 3403 metadata_manager.cpp:159] Looking for image 'alpine:latest' I0306 16:32:41.418699 3403 registry_puller.cpp:194] Pulling image 'alpine:latest' from 'docker-manifest://registry-1.docker.io:443alpine?latest#https' to '/tmp/mesos-test /store/docker/staging/ka7MlQ' E0306 16:32:43.098131 3400 slave.cpp:3773] Container '4bf9132d-9a57-4baa-a78c-e7164e93ace6' for executor 'just-a-test' of framework 4f055c6f-1bea-4460-839c-838da86af34c-0 007 failed to start: Collect failed: Unexpected HTTP response '401 Unauthorized {code} curl command executed: {code} $ sudo sysdig -A -p ""*%evt.time %proc.cmdline"" evt.type=execve and proc.name=curl 16:42:53.198998042 curl -s -S -L -D - https://registry-1.docker.io:443/v2/alpine/manifests/latest 16:42:53.784958541 curl -s -S -L -D - https://auth.docker.io/token?service=registry.docker.io&scope=repository:alpine:pull 16:42:54.294192024 curl -s -S -L -D - -H Authorization: Bearer 
eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCIsIng1YyI6WyJNSUlDTHpDQ0FkU2dBd0lCQWdJQkFEQUtCZ2dxaGtqT1BRUURBakJHTVVRd1FnWURWUVFERXp0Uk5Gb3pPa2RYTjBrNldGUlFSRHBJVFRSUk9rOVVWRmc2TmtGRlF6cFNUVE5ET2tGU01rTTZUMFkzTnpwQ1ZrVkJPa2xHUlVrNlExazFTekFlRncweE5UQTJNalV4T1RVMU5EWmFGdzB4TmpBMk1qUXhPVFUxTkRaYU1FWXhSREJDQmdOVkJBTVRPMGhHU1UwNldGZFZWam8yUVZkSU9sWlpUVEk2TTFnMVREcFNWREkxT2s5VFNrbzZTMVExUmpwWVRsSklPbFJMTmtnNlMxUkxOanBCUVV0VU1Ga3dFd1lIS29aSXpqMENBUVlJS29aSXpqMERBUWNEUWdBRXl2UzIvdEI3T3JlMkVxcGRDeFdtS1NqV1N2VmJ2TWUrWGVFTUNVMDByQjI0akNiUVhreFdmOSs0MUxQMlZNQ29BK0RMRkIwVjBGZGdwajlOWU5rL2pxT0JzakNCcnpBT0JnTlZIUThCQWY4RUJBTUNBSUF3RHdZRFZSMGxCQWd3QmdZRVZSMGxBREJFQmdOVkhRNEVQUVE3U0VaSlRUcFlWMVZXT2paQlYwZzZWbGxOTWpveldEVk1PbEpVTWpVNlQxTktTanBMVkRWR09saE9Va2c2VkVzMlNEcExWRXMyT2tGQlMxUXdSZ1lEVlIwakJEOHdQWUE3VVRSYU16cEhWemRKT2xoVVVFUTZTRTAwVVRwUFZGUllPalpCUlVNNlVrMHpRenBCVWpKRE9rOUdOemM2UWxaRlFUcEpSa1ZKT2tOWk5Vc3dDZ1lJS29aSXpqMEVBd0lEU1FBd1JnSWhBTXZiT2h4cHhrTktqSDRhMFBNS0lFdXRmTjZtRDFvMWs4ZEJOVGxuWVFudkFpRUF0YVJGSGJSR2o4ZlVSSzZ4UVJHRURvQm1ZZ3dZelR3Z3BMaGJBZzNOUmFvPSJdfQ.eyJhY2Nlc3MiOltdLCJhdWQiOiJyZWdpc3RyeS5kb2NrZXIuaW8iLCJleHAiOjE0NTcyODI4NzQsImlhdCI6MTQ1NzI4MjU3NCwiaXNzIjoiYXV0aC5kb2NrZXIuaW8iLCJqdGkiOiJaOGtyNXZXNEJMWkNIRS1IcVJIaCIsIm5iZiI6MTQ1NzI4MjU3NCwic3ViIjoiIn0.C2wtJq_P-m0buPARhmQjDfh6ztIAhcvgN3tfWIZEClSgXlVQ_sAQXAALNZKwAQL2Chj7NpHX--0GW-aeL_28Aw https://registry-1.docker.io:443/v2/alpine/manifests/latest {code} Also got the same result with {{ubuntu}} docker image.",3 MESOS-4879,"Update glog patch to support PowerPC LE","This is a part of PowerPC LE porting",1 MESOS-4881,"Rescind all outstanding offers after changing some weights.",NULL,2 MESOS-4882,"Add support for command and arguments to mesos-execute.","{{CommandInfo}} protobuf support two kinds of command: {code} // There are two ways to specify the command: // 1) If 'shell == true', the command will be launched via shell // (i.e., /bin/sh -c 'value'). The 'value' specified will be // treated as the shell command. 
The 'arguments' will be ignored. // 2) If 'shell == false', the command will be launched by passing // arguments to an executable. The 'value' specified will be // treated as the filename of the executable. The 'arguments' // will be treated as the arguments to the executable. This is // similar to how POSIX exec families launch processes (i.e., // execlp(value, arguments(0), arguments(1), ...)). {code} mesos-execute cannot currently handle case 2); enabling it would help with testing and running one-off tasks.",5 MESOS-4886,"Support mesos containerizer force_pull_image option.","Currently, for the unified containerizer, images that are already cached by the metadata manager cannot be updated. The user has to delete the corresponding images in the store if an update is needed. We should support a `force_pull_image` option for the unified containerizer, to allow overriding a cached image if one exists.",3 MESOS-4888,"Default cmd is executed as an incorrect command.","When the Mesos containerizer launches a container using a docker image that only contains a default Cmd, the executable command is assembled in an incorrect sequence. For example: if an image's default entrypoint is null, cmd is ""sh"", and the user defines shell=false, value as none, and arguments as [-c, echo 'hello world'], the executable command is `[sh, -c, echo 'hello world', sh]`, which is incorrect. It should be `[sh, sh, -c, echo 'hello world']` instead. This problem is only exposed for the case: sh=0, value=0, argv=1, entrypoint=0, cmd=1. ",2 MESOS-4889,"Implement runtime isolator tests.","There are different cases in the docker runtime isolator. Some special cases should each be tested with a dedicated test case, to verify that the docker runtime isolator logic is correct.",5 MESOS-4891,"Add a '/containers' endpoint to the agent to list all the active containers.","This endpoint will be similar to the /monitor/statistics.json endpoint, but it'll also contain the 'container_status' about the container (see ContainerStatus in mesos.proto). 
We'll eventually deprecate the /monitor/statistics.json endpoint.",8 MESOS-4902,"Add authentication to libprocess endpoints","In addition to the endpoints addressed by MESOS-4850 and MESOS-5152, the following endpoints would also benefit from HTTP authentication: * {{/profiler/*}} * {{/logging/toggle}} * {{/metrics/snapshot}} Adding HTTP authentication to these endpoints is a bit more complicated because they are defined at the libprocess level. While working on MESOS-4850, it became apparent that since our tests use the same instance of libprocess for both master and agent, different default authentication realms must be used for master/agent so that HTTP authentication can be independently enabled/disabled for each. We should establish a mechanism for making an endpoint authenticated that allows us to: 1) Install an endpoint like {{/files}}, whose code is shared by the master and agent, with different authentication realms for the master and agent 2) Avoid hard-coding a default authentication realm into libprocess, to permit the use of different authentication realms for the master and agent and to keep application-level concerns from leaking into libprocess Another option would be to use a single default authentication realm and always enable or disable HTTP authentication for *both* the master and agent in tests. However, this wouldn't allow us to test scenarios where HTTP authentication is enabled on one but disabled on the other.",5 MESOS-4903,"Allow multiple loads of module manifests","The ModuleManager::load() is designed to be called exactly once during a process lifetime. This works well for Master/Agent environments. However, it can fail in Scheduler environments. 
For example, a single Scheduler binary might implement multiple scheduler drivers, causing multiple calls to ModuleManager::load() and leading to a failure.",3 MESOS-4908,"Tasks cannot be killed forcefully.","Currently there is no way for a scheduler to instruct the executor to kill a certain task immediately, skipping any possible timeouts and / or kill policies. This may be desirable in cases where, e.g., the kill policy is 10 minutes but something went wrong, so the scheduler decides to issue a forceful kill.",5 MESOS-4909,"Introduce kill policy for tasks.","A task may require some time to clean up or even a special mechanism to issue a kill request (currently it's a SIGTERM followed by SIGKILL). Introducing kill policies per task will help address these issues.",5 MESOS-4910,"Deprecate the --docker_stop_timeout agent flag.","Instead, a combination of the {{executor_shutdown_grace_period}} agent flag and optionally task kill policies should be used.",1 MESOS-4911,"Executor driver does not respect executor shutdown grace period.","Executor shutdown grace period, configured on the agent, is propagated to executors via the `MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD` environment variable. 
The executor driver must use this timeout to delay the hard shutdown of the related executor.",1 MESOS-4912,"LinuxFilesystemIsolatorTest.ROOT_MultipleContainers fails.","Observed on our CI: {noformat} [09:34:15] : [Step 11/11] [ RUN ] LinuxFilesystemIsolatorTest.ROOT_MultipleContainers [09:34:19]W: [Step 11/11] I0309 09:34:19.906719 2357 linux.cpp:81] Making '/tmp/MLVLnv' a shared mount [09:34:19]W: [Step 11/11] I0309 09:34:19.923548 2357 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher [09:34:19]W: [Step 11/11] I0309 09:34:19.924705 2376 containerizer.cpp:666] Starting container 'da610f7f-a709-4de8-94d3-74f4a520619b' for executor 'test_executor1' of framework '' [09:34:19]W: [Step 11/11] I0309 09:34:19.925355 2371 provisioner.cpp:285] Provisioning image rootfs '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0' for container da610f7f-a709-4de8-94d3-74f4a520619b [09:34:19]W: [Step 11/11] I0309 09:34:19.925881 2377 copy.cpp:127] Copying layer path '/tmp/MLVLnv/test_image1' to rootfs '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0' [09:34:30]W: [Step 11/11] I0309 09:34:30.835127 2376 linux.cpp:355] Bind mounting work directory from '/tmp/MLVLnv/slaves/test_slave/frameworks/executors/test_executor1/runs/da610f7f-a709-4de8-94d3-74f4a520619b' to '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0/mnt/mesos/sandbox' for container da610f7f-a709-4de8-94d3-74f4a520619b [09:34:30]W: [Step 11/11] I0309 09:34:30.835392 2376 linux.cpp:683] Changing the ownership of the persistent volume at '/tmp/MLVLnv/volumes/roles/test_role/persistent_volume_id' with uid 0 and gid 0 [09:34:30]W: [Step 11/11] I0309 09:34:30.840425 2376 linux.cpp:723] Mounting 
'/tmp/MLVLnv/volumes/roles/test_role/persistent_volume_id' to '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0/mnt/mesos/sandbox/volume' for persistent volume disk(test_role)[persistent_volume_id:volume]:32 of container da610f7f-a709-4de8-94d3-74f4a520619b [09:34:30]W: [Step 11/11] I0309 09:34:30.843878 2374 linux_launcher.cpp:304] Cloning child process with flags = CLONE_NEWNS [09:34:30]W: [Step 11/11] I0309 09:34:30.848302 2371 containerizer.cpp:666] Starting container 'fe4729c5-1e63-4cc6-a2e3-fe5006ffe087' for executor 'test_executor2' of framework '' [09:34:30]W: [Step 11/11] I0309 09:34:30.848758 2371 containerizer.cpp:1392] Destroying container 'da610f7f-a709-4de8-94d3-74f4a520619b' [09:34:30]W: [Step 11/11] I0309 09:34:30.848865 2373 provisioner.cpp:285] Provisioning image rootfs '/tmp/MLVLnv/provisioner/containers/fe4729c5-1e63-4cc6-a2e3-fe5006ffe087/backends/copy/rootfses/518b2464-43dd-47b0-9648-e78aedde6917' for container fe4729c5-1e63-4cc6-a2e3-fe5006ffe087 [09:34:30]W: [Step 11/11] I0309 09:34:30.849449 2375 copy.cpp:127] Copying layer path '/tmp/MLVLnv/test_image2' to rootfs '/tmp/MLVLnv/provisioner/containers/fe4729c5-1e63-4cc6-a2e3-fe5006ffe087/backends/copy/rootfses/518b2464-43dd-47b0-9648-e78aedde6917' [09:34:30]W: [Step 11/11] I0309 09:34:30.854038 2374 cgroups.cpp:2427] Freezing cgroup /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b [09:34:30]W: [Step 11/11] I0309 09:34:30.856693 2372 cgroups.cpp:1409] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b after 2.608128ms [09:34:30]W: [Step 11/11] I0309 09:34:30.859237 2377 cgroups.cpp:2445] Thawing cgroup /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b [09:34:30]W: [Step 11/11] I0309 09:34:30.861454 2377 cgroups.cpp:1438] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b after 2176us [09:34:30]W: 
[Step 11/11] I0309 09:34:30.934608 2378 containerizer.cpp:1608] Executor for container 'da610f7f-a709-4de8-94d3-74f4a520619b' has exited [09:34:30]W: [Step 11/11] I0309 09:34:30.937692 2372 linux.cpp:798] Unmounting volume '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0/mnt/mesos/sandbox/volume' for container da610f7f-a709-4de8-94d3-74f4a520619b [09:34:30]W: [Step 11/11] I0309 09:34:30.937742 2372 linux.cpp:817] Unmounting sandbox/work directory '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0/mnt/mesos/sandbox' for container da610f7f-a709-4de8-94d3-74f4a520619b [09:34:30]W: [Step 11/11] I0309 09:34:30.938129 2375 provisioner.cpp:330] Destroying container rootfs at '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0' for container da610f7f-a709-4de8-94d3-74f4a520619b [09:34:45] : [Step 11/11] ../../src/tests/containerizer/filesystem_isolator_tests.cpp:1318: Failure [09:34:45] : [Step 11/11] Failed to wait 15secs for wait1 [09:34:48] : [Step 11/11] [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_MultipleContainers (32341 ms) {noformat}",3 MESOS-4914,"ProcessorManager delegate should be an Option, not just a string.","Currently, the delegate field in the ProcessManager is just a string type. We check for 'existence' of a delegate by comparing (delegate != """"). Using an Option is the preferred method for things like this.",1 MESOS-4916,"Allow modules to express if they are multi-instantiable and thread safe.","A module might be instantiated multiple time (e.g., multiple schedulers in the same Java process instantiating an authenticator module) within the same process. The current mechanism doesn't provide a way through the module API to forbid multiple instantiations. 
It is up to the module to check and return an error on prior instantiation. Along similar lines, a module should be able to express thread-safety concerns. Typically, a module running in Master/Agent doesn't have to be concerned about thread safety if it uses the libprocess API. However, we should investigate how it plays in the scheduler environment.",8 MESOS-4917,"Replace non-POD static variables in module/manager.[ch]pp with POD equivalents.",NULL,3 MESOS-4918,"Cache module manifests while loading in ModuleManager.","Since the module managers are allowed to load the same module multiple times, we should be caching the module manifests to avoid cases where the module tries to trick the module manager by changing `ModuleBase` fields before the next call to `ModuleManager::load`.",3 MESOS-4922,"Set up proper /etc/hostname, /etc/hosts and /etc/resolv.conf for containers in network/cni isolator.","The network/cni isolator needs to properly set up /etc/hostname and /etc/hosts for the container with a hostname (e.g., randomly generated) and the assigned IP returned by the CNI plugin. We should consider the following cases: 1) container is using host filesystem 2) container is using a different filesystem 3) custom executor and command executor",5 MESOS-4926,"Add a list parser for comma-separated integers in flags.","Some flags require lists of integers to be passed in. We should have an explicit parser for this instead of relying on ad hoc solutions.",2 MESOS-4927,"The flag parser for `hashmap` should live in stout, not mesos.","The title says it all.",1 MESOS-4928,"Remove all '.get().' calls on Option / Try variables in the resources abstraction.","When possible, {{.get()}} calls should be replaced by {{->}} for {{Option}} / {{Try}} variables. This ticket only proposes a blanket change for this in the resource abstraction files, not the code base as a whole. This is in preparation for introducing the new GPU resource. 
Without this change, I would need to use the old {{.get()}} calls. Instead, I propose to fix the old code surrounding it so that consistency has me doing it the right way. ",1 MESOS-4932,"Propose Design for Authorization based filtering for endpoints.","The design doc can be found here: https://docs.google.com/document/d/1M27S7OTSfJ8afZCklOz00g_wcVrL32i9Lyl6g22GWeY",5 MESOS-4933,"Registrar HTTP Authentication.","Now that the master (and agents in progress) provide http authentication the registrar should do the same. See http://mesos.apache.org/documentation/latest/endpoints/registrar/registry/",3 MESOS-4934,"Enable HELP to include authentication status of endpoint.","As we enable authentication for more and more endpoints we should document which endpoints support authentication and which ones don't.",2 MESOS-4937,"Investigate container security options for Mesos containerizer","We should investigate the following to improve the container security for Mesos containerizer and come up with a list of features that we want to support in MVP. 1) Capabilities 2) User namespace 3) Seccomp 4) SELinux 5) AppArmor We should investigate what other container systems are doing regarding security: 1) [k8s| https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/types.go#L2905] 2) [docker|https://docs.docker.com/engine/security/security/] 3) [oci|https://github.com/opencontainers/specs/blob/master/config.md]",5 MESOS-4938,"Support docker registry authentication",NULL,5 MESOS-4939,"Support specifying per-container docker registry.","Currently, we only support a per agent flag to specify the docker registry. We should instead, allow people to specify the registry as part of the docker image name (like `docker pull` does).",3 MESOS-4941,"Support update existing quota.","We want to support updating an existing quota without the cycle of delete and recreate. 
This avoids the possible starvation risk of losing the quota between delete and recreate, and also makes the interface friendlier. Design doc: https://docs.google.com/document/d/1c8fJY9_N0W04FtUQ_b_kZM6S0eePU7eYVyfUP14dSys",8 MESOS-4942,"Docker runtime isolator tests may cause disk issue.","Currently the slave working directory is used as the docker store dir and archive dir, which is problematic. Because the slave work dir is exactly `environment->mkdtemp()`, it will not get cleaned up until the end of the whole test. But the runtime isolator local puller tests cp the host's rootfs, whose size is relatively large. Cleanup has to be done in each test's teardown. ",2 MESOS-4943,"Reduce the size of LinuxRootfs in tests.","Right now, LinuxRootfs copies files from the host filesystem to construct a chroot-able rootfs. We copy a lot of unnecessary files, making it very large. We can potentially strip a lot of files.",13 MESOS-4944,"Improve overlay backend so that it's writable","Currently, the overlay backend will provision a read-only FS. We can use an empty directory from the container sandbox to act as the upper layer so that it's writable.",5 MESOS-4949,"Executor shutdown grace period should be configurable.","Currently, executor shutdown grace period is specified by an agent flag, which is propagated to executors via the {{MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD}} environment variable. There is no way to adjust this timeout for the needs of a particular executor. To tackle this problem, we propose to introduce an optional {{shutdown_grace_period}} field in {{ExecutorInfo}}.",3 MESOS-4950,"Implement reconnect functionality in the scheduler library.","Currently, there is no way for the schedulers to force a reconnection attempt with the master using the scheduler library {{src/scheduler/scheduler.cpp}}. This is especially useful in scenarios where there is a one-way network partition with the master. Due to this, the scheduler has not received any {{HEARTBEAT}} events from the master. 
In this case, the scheduler might want to force a reconnection attempt with the master instead of relying on the {{disconnected}} callback.",3 MESOS-4951,"Enable actors to pass an authentication realm to libprocess","To prepare for MESOS-4902, the Mesos master and agent need a way to pass the desired authentication realm to libprocess. Since some endpoints (like {{/profiler/*}}) get installed in libprocess, the master/agent should be able to specify during initialization what authentication realm the libprocess-level endpoints will be authenticated under.",2 MESOS-4956,"Add authentication to /files endpoints","To protect access (authz) to master/agent logs as well as executor sandboxes, we need authentication on the /files endpoints. Adding HTTP authentication to these endpoints is a bit complicated since they are defined in code that is shared by the master and agent. While working on MESOS-4850, it became apparent that since our tests use the same instance of libprocess for both master and agent, different default authentication realms must be used for master/agent so that HTTP authentication can be independently enabled/disabled for each. We should establish a mechanism for making an endpoint authenticated that allows us to: 1) Install an endpoint like {{/files}}, whose code is shared by the master and agent, with different authentication realms for the master and agent 2) Avoid hard-coding a default authentication realm into libprocess, to permit the use of different authentication realms for the master and agent and to keep application-level concerns from leaking into libprocess Another option would be to use a single default authentication realm and always enable or disable HTTP authentication for *both* the master and agent in tests. 
However, this wouldn't allow us to test scenarios where HTTP authentication is enabled on one but disabled on the other.",5 MESOS-4961,"ContainerLoggerTest.LOGROTATE_RotateInSandbox is flaky","The logger subprocesses may exit before we reach the {{waitpid}} in the test. If this happens, {{waitpid}} will return a {{-1}} as the process no longer exists. Verbose logs: {code} [ RUN ] ContainerLoggerTest.LOGROTATE_RotateInSandbox I0316 14:28:51.329337 1242 cluster.cpp:139] Creating default 'local' authorizer I0316 14:28:51.332823 1242 leveldb.cpp:174] Opened db in 3.079559ms I0316 14:28:51.333916 1242 leveldb.cpp:181] Compacted db in 1.054247ms I0316 14:28:51.333979 1242 leveldb.cpp:196] Created db iterator in 21450ns I0316 14:28:51.334005 1242 leveldb.cpp:202] Seeked to beginning of db in 2205ns I0316 14:28:51.334025 1242 leveldb.cpp:271] Iterated through 0 keys in the db in 410ns I0316 14:28:51.334089 1242 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0316 14:28:51.334661 1275 recover.cpp:447] Starting replica recovery I0316 14:28:51.335044 1275 recover.cpp:473] Replica is in EMPTY status I0316 14:28:51.336207 1262 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (484)@172.17.0.3:45919 I0316 14:28:51.336730 1270 recover.cpp:193] Received a recover response from a replica in EMPTY status I0316 14:28:51.337257 1275 recover.cpp:564] Updating replica status to STARTING I0316 14:28:51.338001 1267 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 537200ns I0316 14:28:51.338032 1267 replica.cpp:320] Persisted replica status to STARTING I0316 14:28:51.338183 1261 master.cpp:376] Master c7653f60-33e9-4406-9f62-dc74c906bf83 (2cbb23302fe5) started on 172.17.0.3:45919 I0316 14:28:51.338295 1263 recover.cpp:473] Replica is in STARTING status I0316 14:28:51.338213 1261 master.cpp:378] Flags at startup: --acls="""" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" 
--authenticate=""true"" --authenticate_http=""true"" --authenticate_slaves=""true"" --authenticators=""crammd5"" --authorizers=""local"" --credentials=""/tmp/XtqwkS/credentials"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_completed_frameworks=""50"" --max_completed_tasks_per_framework=""1000"" --max_slave_ping_timeouts=""5"" --quiet=""false"" --recovery_slave_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""100secs"" --registry_strict=""true"" --root_submissions=""true"" --slave_ping_timeout=""15secs"" --slave_reregister_timeout=""10mins"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/mesos/mesos-0.29.0/_inst/share/mesos/webui"" --work_dir=""/tmp/XtqwkS/master"" --zk_session_timeout=""10secs"" I0316 14:28:51.338562 1261 master.cpp:423] Master only allowing authenticated frameworks to register I0316 14:28:51.338572 1261 master.cpp:428] Master only allowing authenticated slaves to register I0316 14:28:51.338580 1261 credentials.hpp:35] Loading credentials for authentication from '/tmp/XtqwkS/credentials' I0316 14:28:51.338877 1261 master.cpp:468] Using default 'crammd5' authenticator I0316 14:28:51.339030 1262 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (485)@172.17.0.3:45919 I0316 14:28:51.339246 1261 master.cpp:537] Using default 'basic' HTTP authenticator I0316 14:28:51.339393 1261 master.cpp:571] Authorization enabled I0316 14:28:51.339390 1266 recover.cpp:193] Received a recover response from a replica in STARTING status I0316 14:28:51.339606 1271 whitelist_watcher.cpp:77] No whitelist given I0316 14:28:51.339607 1275 hierarchical.cpp:144] Initialized hierarchical allocator process I0316 14:28:51.340077 1268 recover.cpp:564] Updating replica status to VOTING I0316 
14:28:51.340533 1270 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 331558ns I0316 14:28:51.340558 1270 replica.cpp:320] Persisted replica status to VOTING I0316 14:28:51.340672 1270 recover.cpp:578] Successfully joined the Paxos group I0316 14:28:51.340827 1270 recover.cpp:462] Recover process terminated I0316 14:28:51.341684 1270 master.cpp:1806] The newly elected leader is master@172.17.0.3:45919 with id c7653f60-33e9-4406-9f62-dc74c906bf83 I0316 14:28:51.341717 1270 master.cpp:1819] Elected as the leading master! I0316 14:28:51.341740 1270 master.cpp:1508] Recovering from registrar I0316 14:28:51.341954 1263 registrar.cpp:307] Recovering registrar I0316 14:28:51.342499 1273 log.cpp:659] Attempting to start the writer I0316 14:28:51.343616 1266 replica.cpp:493] Replica received implicit promise request from (487)@172.17.0.3:45919 with proposal 1 I0316 14:28:51.344183 1266 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 536941ns I0316 14:28:51.344208 1266 replica.cpp:342] Persisted promised to 1 I0316 14:28:51.344825 1267 coordinator.cpp:238] Coordinator attempting to fill missing positions I0316 14:28:51.346009 1276 replica.cpp:388] Replica received explicit promise request from (488)@172.17.0.3:45919 for position 0 with proposal 2 I0316 14:28:51.346371 1276 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 327890ns I0316 14:28:51.346393 1276 replica.cpp:712] Persisted action at 0 I0316 14:28:51.347363 1267 replica.cpp:537] Replica received write request for position 0 from (489)@172.17.0.3:45919 I0316 14:28:51.347414 1267 leveldb.cpp:436] Reading position from leveldb took 24861ns I0316 14:28:51.347774 1267 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 323654ns I0316 14:28:51.347796 1267 replica.cpp:712] Persisted action at 0 I0316 14:28:51.348323 1276 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 I0316 14:28:51.348714 1276 leveldb.cpp:341] Persisting action (16 bytes) 
to leveldb took 361981ns I0316 14:28:51.348738 1276 replica.cpp:712] Persisted action at 0 I0316 14:28:51.348760 1276 replica.cpp:697] Replica learned NOP action at position 0 I0316 14:28:51.349318 1274 log.cpp:675] Writer started with ending position 0 I0316 14:28:51.350275 1267 leveldb.cpp:436] Reading position from leveldb took 23849ns I0316 14:28:51.351171 1271 registrar.cpp:340] Successfully fetched the registry (0B) in 9.173248ms I0316 14:28:51.351300 1271 registrar.cpp:439] Applied 1 operations in 32119ns; attempting to update the 'registry' I0316 14:28:51.351989 1272 log.cpp:683] Attempting to append 170 bytes to the log I0316 14:28:51.352108 1266 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 I0316 14:28:51.352802 1263 replica.cpp:537] Replica received write request for position 1 from (490)@172.17.0.3:45919 I0316 14:28:51.353313 1263 leveldb.cpp:341] Persisting action (189 bytes) to leveldb took 474854ns I0316 14:28:51.353338 1263 replica.cpp:712] Persisted action at 1 I0316 14:28:51.354101 1273 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 I0316 14:28:51.354483 1273 leveldb.cpp:341] Persisting action (191 bytes) to leveldb took 338210ns I0316 14:28:51.354507 1273 replica.cpp:712] Persisted action at 1 I0316 14:28:51.354529 1273 replica.cpp:697] Replica learned APPEND action at position 1 I0316 14:28:51.355444 1275 registrar.cpp:484] Successfully updated the 'registry' in 4.084224ms I0316 14:28:51.355569 1275 registrar.cpp:370] Successfully recovered registrar I0316 14:28:51.355697 1268 log.cpp:702] Attempting to truncate the log to 1 I0316 14:28:51.355870 1269 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 I0316 14:28:51.356016 1274 master.cpp:1616] Recovered 0 slaves from the Registry (131B) ; allowing 10mins for slaves to re-register I0316 14:28:51.356032 1272 hierarchical.cpp:171] Skipping recovery of hierarchical allocator: nothing to recover I0316 
14:28:51.356761 1273 replica.cpp:537] Replica received write request for position 2 from (491)@172.17.0.3:45919 I0316 14:28:51.357203 1273 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 406053ns I0316 14:28:51.357226 1273 replica.cpp:712] Persisted action at 2 I0316 14:28:51.357718 1270 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 I0316 14:28:51.358093 1270 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 345370ns I0316 14:28:51.358175 1270 leveldb.cpp:399] Deleting ~1 keys from leveldb took 57us I0316 14:28:51.358201 1270 replica.cpp:712] Persisted action at 2 I0316 14:28:51.358220 1270 replica.cpp:697] Replica learned TRUNCATE action at position 2 I0316 14:28:51.368399 1242 containerizer.cpp:149] Using isolation: posix/cpu,posix/mem,filesystem/posix W0316 14:28:51.406371 1242 backend.cpp:66] Failed to create 'bind' backend: BindBackend requires root privileges I0316 14:28:51.410480 1266 slave.cpp:193] Slave started on 12)@172.17.0.3:45919 I0316 14:28:51.410518 1266 slave.cpp:194] Flags at startup: --appc_simple_discovery_uri_prefix=""http://"" --appc_store_dir=""/tmp/mesos/store/appc"" --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --container_logger=""org_apache_mesos_LogrotateContainerLogger"" --containerizers=""mesos"" --credential=""/tmp/ContainerLoggerTest_LOGROTATE_RotateInSandbox_JHP0gy/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_kill_orphans=""true"" --docker_registry=""https://registry-1.docker.io"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" 
--executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/ContainerLoggerTest_LOGROTATE_RotateInSandbox_JHP0gy/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname_lookup=""true"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" --launcher_dir=""/mesos/mesos-0.29.0/_build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resources=""cpus:2;mem:1024;disk:1024;ports:[31000-32000]"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_enable_support=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" --work_dir=""/tmp/ContainerLoggerTest_LOGROTATE_RotateInSandbox_JHP0gy"" I0316 14:28:51.411118 1266 credentials.hpp:83] Loading credential for authentication from '/tmp/ContainerLoggerTest_LOGROTATE_RotateInSandbox_JHP0gy/credential' I0316 14:28:51.411381 1266 slave.cpp:324] Slave using credential for: test-principal I0316 14:28:51.411696 1266 resources.cpp:572] Parsing resources as JSON failed: cpus:2;mem:1024;disk:1024;ports:[31000-32000] Trying semicolon-delimited string format instead I0316 14:28:51.412075 1266 slave.cpp:464] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0316 14:28:51.412148 1266 slave.cpp:472] Slave attributes: [ ] I0316 14:28:51.412160 1266 slave.cpp:477] Slave hostname: 2cbb23302fe5 I0316 14:28:51.413516 1263 state.cpp:58] Recovering state from '/tmp/ContainerLoggerTest_LOGROTATE_RotateInSandbox_JHP0gy/meta' I0316 14:28:51.413774 1266 status_update_manager.cpp:200] Recovering status update 
manager I0316 14:28:51.414029 1276 containerizer.cpp:407] Recovering containerizer I0316 14:28:51.415222 1269 provisioner.cpp:245] Provisioner recovery complete I0316 14:28:51.415650 1268 slave.cpp:4565] Finished recovery I0316 14:28:51.416115 1268 slave.cpp:4737] Querying resource estimator for oversubscribable resources I0316 14:28:51.416365 1268 slave.cpp:796] New master detected at master@172.17.0.3:45919 I0316 14:28:51.416448 1276 status_update_manager.cpp:174] Pausing sending status updates I0316 14:28:51.416445 1268 slave.cpp:859] Authenticating with master master@172.17.0.3:45919 I0316 14:28:51.416522 1268 slave.cpp:864] Using default CRAM-MD5 authenticatee I0316 14:28:51.416671 1268 slave.cpp:832] Detecting new master I0316 14:28:51.416731 1275 authenticatee.cpp:121] Creating new client SASL connection I0316 14:28:51.416807 1268 slave.cpp:4751] Received oversubscribable resources from the resource estimator I0316 14:28:51.417006 1263 master.cpp:5659] Authenticating slave(12)@172.17.0.3:45919 I0316 14:28:51.417103 1262 authenticator.cpp:413] Starting authentication session for crammd5_authenticatee(38)@172.17.0.3:45919 I0316 14:28:51.417348 1273 authenticator.cpp:98] Creating new server SASL connection I0316 14:28:51.417548 1266 authenticatee.cpp:212] Received SASL authentication mechanisms: CRAM-MD5 I0316 14:28:51.417582 1266 authenticatee.cpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I0316 14:28:51.417696 1264 authenticator.cpp:203] Received SASL authentication start I0316 14:28:51.417753 1264 authenticator.cpp:325] Authentication requires more steps I0316 14:28:51.417948 1265 authenticatee.cpp:258] Received SASL authentication step I0316 14:28:51.418107 1267 authenticator.cpp:231] Received SASL authentication step I0316 14:28:51.418159 1267 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '2cbb23302fe5' server FQDN: '2cbb23302fe5' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
SASL_AUXPROP_AUTHZID: false I0316 14:28:51.418180 1267 auxprop.cpp:179] Looking up auxiliary property '*userPassword' I0316 14:28:51.418233 1267 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0316 14:28:51.418270 1267 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '2cbb23302fe5' server FQDN: '2cbb23302fe5' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0316 14:28:51.418289 1267 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0316 14:28:51.418300 1267 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0316 14:28:51.418323 1267 authenticator.cpp:317] Authentication success I0316 14:28:51.418414 1264 authenticatee.cpp:298] Authentication success I0316 14:28:51.418473 1269 master.cpp:5689] Successfully authenticated principal 'test-principal' at slave(12)@172.17.0.3:45919 I0316 14:28:51.418514 1275 authenticator.cpp:431] Authentication session cleanup for crammd5_authenticatee(38)@172.17.0.3:45919 I0316 14:28:51.418781 1276 slave.cpp:927] Successfully authenticated with master master@172.17.0.3:45919 I0316 14:28:51.418937 1276 slave.cpp:1321] Will retry registration in 1.983001ms if necessary I0316 14:28:51.419108 1262 master.cpp:4370] Registering slave at slave(12)@172.17.0.3:45919 (2cbb23302fe5) with id c7653f60-33e9-4406-9f62-dc74c906bf83-S0 I0316 14:28:51.419643 1266 registrar.cpp:439] Applied 1 operations in 75642ns; attempting to update the 'registry' I0316 14:28:51.420670 1272 log.cpp:683] Attempting to append 339 bytes to the log I0316 14:28:51.420820 1269 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 3 I0316 14:28:51.421495 1270 slave.cpp:1321] Will retry registration in 1.437257ms if necessary I0316 14:28:51.421716 1275 master.cpp:4358] Ignoring register slave message from slave(12)@172.17.0.3:45919 (2cbb23302fe5) as 
admission is already in progress I0316 14:28:51.422107 1267 replica.cpp:537] Replica received write request for position 3 from (505)@172.17.0.3:45919 I0316 14:28:51.423033 1267 leveldb.cpp:341] Persisting action (358 bytes) to leveldb took 762815ns I0316 14:28:51.423066 1267 replica.cpp:712] Persisted action at 3 I0316 14:28:51.424069 1267 replica.cpp:691] Replica received learned notice for position 3 from @0.0.0.0:0 I0316 14:28:51.424232 1264 slave.cpp:1321] Will retry registration in 66.01292ms if necessary I0316 14:28:51.424342 1269 master.cpp:4358] Ignoring register slave message from slave(12)@172.17.0.3:45919 (2cbb23302fe5) as admission is already in progress I0316 14:28:51.424686 1267 leveldb.cpp:341] Persisting action (360 bytes) to leveldb took 574743ns I0316 14:28:51.424757 1267 replica.cpp:712] Persisted action at 3 I0316 14:28:51.424792 1267 replica.cpp:697] Replica learned APPEND action at position 3 I0316 14:28:51.426441 1272 registrar.cpp:484] Successfully updated the 'registry' in 6.721024ms I0316 14:28:51.426677 1262 log.cpp:702] Attempting to truncate the log to 3 I0316 14:28:51.426808 1264 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 4 I0316 14:28:51.427584 1261 slave.cpp:3482] Received ping from slave-observer(11)@172.17.0.3:45919 I0316 14:28:51.428213 1262 hierarchical.cpp:473] Added slave c7653f60-33e9-4406-9f62-dc74c906bf83-S0 (2cbb23302fe5) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (allocated: ) I0316 14:28:51.427865 1266 master.cpp:4438] Registered slave c7653f60-33e9-4406-9f62-dc74c906bf83-S0 at slave(12)@172.17.0.3:45919 (2cbb23302fe5) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0316 14:28:51.428270 1267 slave.cpp:971] Registered with master master@172.17.0.3:45919; given slave ID c7653f60-33e9-4406-9f62-dc74c906bf83-S0 I0316 14:28:51.428412 1265 replica.cpp:537] Replica received write request for position 4 from (506)@172.17.0.3:45919 I0316 
14:28:51.428443 1267 fetcher.cpp:81] Clearing fetcher cache I0316 14:28:51.428503 1262 hierarchical.cpp:1453] No resources available to allocate! I0316 14:28:51.428535 1262 hierarchical.cpp:1150] Performed allocation for slave c7653f60-33e9-4406-9f62-dc74c906bf83-S0 in 205421ns I0316 14:28:51.428750 1273 status_update_manager.cpp:181] Resuming sending status updates I0316 14:28:51.429157 1265 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 695258ns I0316 14:28:51.429225 1267 slave.cpp:994] Checkpointing SlaveInfo to '/tmp/ContainerLoggerTest_LOGROTATE_RotateInSandbox_JHP0gy/meta/slaves/c7653f60-33e9-4406-9f62-dc74c906bf83-S0/slave.info' I0316 14:28:51.429275 1265 replica.cpp:712] Persisted action at 4 I0316 14:28:51.429759 1267 slave.cpp:1030] Forwarding total oversubscribed resources I0316 14:28:51.430055 1265 master.cpp:4782] Received update of slave c7653f60-33e9-4406-9f62-dc74c906bf83-S0 at slave(12)@172.17.0.3:45919 (2cbb23302fe5) with total oversubscribed resources I0316 14:28:51.430614 1271 replica.cpp:691] Replica received learned notice for position 4 from @0.0.0.0:0 I0316 14:28:51.430891 1242 sched.cpp:222] Version: 0.29.0 I0316 14:28:51.431043 1265 hierarchical.cpp:531] Slave c7653f60-33e9-4406-9f62-dc74c906bf83-S0 (2cbb23302fe5) updated with oversubscribed resources (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: ) I0316 14:28:51.431236 1271 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 536892ns I0316 14:28:51.431267 1265 hierarchical.cpp:1453] No resources available to allocate! 
I0316 14:28:51.431584 1271 leveldb.cpp:399] Deleting ~2 keys from leveldb took 66904ns I0316 14:28:51.431538 1273 sched.cpp:326] New master detected at master@172.17.0.3:45919 I0316 14:28:51.431622 1271 replica.cpp:712] Persisted action at 4 I0316 14:28:51.431623 1265 hierarchical.cpp:1150] Performed allocation for slave c7653f60-33e9-4406-9f62-dc74c906bf83-S0 in 518588ns I0316 14:28:51.431660 1271 replica.cpp:697] Replica learned TRUNCATE action at position 4 I0316 14:28:51.431711 1273 sched.cpp:382]...",1 MESOS-4962,"Support for Mesos releases","As part of Mesos reaching 1.0, we need to formalize the policy of supporting Mesos releases. Some specific questions we need to answer: --> What fixes should we backport to older releases? --> How many old releases are supported? --> Should we have an LTS version? --> What is the cadence of major, minor and patch releases?",8 MESOS-4970,"Add more examples of JSON resources to docs","The configuration documentation currently only shows examples of scalar resource types in JSON format. The structures of JSON resources are a bit complicated, so it would be very helpful to include examples of ranges, sets, and text resource types as well.",1 MESOS-4978,"Update mesos-execute with Appc changes.","The mesos-execute CLI application does not currently support Appc images. Adding support would make integration tests easier.",3 MESOS-4982,"Update example long running to use v1 API.","We need to modify the long running test framework similar to {{src/examples/long_lived_framework.cpp}} to use the v1 API.
This would allow us to vet the v1 API and the scheduler library in test clusters.",5 MESOS-4984,"MasterTest.SlavesEndpointTwoSlaves is flaky","Observed on Arch Linux with GCC 6, running in a VirtualBox VM: [ RUN ] MasterTest.SlavesEndpointTwoSlaves /mesos-2/src/tests/master_tests.cpp:1710: Failure Value of: array.get().values.size() Actual: 1 Expected: 2u Which is: 2 [ FAILED ] MasterTest.SlavesEndpointTwoSlaves (86 ms) Seems to fail non-deterministically, perhaps more often when there is concurrent CPU load on the machine.",2 MESOS-4985,"Destroying a container while it's provisioning can lead to leaked provisioned directories.","Here is the possible sequence of events: 1) containerizer->launch 2) provisioner->provision is called. It is fetching the image 3) executor registration timed out 4) containerizer->destroy is called 5) container->state is still in PREPARING 6) provisioner->destroy is called So we can be calling provisioner->destroy while provisioner->provision hasn't finished yet. provisioner->destroy might do nothing since it has no information about the container yet, and later the provisioner will prepare the root filesystem. This root filesystem will never be destroyed because destroy has already finished.",3 MESOS-4992,"Sandbox URI does not work outside Mesos HTTP server","The sandbox URI of a framework does not work if I just copy and paste it into the browser. For example, the following sandbox URI: http://172.17.0.1:5050/#/slaves/50f87c73-79ef-4f2a-95f0-b2b4062b2de6-S0/frameworks/50f87c73-79ef-4f2a-95f0-b2b4062b2de6-0009/executors/driver-20160321155016-0001/browse should redirect to: http://172.17.0.1:5050/#/slaves/50f87c73-79ef-4f2a-95f0-b2b4062b2de6-S0/browse?path=%2Ftmp%2Fmesos%2Fslaves%2F50f87c73-79ef-4f2a-95f0-b2b4062b2de6-S0%2Fframeworks%2F50f87c73-79ef-4f2a-95f0-b2b4062b2de6-0009%2Fexecutors%2Fdriver-20160321155016-0001%2Fruns%2F60533483-31fb-4353-987d-f3393911cc80 yet it fails with the message: ""Failed to find slaves.
Navigate to the slave's sandbox via the Mesos UI."" and redirects to: http://172.17.0.1:5050/#/ It is an issue for me because I'm working on extending the Mesos Spark UI with sandbox URIs. The other option is to get the slave info, parse the JSON there, and extract the executor paths, which is neither straightforward nor elegant. Moreover, I don't see the runs/container_id in the Mesos proto API. I guess this is hidden info, but it is the piece of information needed to rewrite the URI without redirection. ",3 MESOS-4998,"Problematic fork/clone performance at high load.","Creating a new subprocess in Mesos involves forking/cloning a new process. In most cases (executors, perf, ..) the parent of the new process is the agent/slave process. This can lead to problematic behavior, especially when creating several new processes at the same time. The problem here is that the normal fork() (or the clone syscall used by libprocess) provides a copy-on-write (COW) view of the parent's address space until the child execs its new binary. Note that during the time between fork and exec, Mesos performs several setup actions such as placing the new processes in systemd units or assigning them to the freezer cgroup. This COW property of the address space implies that existing memory is marked as read-only and any write will trigger a page fault and the allocation of a new page. Note this behavior also extends to the parent process, and hence any write will be very costly. We simulated the number of page faults when forking/cloning new processes with this benchmark: https://github.com/joerg84/forking-benchmark Results can be seen here: https://docs.google.com/presentation/d/1SUjKAVHdrutLPpFJy3Q1yhinG5FOMw3HbbEdzuhZ7A8",8 MESOS-5004,"Clarify docs on '/reserve' and '/create-volumes' without authentication","For both reservations and persistent volume creation, the behavior of the HTTP endpoints differs slightly from that of the framework operations.
Due to the implementation of HTTP authentication, it is not possible for a framework/operator to provide a principal when HTTP authentication is disabled. This means that when HTTP authentication is disabled, the endpoint handlers will _always_ receive {{None()}} as the principal associated with the request, and thus if authorization is enabled, the request will only succeed if the NONE principal is authorized to perform the requested operation. The docs should be updated to explain this behavior explicitly.",1 MESOS-5005,"Enforce that DiskInfo principal is equal to framework/operator principal","Currently, we require that {{ReservationInfo.principal}} be equal to the principal provided for authentication, which means that when HTTP authentication is disabled this field cannot be set. Based on comments in 'mesos.proto', the original intention was to enforce this same constraint for {{Persistence.principal}}, but it seems that we don't enforce it. This should be changed to make the two fields equivalent, with one exception: when the framework/operator principal is {{None}}, we should allow the principal in {{DiskInfo}} to take any value, along the same lines as MESOS-5212.",3 MESOS-5006,"Add example for mesos-execute usage of Appc images in container-image.md.","Example usage for Appc flags and images needs to be added to container-image.md.",3 MESOS-5010,"Installation of mesos python package is incomplete","The installation of the mesos Python package is incomplete, i.e., the files {{cli.py}}, {{futures.py}}, and {{http.py}} are not installed. {code} % ../configure --enable-python % make install DESTDIR=$PWD/D % PYTHONPATH=$PWD/D/usr/local/lib/python2.7/site-packages:$PYTHONPATH python -c 'from mesos import http' Traceback (most recent call last): File """", line 1, in ImportError: cannot import name http {code} This appears to have first been broken by {{d1d70b9}} (MESOS-3969, [Upgraded bundled pip to 7.1.2.|https://reviews.apache.org/r/40630]).
Bisecting in {{pip}}-land shows that our install becomes broken for {{pip-6.0.1}} and later (we are using {{pip-7.1.2}}). ",2 MESOS-5013,"Add docker volume driver isolator for Mesos containerizer.","The isolator will interact with Docker Volume Driver Plugins to mount and unmount external volumes to container. ",8 MESOS-5014,"Call and Event Type enums in scheduler.proto should be optional","Having a 'required' Type enum has backwards compatibility issues when adding new enum types. See MESOS-4997 for details.",2 MESOS-5015,"Call and Event Type enums in executor.proto should be optional","Having a 'required' Type enum has backwards compatibility issues when adding new enum types. See MESOS-4997 for details.",2 MESOS-5016,"Add a reconnect() method to the C++ scheduler library","A reconnect() method on the library would allow the scheduler to force a reconnection (disconnect and reconnect) by the library. This might be used by the scheduler to react to lack of HEARTBEATs.",3 MESOS-5023,"MesosContainerizerProvisionerTest.DestroyWhileProvisioning is flaky.","Observed on the Apache Jenkins. 
{noformat} [ RUN ] MesosContainerizerProvisionerTest.ProvisionFailed I0324 13:38:56.284261 2948 containerizer.cpp:666] Starting container 'test_container' for executor 'executor' of framework '' I0324 13:38:56.285825 2939 containerizer.cpp:1421] Destroying container 'test_container' I0324 13:38:56.285854 2939 containerizer.cpp:1424] Waiting for the provisioner to complete for container 'test_container' [ OK ] MesosContainerizerProvisionerTest.ProvisionFailed (7 ms) [ RUN ] MesosContainerizerProvisionerTest.DestroyWhileProvisioning I0324 13:38:56.291187 2944 containerizer.cpp:666] Starting container 'c2316963-c6cb-4c7f-a3b9-17ca5931e5b2' for executor 'executor' of framework '' I0324 13:38:56.292157 2944 containerizer.cpp:1421] Destroying container 'c2316963-c6cb-4c7f-a3b9-17ca5931e5b2' I0324 13:38:56.292179 2944 containerizer.cpp:1424] Waiting for the provisioner to complete for container 'c2316963-c6cb-4c7f-a3b9-17ca5931e5b2' F0324 13:38:56.292899 2944 containerizer.cpp:752] Check failed: containers_.contains(containerId) *** Check failure stack trace: *** @ 0x2ac9973d0ae4 google::LogMessage::Fail() @ 0x2ac9973d0a30 google::LogMessage::SendToLog() @ 0x2ac9973d0432 google::LogMessage::Flush() @ 0x2ac9973d3346 google::LogMessageFatal::~LogMessageFatal() @ 0x2ac996af897c mesos::internal::slave::MesosContainerizerProcess::_launch() @ 0x2ac996b1f18a _ZZN7process8dispatchIbN5mesos8internal5slave25MesosContainerizerProcessERKNS1_11ContainerIDERK6OptionINS1_8TaskInfoEERKNS1_12ExecutorInfoERKSsRKS8_ISsERKNS1_7SlaveIDERKNS_3PIDINS3_5SlaveEEEbRKS8_INS3_13ProvisionInfoEES5_SA_SD_SsSI_SL_SQ_bSU_EENS_6FutureIT_EERKNSO_IT0_EEMS10_FSZ_T1_T2_T3_T4_T5_T6_T7_T8_T9_ET10_T11_T12_T13_T14_T15_T16_T17_T18_ENKUlPNS_11ProcessBaseEE_clES1P_ @ 0x2ac996b479d9 
_ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIbN5mesos8internal5slave25MesosContainerizerProcessERKNS5_11ContainerIDERK6OptionINS5_8TaskInfoEERKNS5_12ExecutorInfoERKSsRKSC_ISsERKNS5_7SlaveIDERKNS0_3PIDINS7_5SlaveEEEbRKSC_INS7_13ProvisionInfoEES9_SE_SH_SsSM_SP_SU_bSY_EENS0_6FutureIT_EERKNSS_IT0_EEMS14_FS13_T1_T2_T3_T4_T5_T6_T7_T8_T9_ET10_T11_T12_T13_T14_T15_T16_T17_T18_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ @ 0x2ac997334fef std::function<>::operator()() @ 0x2ac99731b1c7 process::ProcessBase::visit() @ 0x2ac997321154 process::DispatchEvent::visit() @ 0x9a699c process::ProcessBase::serve() @ 0x2ac9973173c0 process::ProcessManager::resume() @ 0x2ac99731445a _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_ @ 0x2ac997320916 _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE @ 0x2ac9973208c6 _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_ @ 0x2ac997320858 _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE @ 0x2ac9973207af _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv @ 0x2ac997320748 _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv @ 0x2ac9989aea60 (unknown) @ 0x2ac999125182 start_thread @ 0x2ac99943547d (unknown) make[4]: Leaving directory `/mesos/mesos-0.29.0/_build/src' make[4]: *** [check-local] Aborted make[3]: *** [check-am] Error 2 make[3]: Leaving directory `/mesos/mesos-0.29.0/_build/src' make[2]: *** [check] Error 2 make[2]: Leaving directory `/mesos/mesos-0.29.0/_build/src' make[1]: *** [check-recursive] Error 1 make[1]: 
Leaving directory `/mesos/mesos-0.29.0/_build' make: *** [distcheck] Error 1 Build step 'Execute shell' marked build as failure {noformat}",2 MESOS-5027,"Enable authenticated login in the webui","The webui hits a number of endpoints to get the data that it displays: {{/state}}, {{/metrics/snapshot}}, {{/files/browse}}, {{/files/read}}, and maybe others? Once authentication is enabled on these endpoints, we need to add a login prompt to the webui so that users can provide credentials.",2 MESOS-5028,"Copy provisioner cannot replace directory with symlink","I'm trying to play with the new image provisioner on our custom docker images, but one of layer failed to get copied, possibly due to a dangling symlink. Error log with Glog_v=1: {quote} I0324 05:42:48.926678 15067 copy.cpp:127] Copying layer path '/tmp/mesos/store/docker/layers/5df0888641196b88dcc1b97d04c74839f02a73b8a194a79e134426d6a8fcb0f1/rootfs' to rootfs '/var/lib/mesos/provisioner/containers/5f05be6c-c970-4539-aa64-fd0eef2ec7ae/backends/copy/rootfses/507173f3-e316-48a3-a96e-5fdea9ffe9f6' E0324 05:42:49.028506 15062 slave.cpp:3773] Container '5f05be6c-c970-4539-aa64-fd0eef2ec7ae' for executor 'test' of framework 75932a89-1514-4011-bafe-beb6a208bb2d-0004 failed to start: Collect failed: Collect failed: Failed to copy layer: cp: cannot overwrite directory ‘/var/lib/mesos/provisioner/containers/5f05be6c-c970-4539-aa64-fd0eef2ec7ae/backends/copy/rootfses/507173f3-e316-48a3-a96e-5fdea9ffe9f6/etc/apt’ with non-directory {quote} Content of _/tmp/mesos/store/docker/layers/5df0888641196b88dcc1b97d04c74839f02a73b8a194a79e134426d6a8fcb0f1/rootfs/etc/apt_ points to a non-existing absolute path (cannot provide exact path but it's a result of us trying to mount apt keys into docker container at build time). 
I believe what happened is that we executed a script at build time, which contains the equivalent of: {quote} rm -rf /etc/apt/* && ln -sf /build-mount-point/ /etc/apt {quote} ",3 MESOS-5031,"Authorization Action enum does not support upgrades.","We need to make the Action enum optional in authorization::Request, and add an `UNKNOWN = 0;` enum value. See MESOS-4997 for details.",2 MESOS-5032,"Remove plain text Credential format (after deprecation cycle)","Currently two formats of credentials are supported: JSON {code} ""credentials"": [ { ""principal"": ""sherman"", ""secret"": ""kitesurf"" } ] {code} And a deprecated newline-delimited file: {code} principal1 secret1 principal2 secret2 {code} We deprecated the newline format in 0.29, and should remove it after the deprecation cycle ends.",3 MESOS-5034,"Design doc for ordered message delivery in libprocess",NULL,3 MESOS-5044,"Temporary directories created by environment->mkdtemp cleanup can be problematic.","Currently in the Mesos tests, the temporary directories created by `environment->mkdtemp()` are not cleaned up until the end of the test suite, which can be problematic. For instance, if a test suite contains many tests, each of which performs large disk reads/writes in its temp dir, this may lead to out-of-disk issues on resource-limited machines. We should have the temp dirs created by `environment->mkdtemp` cleaned up during each test teardown. Currently we only clean up the sandbox for each test.",1 MESOS-5049,"Refactor subprocess setup functions.","Executing arbitrary setup functions while creating new processes is dangerous as all functions called have to be async-signal-safe. As setup functions are used for only very few purposes (setsid, chdir, monitoring and killing a process (see upcoming review)), it makes sense to support them safely via parameters to subprocess. Another common use of child setup functions is to block the child while doing some work in the parent.
This pattern can be more cleanly expressed with parentHooks. ",3 MESOS-5050,"Design Linux capability support for Mesos containerizer","We should at least support the following cases: 1) A root user has reduced capability 2) A non-root user has the capability of CAP_NET_ADMIN (to do e.g., tcpdump)",5 MESOS-5051,"Create helpers for manipulating Linux capabilities.","These helpers can either be based on some existing library (e.g. libcap), or use system calls directly.",5 MESOS-5054,"Namespace the stout flags","A recent name collision occurred when updating the 3rdparty http-parser library: https://github.com/apache/mesos/commit/94df63f72146501872a06c6487e94bdfd0f23025 We should put stout's {{flags}} namespace within another suitable namespace (perhaps {{stout::flags}}) to avoid such collisions.",2 MESOS-5055,"Slave/Agent Rename Phase I - Update strings in the log message and standard output","This is a sub ticket of MESOS-3780. In this ticket, we will rename all the slave to agent in the log messages and standard output.",2 MESOS-5056,"Replace Master/Slave Terminology Phase I - Update strings in the shell scripts outputs","This is a sub ticket of MESOS-3780. In this ticket, we will rename slave to agent in the shell script outputs",1 MESOS-5057,"Slave/Agent Rename Phase I - Update strings in error messages and other strings","This is a sub ticket of MESOS-3780. In this ticket, we will update all the slave to agent in the error messages and other strings in the code",3 MESOS-5062,"Update the long-lived-framework example to run on test clusters","There are a couple of problems with the long-lived framework that prevent it from being deployed (easily) on an actual cluster: * The framework will greedily accept all offers; it runs one executor per agent in the cluster. * The framework assumes the {{long-lived-executor}} binary is available on each agent. This is generally only true in the build environment or in single-agent test environments. 
* The framework does not specify any resources with the executor. This is required by many isolators. * The framework has no metrics.",3 MESOS-5064,"Remove default value for the agent `work_dir`","Following a crash report from the user we need to be more explicit about the dangers of using {{/tmp}} as agent {{work_dir}}. In addition, we can remove the default value for the {{\-\-work_dir}} flag, forcing users to explicitly set the work directory for the agent.",2 MESOS-5065,"Support docker private registry default docker config.","For docker private registry with authentication, docker containerizer should support using a default .docker/config.json file (or the old .dockercfg file) locally, which is pre-handled by operators. The default docker config file should be exposed by a new agent flag `--docker_config`. ",3 MESOS-5069,"Upgrade http-parser to v2.6.2",NULL,3 MESOS-5070,"Introduce more flexible subprocess interface for child options.","We introduced a number of parameters to the subprocess interface with MESOS-5049. Adding all options explicitly to the subprocess interface makes it inflexible. We should investigate a more flexible option that still prevents arbitrary code from being executed.",2 MESOS-5071,"Refactor the clone option to subprocess.","The clone option in subprocess is only used (at least in the Mesos codebase) to specify custom namespace flags to clone. It feels like having the clone function in the subprocess interface is too explicit for this functionality. 
",2 MESOS-5078,"Document TaskStatus reasons","We should document the possible {{reason}} values that can be found in the {{TaskStatus}} message.",1 MESOS-5082,"Fix a bug in the Nvidia GPU device isolator that exposes a discrepancy between clang and gcc in 'using' declarations","There appears to be a discrepancy between clang and gcc, which allows clang to accept `using` declarations of the form `using ns_name::name;` that contain nested classes, structs, and enums after the `name` field in the declaration (e.g. `using ns_name::name::enum;`). The language for describing this functionality is ambiguous in the C++11 specification as referenced here: http://en.cppreference.com/w/cpp/language/namespace#Using-declarations",1 MESOS-5101,"Add CMake build to docker_build.sh","Add the CMake build system to docker_build.sh to automatically test the build on Jenkins alongside gcc and clang.",2 MESOS-5108,"Design a short-term solution for a typed error handling mechanism.",NULL,2 MESOS-5109,"Capture the error code in `ErrnoError` and `WindowsError`.","The {{ErrnoError}} and {{WindowsError}} classes simply construct the error string via a mechanism such as {{strerror}}. They should also capture the error code, as it is an essential piece of information for such an error type.",2 MESOS-5110,"Introduce an additional template parameter to `Try` for typed error.","Add an additional template parameter {{E}} to the {{Try}} class template. {code} template class Try { /* ... */ }; {code}",3 MESOS-5111,"Update `network::connect` to use the typed error state of `Try`.","{{network::connect}} function returns a {{Try}} currently and the caller is required to inspect the state of {{errno}} out-of-band. {{network::connect}} should really return something like a {{Try}}.",2 MESOS-5112,"Introduce `WindowsSocketError`.","{{WindowsError}} invokes {{::GetLastError}} to retrieve the error code. 
Windows has a {{::WSAGetLastError}} function which at the interface level, is intended for failed socket operations. We should introduce a {{WindowsSocketError}} which invokes {{::WSAGetLastError}} and use them accordingly.",2 MESOS-5113,"`network/cni` isolator crashes when launched without the --network_cni_plugins_dir flag","If we start the agent with the --isolation='network/cni' but do not specify the --network_cni_plugins_dir flag, the agent crashes with the following stack dump: 0x00007ffff2324cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt #0 0x00007ffff2324cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x00007ffff23280d8 in __GI_abort () at abort.c:89 #2 0x00007ffff231db86 in __assert_fail_base (fmt=0x7ffff246e830 ""%s%s%s:%u: %s%sAssertion `%s' failed.\n%n"", assertion=assertion@entry=0x451f5c ""isSome()"", file=file@entry=0x451f65 ""../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp"", line=line@entry=111, function=function@entry=0x45294a ""const T &Option >::get() const & [T = std::basic_string]"") at assert.c:92 #3 0x00007ffff231dc32 in __GI___assert_fail (assertion=0x451f5c ""isSome()"", file=0x451f65 ""../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp"", line=111, function=0x45294a ""const T &Option >::get() const & [T = std::basic_string]"") at assert.c:101 #4 0x0000000000432c0d in Option::get() const & (this=0x6c1ea8) at ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:111 Python Exception list index out of range: #5 0x00007ffff63ef7cc in mesos::internal::slave::NetworkCniIsolatorProcess::recover (this=0x6c1e70, states=empty std::list, orphans=...) 
at ../../src/slave/containerizer/mesos/isolators/network/cni/cni.cpp:331 #6 0x00007ffff60cddd8 in operator() (this=0x7fffc0001e00, process=0x6c1ef8) at ../../3rdparty/libprocess/include/process/dispatch.hpp:239 #7 0x00007ffff60cd972 in std::_Function_handler process::dispatch > const&, hashset, std::equal_to > const&, std::list >, hashset, std::equal_to > >(process::PID const&, process::Future (mesos::internal::slave::MesosIsolatorProcess::*)(std::list > const&, hashset, std::equal_to > const&), std::list >, hashset, std::equal_to >)::{lambda(process::ProcessBase*)#1}>::_M_invoke(std::_Any_data const&, process::ProcessBase*) (__functor=..., __args=0x6c1ef8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/functional:2071 #8 0x00007ffff6a6bf38 in std::function::operator()(process::ProcessBase*) const (this=0x7fffc0001d70, __args=0x6c1ef8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/functional:2471 #9 0x00007ffff6a561b4 in process::ProcessBase::visit (this=0x6c1ef8, event=...) at ../../../3rdparty/libprocess/src/process.cpp:3130 #10 0x00007ffff6aac5fe in process::DispatchEvent::visit (this=0x7fffc0001570, visitor=0x6c1ef8) at ../../../3rdparty/libprocess/include/process/event.hpp:161 #11 0x00007ffff55e9c91 in process::ProcessBase::serve (this=0x6c1ef8, event=...) at ../../3rdparty/libprocess/include/process/process.hpp:82 #12 0x00007ffff6a53ed4 in process::ProcessManager::resume (this=0x67cca0, process=0x6c1ef8) at ../../../3rdparty/libprocess/src/process.cpp:2570 #13 0x00007ffff6a5bff5 in operator() (this=0x697d70, joining=...) 
at ../../../3rdparty/libprocess/src/process.cpp:2218 #14 0x00007ffff6a5bf33 in std::_Bind)>::__call(std::tuple<>&&, std::_Index_tuple<0ul>) (this=0x697d70, __args=) at /usr/bin/../lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/functional:1295 #15 0x00007ffff6a5bee6 in std::_Bind)>::operator()<, void>() (this=0x697d70) at /usr/bin/../lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/functional:1353 #16 0x00007ffff6a5be95 in std::_Bind_simple)> ()>::_M_invoke<>(std::_Index_tuple<>) (this=0x697d70) at /usr/bin/../lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/functional:1731 #17 0x00007ffff6a5be65 in std::_Bind_simple)> ()>::operator()() (this=0x697d70) at /usr/bin/../lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/functional:1720 #18 0x00007ffff6a5be3c in std::thread::_Impl)> ()> >::_M_run() (this=0x697d58) at /usr/bin/../lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/thread:115 #19 0x00007ffff2b98a60 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #20 0x00007ffff26bb182 in start_thread (arg=0x7fffeb92d700) at pthread_create.c:312 #21 0x00007ffff23e847d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 (gdb) frame 4 #4 0x0000000000432c0d in Option::get() const & (this=0x6c1ea8) at ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:111",1 MESOS-5114,"Flags::parse does not handle empty string correctly.","A missing default for quorum size has generated the following master config {code} MESOS_WORK_DIR=""/var/lib/mesos/master"" MESOS_ZK=""zk://zk1:2181,zk2:2181,zk3:2181/mesos"" MESOS_QUORUM= MESOS_PORT=5050 MESOS_CLUSTER=""mesos"" MESOS_LOG_DIR=""/var/log/mesos"" MESOS_LOGBUFSECS=1 MESOS_LOGGING_LEVEL=""INFO"" {code} This was causing each elected leader to attempt replica recovery. E.g. 
{{group.cpp:700] Trying to get '/mesos/log_replicas/0000000012' in ZooKeeper}} And eventually: {{master.cpp:1458] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins}} Full log on one of the masters https://gist.github.com/clehene/09a9ddfe49b92a5deb4c1b421f63479e All masters and zk nodes were reachable over the network. Also once the quorum was configured the master recovery protocol finished gracefully. ",2 MESOS-5115,"Grant access to /dev/nvidiactl and /dev/nvidia-uvm in the Nvidia GPU isolator."," Calls to 'nvidia-smi' fail inside a container even if access to a GPU has been granted. Moreover, access to /dev/nvidiactl is actually required for a container to do anything useful with a GPU even if it has access to it. We should grant/revoke access to /dev/nvidiactl and /dev/nvidia-uvm as GPUs are added and removed from a container in the Nvidia GPU isolator.",2 MESOS-5121,"pivot_root is not available on PowerPC","When compile on ppc64le, it will through error message: src/linux/fs.cpp:443:2: error: #error ""pivot_root is not available"" The current code logic in src/linux/fs.cpp is: {code} #ifdef __NR_pivot_root int ret = ::syscall(__NR_pivot_root, newRoot.c_str(), putOld.c_str()); #elif __x86_64__ // A workaround for systems that have an old glib but have a new // kernel. The magic number '155' is the syscall number for // 'pivot_root' on the x86_64 architecture, see // arch/x86/syscalls/syscall_64.tbl int ret = ::syscall(155, newRoot.c_str(), putOld.c_str()); #else #error ""pivot_root is not available"" #endif {code} There is no old glib version and the new kernel version, it will never run code in *#ifdef __NR_pivot_root* condition, and when I build on Ubuntu 16.04(It has the latest linux kernel and glibc), it still can't step into the *#ifdef __NR_pivot_root* condition. For powerpc case, I added another condition: {code} #elif __powerpc__ || __ppc__ || __powerpc64__ || __ppc64__ // A workaround for powerpc. 
The magic number '203' is the syscall // number for 'pivot_root' on the powerpc architecture, see // https://w3challs.com/syscalls/?arch=powerpc_64 int ret = ::syscall(203, newRoot.c_str(), putOld.c_str()); {code}",1 MESOS-5124,"TASK_KILLING is not supported by mesos-execute.","The {{TASK_KILLING}} state (MESOS-4547) has recently been introduced to Mesos. We should add support for this feature to {{mesos-execute}}.",3 MESOS-5125,"Commit message hook iterates over words, rather than lines.","{{for LINE in $COMMIT_MESSAGE}} iterates over one word at a time, rather than one line at a time. We should use the following pattern instead: {code} while read LINE; do ... done <<< ""$COMMIT_MESSAGE"" {code}",2 MESOS-5126,"Commit message hook iterates over the commented lines.","Currently, the commit message hook iterates over the commented lines. For example, if there is a modified file whose path is longer than 72 characters, the commit hook errors out. We should skip over the commented lines.",2 MESOS-5127,"Reset `LIBPROCESS_IP` in `network\cni` isolator.","Currently the `LIBPROCESS_IP` environment variable is set to the Agent IP if the environment variable has not been defined by the `Framework`. For containers having their own IP address (as with containers on CNI networks) this becomes a problem since the command executor tries to bind to the `LIBPROCESS_IP` that does not exist in its network namespace, and fails. 
Thus, for containers launched on CNI networks the `LIBPROCESS_IP` should not be set, or rather is set to ""0.0.0.0"", allowing the container to bind to the IP address provided by the CNI network.",1 MESOS-5128,"PersistentVolumeTest.AccessPersistentVolume is flaky","Observed on ASF CI: {code} [ RUN ] DiskResource/PersistentVolumeTest.AccessPersistentVolume/0 I0405 17:29:19.134435 31837 cluster.cpp:139] Creating default 'local' authorizer I0405 17:29:19.251143 31837 leveldb.cpp:174] Opened db in 116.386403ms I0405 17:29:19.310050 31837 leveldb.cpp:181] Compacted db in 58.80688ms I0405 17:29:19.310180 31837 leveldb.cpp:196] Created db iterator in 37145ns I0405 17:29:19.310199 31837 leveldb.cpp:202] Seeked to beginning of db in 4212ns I0405 17:29:19.310210 31837 leveldb.cpp:271] Iterated through 0 keys in the db in 410ns I0405 17:29:19.310279 31837 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0405 17:29:19.311069 31861 recover.cpp:447] Starting replica recovery I0405 17:29:19.311362 31861 recover.cpp:473] Replica is in EMPTY status I0405 17:29:19.312641 31861 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (14359)@172.17.0.4:43972 I0405 17:29:19.313045 31860 recover.cpp:193] Received a recover response from a replica in EMPTY status I0405 17:29:19.313608 31860 recover.cpp:564] Updating replica status to STARTING I0405 17:29:19.316416 31867 master.cpp:376] Master 9565ff6f-f1b6-4259-8430-690e635c391f (4090d10eba90) started on 172.17.0.4:43972 I0405 17:29:19.316470 31867 master.cpp:378] Flags at startup: --acls="""" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate=""true"" --authenticate_http=""true"" --authenticate_slaves=""true"" --authenticators=""crammd5"" --authorizers=""local"" --credentials=""/tmp/0A9ELu/credentials"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" 
--initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_completed_frameworks=""50"" --max_completed_tasks_per_framework=""1000"" --max_slave_ping_timeouts=""5"" --quiet=""false"" --recovery_slave_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""100secs"" --registry_strict=""true"" --root_submissions=""true"" --slave_ping_timeout=""15secs"" --slave_reregister_timeout=""10mins"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/mesos/mesos-0.29.0/_inst/share/mesos/webui"" --work_dir=""/tmp/0A9ELu/master"" --zk_session_timeout=""10secs"" I0405 17:29:19.316938 31867 master.cpp:427] Master only allowing authenticated frameworks to register I0405 17:29:19.316951 31867 master.cpp:432] Master only allowing authenticated agents to register I0405 17:29:19.316961 31867 credentials.hpp:37] Loading credentials for authentication from '/tmp/0A9ELu/credentials' I0405 17:29:19.317402 31867 master.cpp:474] Using default 'crammd5' authenticator I0405 17:29:19.317643 31867 master.cpp:545] Using default 'basic' HTTP authenticator I0405 17:29:19.317854 31867 master.cpp:583] Authorization enabled I0405 17:29:19.318081 31864 whitelist_watcher.cpp:77] No whitelist given I0405 17:29:19.318079 31861 hierarchical.cpp:144] Initialized hierarchical allocator process I0405 17:29:19.320838 31864 master.cpp:1826] The newly elected leader is master@172.17.0.4:43972 with id 9565ff6f-f1b6-4259-8430-690e635c391f I0405 17:29:19.320888 31864 master.cpp:1839] Elected as the leading master! 
I0405 17:29:19.320909 31864 master.cpp:1526] Recovering from registrar I0405 17:29:19.321218 31871 registrar.cpp:331] Recovering registrar I0405 17:29:19.347045 31860 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 33.164133ms I0405 17:29:19.347126 31860 replica.cpp:320] Persisted replica status to STARTING I0405 17:29:19.347611 31869 recover.cpp:473] Replica is in STARTING status I0405 17:29:19.349215 31871 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (14361)@172.17.0.4:43972 I0405 17:29:19.349653 31870 recover.cpp:193] Received a recover response from a replica in STARTING status I0405 17:29:19.350236 31866 recover.cpp:564] Updating replica status to VOTING I0405 17:29:19.388882 31864 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 38.38299ms I0405 17:29:19.388993 31864 replica.cpp:320] Persisted replica status to VOTING I0405 17:29:19.389369 31856 recover.cpp:578] Successfully joined the Paxos group I0405 17:29:19.389735 31856 recover.cpp:462] Recover process terminated I0405 17:29:19.390476 31868 log.cpp:659] Attempting to start the writer I0405 17:29:19.392125 31862 replica.cpp:493] Replica received implicit promise request from (14362)@172.17.0.4:43972 with proposal 1 I0405 17:29:19.430706 31862 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 38.505062ms I0405 17:29:19.430816 31862 replica.cpp:342] Persisted promised to 1 I0405 17:29:19.431918 31856 coordinator.cpp:238] Coordinator attempting to fill missing positions I0405 17:29:19.433725 31861 replica.cpp:388] Replica received explicit promise request from (14363)@172.17.0.4:43972 for position 0 with proposal 2 I0405 17:29:19.472491 31861 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 38.659492ms I0405 17:29:19.472595 31861 replica.cpp:712] Persisted action at 0 I0405 17:29:19.474556 31864 replica.cpp:537] Replica received write request for position 0 from (14364)@172.17.0.4:43972 I0405 17:29:19.474652 
31864 leveldb.cpp:436] Reading position from leveldb took 49423ns I0405 17:29:19.528175 31864 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 53.443616ms I0405 17:29:19.528300 31864 replica.cpp:712] Persisted action at 0 I0405 17:29:19.529389 31865 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 I0405 17:29:19.571137 31865 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 41.676495ms I0405 17:29:19.571254 31865 replica.cpp:712] Persisted action at 0 I0405 17:29:19.571302 31865 replica.cpp:697] Replica learned NOP action at position 0 I0405 17:29:19.572322 31856 log.cpp:675] Writer started with ending position 0 I0405 17:29:19.574060 31861 leveldb.cpp:436] Reading position from leveldb took 83200ns I0405 17:29:19.575417 31864 registrar.cpp:364] Successfully fetched the registry (0B) in 0ns I0405 17:29:19.575565 31864 registrar.cpp:463] Applied 1 operations in 46419ns; attempting to update the 'registry' I0405 17:29:19.576517 31857 log.cpp:683] Attempting to append 170 bytes to the log I0405 17:29:19.576849 31857 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 I0405 17:29:19.578390 31857 replica.cpp:537] Replica received write request for position 1 from (14365)@172.17.0.4:43972 I0405 17:29:19.780277 31857 leveldb.cpp:341] Persisting action (189 bytes) to leveldb took 201.808617ms I0405 17:29:19.780366 31857 replica.cpp:712] Persisted action at 1 I0405 17:29:19.782024 31857 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 I0405 17:29:19.823770 31857 leveldb.cpp:341] Persisting action (191 bytes) to leveldb took 41.667662ms I0405 17:29:19.823851 31857 replica.cpp:712] Persisted action at 1 I0405 17:29:19.823889 31857 replica.cpp:697] Replica learned APPEND action at position 1 I0405 17:29:19.825701 31867 registrar.cpp:508] Successfully updated the 'registry' in 0ns I0405 17:29:19.825929 31867 registrar.cpp:394] Successfully recovered registrar 
I0405 17:29:19.826015 31857 log.cpp:702] Attempting to truncate the log to 1 I0405 17:29:19.826262 31867 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 I0405 17:29:19.827647 31867 replica.cpp:537] Replica received write request for position 2 from (14366)@172.17.0.4:43972 I0405 17:29:19.828018 31857 master.cpp:1634] Recovered 0 agents from the Registry (131B) ; allowing 10mins for agents to re-register I0405 17:29:19.828065 31861 hierarchical.cpp:171] Skipping recovery of hierarchical allocator: nothing to recover I0405 17:29:19.865555 31867 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 37.822178ms I0405 17:29:19.865661 31867 replica.cpp:712] Persisted action at 2 I0405 17:29:19.866921 31867 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 I0405 17:29:19.907341 31867 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 40.356649ms I0405 17:29:19.907531 31867 leveldb.cpp:399] Deleting ~1 keys from leveldb took 91109ns I0405 17:29:19.907560 31867 replica.cpp:712] Persisted action at 2 I0405 17:29:19.907599 31867 replica.cpp:697] Replica learned TRUNCATE action at position 2 I0405 17:29:19.923305 31837 resources.cpp:572] Parsing resources as JSON failed: cpus:2;mem:2048 Trying semicolon-delimited string format instead I0405 17:29:19.926491 31837 containerizer.cpp:155] Using isolation: posix/cpu,posix/mem,filesystem/posix W0405 17:29:19.927836 31837 backend.cpp:66] Failed to create 'bind' backend: BindBackend requires root privileges I0405 17:29:19.932029 31862 slave.cpp:200] Agent started on 441)@172.17.0.4:43972 I0405 17:29:19.932086 31862 slave.cpp:201] Flags at startup: --appc_simple_discovery_uri_prefix=""http://"" --appc_store_dir=""/tmp/mesos/store/appc"" --authenticate_http=""true"" --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" 
--cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --credential=""/tmp/DiskResource_PersistentVolumeTest_AccessPersistentVolume_0_fJS7AC/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_kill_orphans=""true"" --docker_registry=""https://registry-1.docker.io"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/DiskResource_PersistentVolumeTest_AccessPersistentVolume_0_fJS7AC/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" --http_credentials=""/tmp/DiskResource_PersistentVolumeTest_AccessPersistentVolume_0_fJS7AC/http_credentials"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" --launcher_dir=""/mesos/mesos-0.29.0/_build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resources=""[{""name"":""cpus"",""role"":""*"",""scalar"":{""value"":2.0},""type"":""SCALAR""},{""name"":""mem"",""role"":""*"",""scalar"":{""value"":2048.0},""type"":""SCALAR""},{""name"":""disk"",""role"":""role1"",""scalar"":{""value"":4096.0},""type"":""SCALAR""}]"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_enable_support=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" 
--work_dir=""/tmp/DiskResource_PersistentVolumeTest_AccessPersistentVolume_0_fJS7AC"" I0405 17:29:19.932665 31862 credentials.hpp:86] Loading credential for authentication from '/tmp/DiskResource_PersistentVolumeTest_AccessPersistentVolume_0_fJS7AC/credential' I0405 17:29:19.932934 31862 slave.cpp:338] Agent using credential for: test-principal I0405 17:29:19.932968 31862 credentials.hpp:37] Loading credentials for authentication from '/tmp/DiskResource_PersistentVolumeTest_AccessPersistentVolume_0_fJS7AC/http_credentials' I0405 17:29:19.933284 31862 slave.cpp:390] Using default 'basic' HTTP authenticator I0405 17:29:19.934916 31837 sched.cpp:222] Version: 0.29.0 I0405 17:29:19.935566 31862 slave.cpp:589] Agent resources: cpus(*):2; mem(*):2048; disk(role1):4096; ports(*):[31000-32000] I0405 17:29:19.935664 31862 slave.cpp:597] Agent attributes: [ ] I0405 17:29:19.935679 31862 slave.cpp:602] Agent hostname: 4090d10eba90 I0405 17:29:19.938390 31864 state.cpp:57] Recovering state from '/tmp/DiskResource_PersistentVolumeTest_AccessPersistentVolume_0_fJS7AC/meta' I0405 17:29:19.940608 31869 sched.cpp:326] New master detected at master@172.17.0.4:43972 I0405 17:29:19.940749 31869 sched.cpp:382] Authenticating with master master@172.17.0.4:43972 I0405 17:29:19.940773 31869 sched.cpp:389] Using default CRAM-MD5 authenticatee I0405 17:29:19.942371 31869 authenticatee.cpp:121] Creating new client SASL connection I0405 17:29:19.942873 31859 master.cpp:5679] Authenticating scheduler-bdf68f7f-d938-47ed-a132-bb3f218628bf@172.17.0.4:43972 I0405 17:29:19.943156 31859 authenticator.cpp:413] Starting authentication session for crammd5_authenticatee(896)@172.17.0.4:43972 I0405 17:29:19.943507 31863 authenticator.cpp:98] Creating new server SASL connection I0405 17:29:19.943740 31859 authenticatee.cpp:212] Received SASL authentication mechanisms: CRAM-MD5 I0405 17:29:19.943783 31859 authenticatee.cpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I0405 17:29:19.943892 
31859 authenticator.cpp:203] Received SASL authentication start I0405 17:29:19.943977 31859 authenticator.cpp:325] Authentication requires more steps I0405 17:29:19.944066 31859 authenticatee.cpp:258] Received SASL authentication step I0405 17:29:19.944164 31859 authenticator.cpp:231] Received SASL authentication step I0405 17:29:19.944193 31859 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '4090d10eba90' server FQDN: '4090d10eba90' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0405 17:29:19.944206 31859 auxprop.cpp:179] Looking up auxiliary property '*userPassword' I0405 17:29:19.944268 31859 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0405 17:29:19.944300 31859 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '4090d10eba90' server FQDN: '4090d10eba90' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0405 17:29:19.944313 31859 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0405 17:29:19.944321 31859 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0405 17:29:19.944339 31859 authenticator.cpp:317] Authentication success I0405 17:29:19.944541 31859 authenticatee.cpp:298] Authentication success I0405 17:29:19.944655 31859 master.cpp:5709] Successfully authenticated principal 'test-principal' at scheduler-bdf68f7f-d938-47ed-a132-bb3f218628bf@172.17.0.4:43972 I0405 17:29:19.944737 31859 authenticator.cpp:431] Authentication session cleanup for crammd5_authenticatee(896)@172.17.0.4:43972 I0405 17:29:19.945111 31859 sched.cpp:472] Successfully authenticated with master master@172.17.0.4:43972 I0405 17:29:19.945132 31859 sched.cpp:777] Sending SUBSCRIBE call to master@172.17.0.4:43972 I0405 17:29:19.945591 31859 sched.cpp:810] Will retry registration in 372.80738ms if 
necessary I0405 17:29:19.945744 31865 master.cpp:2346] Received SUBSCRIBE call for framework 'default' at scheduler-bdf68f7f-d938-47ed-a132-bb3f218628bf@172.17.0.4:43972 I0405 17:29:19.945838 31865 master.cpp:1865] Authorizing framework principal 'test-principal' to receive offers for role 'role1' I0405 17:29:19.946194 31865 master.cpp:2417] Subscribing framework default with checkpointing disabled and capabilities [ ] I0405 17:29:19.946866 31866 hierarchical.cpp:266] Added framework 9565ff6f-f1b6-4259-8430-690e635c391f-0000 I0405 17:29:19.946974 31866 hierarchical.cpp:1490] No resources available to allocate! I0405 17:29:19.947010 31866 hierarchical.cpp:1585] No inverse offers to send out! I0405 17:29:19.947054 31865 sched.cpp:704] Framework registered with 9565ff6f-f1b6-4259-8430-690e635c391f-0000 I0405 17:29:19.947074 31866 hierarchical.cpp:1141] Performed allocation for 0 agents in 178242ns I0405 17:29:19.947124 31865 sched.cpp:718] Scheduler::registered took 38907ns I0405 17:29:19.948712 31866 status_update_manager.cpp:200] Recovering status update manager I0405 17:29:19.948901 31866 containerizer.cpp:416] Recovering containerizer I0405 17:29:19.951021 31866 provisioner.cpp:245] Provisioner recovery complete I0405 17:29:19.951802 31866 slave.cpp:4773] Finished recovery I0405 17:29:19.952518 31866 slave.cpp:4945] Querying resource estimator for oversubscribable resources I0405 17:29:19.953248 31866 slave.cpp:928] New master detected at master@172.17.0.4:43972 I0405 17:29:19.953305 31865 status_update_manager.cpp:174] Pausing sending status updates I0405 17:29:19.953626 31866 slave.cpp:991] Authenticating with master master@172.17.0.4:43972 I0405 17:29:19.953716 31866 slave.cpp:996] Using default CRAM-MD5 authenticatee I0405 17:29:19.954074 31866 slave.cpp:964] Detecting new master I0405 17:29:19.954167 31861 authenticatee.cpp:121] Creating new client SASL connection I0405 17:29:19.954372 31866 slave.cpp:4959] Received oversubscribable resources from the 
resource estimator I0405 17:29:19.954756 31866 master.cpp:5679] Authenticating slave(441)@172.17.0.4:43972 I0405 17:29:19.954944 31861 authenticator.cpp:413] Starting authentication session for crammd5_authenticatee(897)@172.17.0.4:43972 I0405 17:29:19.955368 31863 authenticator.cpp:98] Creating new server SASL connection I0405 17:29:19.955687 31861 authenticatee.cpp:212] Received SASL authentication mechanisms: CRAM-MD5 I0405 17:29:19.955801 31861 authenticatee.cpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I0405 17:29:19.956075 31861 authenticator.cpp:203] Received SASL authentication start I0405 17:29:19.956279 31861 authenticator.cpp:325] Authentication requires more steps I0405 17:29:19.956455 31861 authenticatee.cpp:258] Received SASL authentication step I0405 17:29:19.956676 31861 authenticator.cpp:231] Received SASL authentication step I0405 17:29:19.956815 31861 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '4090d10eba90' server FQDN: '4090d10eba90' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0405 17:29:19.956907 31861 auxprop.cpp:179] Looking up auxiliary property '*userPassword' I0405 17:29:19.957044 31861 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0405 17:29:19.957166 31861 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '4090d10eba90' server FQDN: '4090d10eba90' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0405 17:29:19.957264 31861 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0405 17:29:19.957353 31861 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0405 17:29:19.957449 31861 authenticator.cpp:317] Authentication success I0405 17:29:19.957664 31857 authenticatee.cpp:298] Authentication success I0405 17:29:19.957813 31857 
master.cpp:5709] Successfully authenticated principal 'test-principal' at slave(441)@172.17.0.4:43972 I0405 17:29:19.958008 31861 authenticator.cpp:431] Authentication session cleanup for crammd5_authenticatee(897)@172.17.0.4:43972 I0405 17:29:19.958732 31857 slave.cpp:1061] Successfully authenticated with master master@172.17.0.4:43972 I0405 17:29:19.958930 31857 slave.cpp:1457] Will retry registration in 18.568334ms if necessary I0405 17:29:19.959262 31857 master.cpp:4390] Registering agent at slave(441)@172.17.0.4:43972 (4090d10eba90) with id 9565ff6f-f1b6-4259-8430-690e635c391f-S0 I0405 17:29:19.959934 31857 registrar.cpp:463] Applied 1 operations in 99197ns; attempting to update the 'registry' I0405 17:29:19.961587 31857 log.cpp:683] Attempting to append 343 bytes to the log I0405 17:29:19.9...",3 MESOS-5130,"Enable `network/cni` isolator in `MesosContainerizer` as the default `network` isolator.","Currently there are no default `network` isolators for `MesosContainerizer`. With the development of the `network/cni` isolator we have an interface to run Mesos on a multitude of IP networks. Given that it's based on an open standard (the CNI spec) which is gathering a lot of traction from vendors (calico, weave, coreOS) and already works on some default networks (bridge, ipvlan, macvlan) it makes sense to make it the default network isolator.",1 MESOS-5132,"Commit message hook lints the diff in verbose mode.","In verbose mode (i.e., {{git commit --verbose}}), the commit message includes the diff of the commit at the bottom, delimited by the following lines: {code} # ------------------------ >8 ------------------------ # Do not touch the line above. # Everything below will be removed. 
{code} We should {{break}} once we encounter such a line.",2 MESOS-5133,"Expose TaskStatus source & reason in master's '/state' output","It would be helpful if the TaskStatus lists provided by the master's {{/state}} endpoint included the {{source}} and {{reason}} associated with the status message. The JSON modeling function for TaskStatus should be extended to include these fields.",1 MESOS-5135,"Update existing documentation to include references to GPUs as a first class resource.","Specifically, the documentation in the following files should be updated: {noformat} docs/attributes-resources.md docs/monitoring.md {noformat}",1 MESOS-5136,"Update the default JSON representation of a Resource to include GPUs","The default JSON representation of a Resource currently lists a value of ""0"" if no value is set on a first class SCALAR resource (i.e. cpus, mem, disk). We should add GPUs here as well. ",1 MESOS-5137,"Remove 'dashboard.js' from the webui.","This file is no longer in use anywhere.",1 MESOS-5138,"Fix Nvidia GPU test build for namespace change of MasterDetector","An update to master the day after all of the Nvidia GPU stuff landed caused a build error in the Nvidia GPU tests. 
The namespace that MasterDetector lives in has changed and the test needs to be updated to pull in the class from the proper namespace now.",1 MESOS-5139,"ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar is flaky","Found this on ASF CI while testing 0.28.1-rc2 {code} [ RUN ] ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar E0406 18:29:30.870481 520 shell.hpp:93] Command 'hadoop version 2>&1' failed; this is the output: sh: 1: hadoop: not found E0406 18:29:30.870576 520 fetcher.cpp:59] Failed to create URI fetcher plugin 'hadoop': Failed to create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was either not found or exited with a non-zero exit status: 127 I0406 18:29:30.871052 520 local_puller.cpp:90] Creating local puller with docker registry '/tmp/3l8ZBv/images' I0406 18:29:30.873325 539 metadata_manager.cpp:159] Looking for image 'abc' I0406 18:29:30.874438 539 local_puller.cpp:142] Untarring image 'abc' from '/tmp/3l8ZBv/images/abc.tar' to '/tmp/3l8ZBv/store/staging/5tw8bD' I0406 18:29:30.901916 547 local_puller.cpp:162] The repositories JSON file for image 'abc' is '{""abc"":{""latest"":""456""}}' I0406 18:29:30.902304 547 local_puller.cpp:290] Extracting layer tar ball '/tmp/3l8ZBv/store/staging/5tw8bD/123/layer.tar to rootfs '/tmp/3l8ZBv/store/staging/5tw8bD/123/rootfs' I0406 18:29:30.909144 547 local_puller.cpp:290] Extracting layer tar ball '/tmp/3l8ZBv/store/staging/5tw8bD/456/layer.tar to rootfs '/tmp/3l8ZBv/store/staging/5tw8bD/456/rootfs' ../../src/tests/containerizer/provisioner_docker_tests.cpp:183: Failure (imageInfo).failure(): Collect failed: Subprocess 'tar, tar, -x, -f, /tmp/3l8ZBv/store/staging/5tw8bD/456/layer.tar, -C, /tmp/3l8ZBv/store/staging/5tw8bD/456/rootfs' failed: tar: This does not look like a tar archive tar: Exiting with failure status due to previous errors [ FAILED ] ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar (243 ms) {code}",2 MESOS-5142,"Add agent flags for HTTP authorization.","Flags 
should be added to the agent to: 1. Enable authorization ({{--authorizers}}) 2. Provide ACLs ({{--acls}})",2 MESOS-5144,"Cleanup memory leaks in libprocess finalize()","libprocess's {{finalize}} function currently leaks memory for a few different reasons. Cleaning up the {{SocketManager}} will be somewhat involved (MESOS-3910), but the remaining memory leaks should be fairly easy to address.",2 MESOS-5146,"MasterAllocatorTest/1.RebalancedForUpdatedWeights is flaky.","Observed on the ASF CI: {code} [ RUN ] MasterAllocatorTest/1.RebalancedForUpdatedWeights I0407 22:34:10.330394 29278 cluster.cpp:149] Creating default 'local' authorizer I0407 22:34:10.466182 29278 leveldb.cpp:174] Opened db in 135.608207ms I0407 22:34:10.516398 29278 leveldb.cpp:181] Compacted db in 50.159558ms I0407 22:34:10.516464 29278 leveldb.cpp:196] Created db iterator in 34959ns I0407 22:34:10.516484 29278 leveldb.cpp:202] Seeked to beginning of db in 10195ns I0407 22:34:10.516496 29278 leveldb.cpp:271] Iterated through 0 keys in the db in 7324ns I0407 22:34:10.516547 29278 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0407 22:34:10.517277 29298 recover.cpp:447] Starting replica recovery I0407 22:34:10.517693 29300 recover.cpp:473] Replica is in EMPTY status I0407 22:34:10.520251 29310 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (4775)@172.17.0.3:35855 I0407 22:34:10.520611 29311 recover.cpp:193] Received a recover response from a replica in EMPTY status I0407 22:34:10.521164 29299 recover.cpp:564] Updating replica status to STARTING I0407 22:34:10.523435 29298 master.cpp:382] Master f59f9057-a5c7-43e1-b129-96862e640a12 (129e11060069) started on 172.17.0.3:35855 I0407 22:34:10.523473 29298 master.cpp:384] Flags at startup: --acls="""" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate=""true"" --authenticate_http=""true"" --authenticate_slaves=""true"" --authenticators=""crammd5"" 
--authorizers=""local"" --credentials=""/tmp/3rZY8C/credentials"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_completed_frameworks=""50"" --max_completed_tasks_per_framework=""1000"" --max_slave_ping_timeouts=""5"" --quiet=""false"" --recovery_slave_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""100secs"" --registry_strict=""true"" --root_submissions=""true"" --slave_ping_timeout=""15secs"" --slave_reregister_timeout=""10mins"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/mesos/mesos-0.29.0/_inst/share/mesos/webui"" --work_dir=""/tmp/3rZY8C/master"" --zk_session_timeout=""10secs"" I0407 22:34:10.523885 29298 master.cpp:433] Master only allowing authenticated frameworks to register I0407 22:34:10.523901 29298 master.cpp:438] Master only allowing authenticated agents to register I0407 22:34:10.523913 29298 credentials.hpp:37] Loading credentials for authentication from '/tmp/3rZY8C/credentials' I0407 22:34:10.524298 29298 master.cpp:480] Using default 'crammd5' authenticator I0407 22:34:10.524441 29298 master.cpp:551] Using default 'basic' HTTP authenticator I0407 22:34:10.524564 29298 master.cpp:589] Authorization enabled I0407 22:34:10.525269 29305 hierarchical.cpp:145] Initialized hierarchical allocator process I0407 22:34:10.525333 29305 whitelist_watcher.cpp:77] No whitelist given I0407 22:34:10.527331 29298 master.cpp:1832] The newly elected leader is master@172.17.0.3:35855 with id f59f9057-a5c7-43e1-b129-96862e640a12 I0407 22:34:10.527441 29298 master.cpp:1845] Elected as the leading master! 
I0407 22:34:10.527545 29298 master.cpp:1532] Recovering from registrar I0407 22:34:10.527889 29298 registrar.cpp:331] Recovering registrar I0407 22:34:10.549734 29299 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 28.25177ms I0407 22:34:10.549782 29299 replica.cpp:320] Persisted replica status to STARTING I0407 22:34:10.550010 29299 recover.cpp:473] Replica is in STARTING status I0407 22:34:10.551352 29299 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (4777)@172.17.0.3:35855 I0407 22:34:10.551676 29299 recover.cpp:193] Received a recover response from a replica in STARTING status I0407 22:34:10.552315 29308 recover.cpp:564] Updating replica status to VOTING I0407 22:34:10.574865 29308 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 22.413614ms I0407 22:34:10.574928 29308 replica.cpp:320] Persisted replica status to VOTING I0407 22:34:10.575103 29308 recover.cpp:578] Successfully joined the Paxos group I0407 22:34:10.575346 29308 recover.cpp:462] Recover process terminated I0407 22:34:10.575913 29308 log.cpp:659] Attempting to start the writer I0407 22:34:10.577512 29308 replica.cpp:493] Replica received implicit promise request from (4778)@172.17.0.3:35855 with proposal 1 I0407 22:34:10.599984 29308 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 22.453613ms I0407 22:34:10.600026 29308 replica.cpp:342] Persisted promised to 1 I0407 22:34:10.601773 29304 coordinator.cpp:238] Coordinator attempting to fill missing positions I0407 22:34:10.603757 29307 replica.cpp:388] Replica received explicit promise request from (4779)@172.17.0.3:35855 for position 0 with proposal 2 I0407 22:34:10.634392 29307 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 30.269987ms I0407 22:34:10.634829 29307 replica.cpp:712] Persisted action at 0 I0407 22:34:10.637017 29297 replica.cpp:537] Replica received write request for position 0 from (4780)@172.17.0.3:35855 I0407 22:34:10.637099 29297 
leveldb.cpp:436] Reading position from leveldb took 52948ns I0407 22:34:10.676170 29297 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 38.917487ms I0407 22:34:10.676352 29297 replica.cpp:712] Persisted action at 0 I0407 22:34:10.677564 29306 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 I0407 22:34:10.717959 29306 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 40.306229ms I0407 22:34:10.718202 29306 replica.cpp:712] Persisted action at 0 I0407 22:34:10.718399 29306 replica.cpp:697] Replica learned NOP action at position 0 I0407 22:34:10.719883 29306 log.cpp:675] Writer started with ending position 0 I0407 22:34:10.721688 29305 leveldb.cpp:436] Reading position from leveldb took 75934ns I0407 22:34:10.723640 29306 registrar.cpp:364] Successfully fetched the registry (0B) in 195648us I0407 22:34:10.723999 29306 registrar.cpp:463] Applied 1 operations in 108099ns; attempting to update the 'registry' I0407 22:34:10.725077 29311 log.cpp:683] Attempting to append 170 bytes to the log I0407 22:34:10.725328 29308 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 I0407 22:34:10.726552 29299 replica.cpp:537] Replica received write request for position 1 from (4781)@172.17.0.3:35855 I0407 22:34:10.759747 29299 leveldb.cpp:341] Persisting action (189 bytes) to leveldb took 33.089719ms I0407 22:34:10.759976 29299 replica.cpp:712] Persisted action at 1 I0407 22:34:10.761739 29299 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 I0407 22:34:10.801522 29299 leveldb.cpp:341] Persisting action (191 bytes) to leveldb took 39.694064ms I0407 22:34:10.801602 29299 replica.cpp:712] Persisted action at 1 I0407 22:34:10.801638 29299 replica.cpp:697] Replica learned APPEND action at position 1 I0407 22:34:10.803371 29311 registrar.cpp:508] Successfully updated the 'registry' in 79.163904ms I0407 22:34:10.803829 29311 registrar.cpp:394] Successfully recovered 
registrar I0407 22:34:10.804585 29311 master.cpp:1640] Recovered 0 agents from the Registry (131B) ; allowing 10mins for agents to re-register I0407 22:34:10.805269 29308 log.cpp:702] Attempting to truncate the log to 1 I0407 22:34:10.805721 29310 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 I0407 22:34:10.805276 29296 hierarchical.cpp:172] Skipping recovery of hierarchical allocator: nothing to recover I0407 22:34:10.806529 29307 replica.cpp:537] Replica received write request for position 2 from (4782)@172.17.0.3:35855 I0407 22:34:10.843320 29307 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 36.77593ms I0407 22:34:10.843531 29307 replica.cpp:712] Persisted action at 2 I0407 22:34:10.845369 29311 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 I0407 22:34:10.885098 29311 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 39.641102ms I0407 22:34:10.885401 29311 leveldb.cpp:399] Deleting ~1 keys from leveldb took 88701ns I0407 22:34:10.885745 29311 replica.cpp:712] Persisted action at 2 I0407 22:34:10.885862 29311 replica.cpp:697] Replica learned TRUNCATE action at position 2 I0407 22:34:10.900660 29278 containerizer.cpp:155] Using isolation: posix/cpu,posix/mem,filesystem/posix W0407 22:34:10.901793 29278 backend.cpp:66] Failed to create 'bind' backend: BindBackend requires root privileges I0407 22:34:10.905488 29302 slave.cpp:201] Agent started on 111)@172.17.0.3:35855 I0407 22:34:10.905553 29302 slave.cpp:202] Flags at startup: --appc_simple_discovery_uri_prefix=""http://"" --appc_store_dir=""/tmp/mesos/store/appc"" --authenticate_http=""true"" --authenticatee=""crammd5"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" 
--credential=""/tmp/MasterAllocatorTest_1_RebalancedForUpdatedWeights_9aCAYa/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_kill_orphans=""true"" --docker_registry=""https://registry-1.docker.io"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/MasterAllocatorTest_1_RebalancedForUpdatedWeights_9aCAYa/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" --http_credentials=""/tmp/MasterAllocatorTest_1_RebalancedForUpdatedWeights_9aCAYa/http_credentials"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" --launcher_dir=""/mesos/mesos-0.29.0/_build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resources=""cpus:2;mem:1024;disk:4096;ports:[31000-32000]"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_enable_support=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" --work_dir=""/tmp/MasterAllocatorTest_1_RebalancedForUpdatedWeights_9aCAYa"" I0407 22:34:10.906365 29302 credentials.hpp:86] Loading credential for authentication from '/tmp/MasterAllocatorTest_1_RebalancedForUpdatedWeights_9aCAYa/credential' I0407 22:34:10.906787 29302 slave.cpp:339] Agent using credential for: test-principal I0407 
22:34:10.907202 29302 credentials.hpp:37] Loading credentials for authentication from '/tmp/MasterAllocatorTest_1_RebalancedForUpdatedWeights_9aCAYa/http_credentials' I0407 22:34:10.907713 29302 slave.cpp:391] Using default 'basic' HTTP authenticator I0407 22:34:10.908499 29302 resources.cpp:572] Parsing resources as JSON failed: cpus:2;mem:1024;disk:4096;ports:[31000-32000] Trying semicolon-delimited string format instead I0407 22:34:10.910189 29302 slave.cpp:590] Agent resources: cpus(*):2; mem(*):1024; disk(*):4096; ports(*):[31000-32000] I0407 22:34:10.910362 29302 slave.cpp:598] Agent attributes: [ ] I0407 22:34:10.910465 29302 slave.cpp:603] Agent hostname: 129e11060069 I0407 22:34:10.913280 29303 state.cpp:57] Recovering state from '/tmp/MasterAllocatorTest_1_RebalancedForUpdatedWeights_9aCAYa/meta' I0407 22:34:10.914621 29303 status_update_manager.cpp:200] Recovering status update manager I0407 22:34:10.915226 29303 containerizer.cpp:416] Recovering containerizer I0407 22:34:10.917246 29301 provisioner.cpp:245] Provisioner recovery complete I0407 22:34:10.917733 29301 slave.cpp:4784] Finished recovery I0407 22:34:10.918226 29301 slave.cpp:4956] Querying resource estimator for oversubscribable resources I0407 22:34:10.918529 29301 slave.cpp:4970] Received oversubscribable resources from the resource estimator I0407 22:34:10.918908 29304 slave.cpp:939] New master detected at master@172.17.0.3:35855 I0407 22:34:10.918988 29304 slave.cpp:1002] Authenticating with master master@172.17.0.3:35855 I0407 22:34:10.919098 29301 status_update_manager.cpp:174] Pausing sending status updates I0407 22:34:10.919309 29304 slave.cpp:1007] Using default CRAM-MD5 authenticatee I0407 22:34:10.919535 29304 slave.cpp:975] Detecting new master I0407 22:34:10.919747 29308 authenticatee.cpp:121] Creating new client SASL connection I0407 22:34:10.920413 29308 master.cpp:5695] Authenticating slave(111)@172.17.0.3:35855 I0407 22:34:10.920650 29308 authenticator.cpp:413] Starting 
authentication session for crammd5_authenticatee(278)@172.17.0.3:35855 I0407 22:34:10.921020 29308 authenticator.cpp:98] Creating new server SASL connection I0407 22:34:10.921308 29308 authenticatee.cpp:212] Received SASL authentication mechanisms: CRAM-MD5 I0407 22:34:10.921424 29308 authenticatee.cpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I0407 22:34:10.921596 29308 authenticator.cpp:203] Received SASL authentication start I0407 22:34:10.921752 29308 authenticator.cpp:325] Authentication requires more steps I0407 22:34:10.921957 29307 authenticatee.cpp:258] Received SASL authentication step I0407 22:34:10.922178 29308 authenticator.cpp:231] Received SASL authentication step I0407 22:34:10.922214 29308 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '129e11060069' server FQDN: '129e11060069' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0407 22:34:10.922229 29308 auxprop.cpp:179] Looking up auxiliary property '*userPassword' I0407 22:34:10.922281 29308 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0407 22:34:10.922309 29308 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: '129e11060069' server FQDN: '129e11060069' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0407 22:34:10.922322 29308 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0407 22:34:10.922332 29308 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0407 22:34:10.922353 29308 authenticator.cpp:317] Authentication success I0407 22:34:10.922436 29307 authenticatee.cpp:298] Authentication success I0407 22:34:10.922587 29308 master.cpp:5725] Successfully authenticated principal 'test-principal' at slave(111)@172.17.0.3:35855 I0407 22:34:10.922668 29299 authenticator.cpp:431] Authentication session 
cleanup for crammd5_authenticatee(278)@172.17.0.3:35855 I0407 22:34:10.923256 29307 slave.cpp:1072] Successfully authenticated with master master@172.17.0.3:35855 I0407 22:34:10.923429 29307 slave.cpp:1468] Will retry registration in 3.220345ms if necessary I0407 22:34:10.923707 29302 master.cpp:4406] Registering agent at slave(111)@172.17.0.3:35855 (129e11060069) with id f59f9057-a5c7-43e1-b129-96862e640a12-S0 I0407 22:34:10.924239 29309 registrar.cpp:463] Applied 1 operations in 105794ns; attempting to update the 'registry' I0407 22:34:10.925787 29309 log.cpp:683] Attempting to append 339 bytes to the log I0407 22:34:10.926028 29309 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 3 I0407 22:34:10.927139 29309 replica.cpp:537] Replica received write request for position 3 from (4797)@172.17.0.3:35855 I0407 22:34:10.929083 29305 slave.cpp:1468] Will retry registration in 39.293556ms if necessary I0407 22:34:10.929363 29305 master.cpp:4394] Ignoring register agent message from slave(111)@172.17.0.3:35855 (129e11060069) as admission is already in progress I0407 22:34:10.968843 29309 leveldb.cpp:341] Persisting action (358 bytes) to leveldb took 41.68025ms I0407 22:34:10.969005 29309 replica.cpp:712] Persisted action at 3 I0407 22:34:10.969741 29309 slave.cpp:1468] Will retry registration in 54.852242ms if necessary I0407 22:34:10.970118 29309 master.cpp:4394] Ignoring register agent message from slave(111)@172.17.0.3:35855 (129e11060069) as admission is already in progress I0407 22:34:10.970852 29306 replica.cpp:691] Replica received learned notice for position 3 from @0.0.0.0:0 I0407 22:34:11.010634 29306 leveldb.cpp:341] Persisting action (360 bytes) to leveldb took 39.680272ms I0407 22:34:11.010840 29306 replica.cpp:712] Persisted action at 3 I0407 22:34:11.011014 29306 replica.cpp:697] Replica learned APPEND action at position 3 I0407 22:34:11.014020 29306 registrar.cpp:508] Successfully updated the 'registry' in 89.684224ms I0407 
22:34:11.014181 29296 log.cpp:702] Attempting to truncate the log to 3 I0407 22:34:11.014606 29296 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 4 I0407 22:34:11.015836 29298 replica.cpp:537] Replica received write request for position 4 from (4798)@172.17.0.3:35855 I0407 22:34:11.016973 29296 master.cpp:4474] Registered agent f59f9057-a5c7-43e1-b129-96862e640a12-S0 at slave(111)@172.17.0.3:35855 (129e11060069) with cpus(*):2; mem(*):1024; disk(*):4096; ports(*):[31000-32000] I0407 22:34:11.017518 29304 hierarchical.cpp:476] Added agent f59f9057-a5c7-43e1-b129-96862e640a12-S0 (129e11060069) with cpus(*):2; mem(*):1024; disk(*):4096; ports(*):[31000-32000] (allocated: ) I0407 22:34:11.017763 29311 slave.cpp:1116] Registered with master master@172.17.0.3:35855; given agent ID f59f9057-a5c7-43e1-b129-96862e640a12-S0 I0407 22:34:11.018362 29311 fetcher.cpp:81] Clearing fetcher cache I0407 22:34:11.018870 29311 slave.cpp:1139] Checkpointing SlaveInfo to '/tmp/MasterAllocatorTest_1_RebalancedForUpdatedWeights_9aCAYa/meta/slaves/f59f9057-a5c7-43e1-b129-96862e640a12-S0/slave.info' I0407 22:34:11.018890 29307 status_update_manager.cpp:181] Resuming sending status updates I0407 22:34:11.019182 29304 hierarchical.cpp:1491] No resources available to allocate! 
I0407 22:34:11.019304 29304 hierarchical.cpp:1165] Performed allocation for agent f59f9057-a5c7-43e1-b129-96862e640a12-S0 in 1.077349ms I0407 22:34:11.019493 29311 slave.cpp:1176] Forwarding total oversubscribed resources I0407 22:34:11.019726 29311 slave.cpp:3675] Received ping from slave-observer(112)@172.17.0.3:35855 I0407 22:34:11.019878 29299 master.cpp:4818] Received update of agent f59f9057-a5c7-43e1-b129-96862e640a12-S0 at slave(111)@172.17.0.3:35855 (129e11060069) with total oversubscribed resources I0407 22:34:11.020845 29305 hierarchical.cpp:534] Agent f59f9057-a5c7-43e1-b129-96862e640a12-S0 (129e11060069) updated with oversubscribed resources (total: cpus(*):2; mem(*):1024; disk(*):4096; ports(*):[31000-32000], allocated: ) I0407 22:34:11.021005 29305 hierarchical.cpp:1491] No resources available to allocate! I0407 22:34:11.021065 29305 hierarchical.cpp:1165] Performed allocation for agent f59f9057-a5c7-43e1-b129-96862e640a12-S0 in 173907ns I0407 22:34:11.022289 29278 containerizer.cpp:155] Using isolation: posix/cpu,posix/mem,filesystem/posix W0407 22:34:11.023422 29278 backend.cpp:66] Failed to create 'bind' backend: BindBackend requires root privileges I0407 22:34:11.026309 29309 slave.cpp:201] Agent started on 112)@172.17.0.3:35855 I0407 22:34:11.026410 29309 slave.cpp:202] Flags at startup: --appc_simple_discovery_uri_prefix=""http://"" --appc_store_dir=""/tmp/mesos/store/appc"" --authenticate_http=""true"" --authenticatee=""crammd5"" --cgroups_cpu_e...",1 MESOS-5152,"Add authentication to agent's /monitor/statistics endpoint","Operators may want to enforce that only authenticated users (and subsequently only specific authorized users) be able to view per-executor resource usage statistics. 
Since this endpoint is handled by the ResourceMonitorProcess, I would expect the work necessary to be similar to what was done for /files or /registry endpoint authn.",2 MESOS-5153,"Sandbox contents should be protected from unauthorized users","MESOS-4956 introduced authentication support for the sandboxes. However, authentication can only go as far as to tell whether a user is known to Mesos or not. An additional step is necessary to verify whether the known user is allowed to execute the requested operation on the sandbox (browse, read, download, debug).",8 MESOS-5155,"Consolidate authorization actions for quota.","We should have just a single authz action: {{UPDATE_QUOTA_WITH_ROLE}}. It was a mistake in retrospect to introduce multiple actions. Actions that are not symmetrical are register/teardown and dynamic reservations. They are implemented this way because entities that do one action differ from entities that do the other. For example, register framework is issued by a framework, teardown by an operator. What is a good way to identify a framework? Either a role it runs in, which may differ on each launch and makes no sense in a multi-role framework setup, or, better, a sort of group id, which is its principal. Dynamic reservations and persistent volumes can be issued by both frameworks and operators, hence similar reasoning applies. Now, quota is associated with a role and set only by operators. Do we need to care about principals that set it? Not that much. ",5 MESOS-5156,"Run Mesos builds on PowerPC platform in ASF CI","This is the last step to declare official support for PowerPC. This is currently blocked on ASF INFRA adding PowerPC based Jenkins machines to the ASF CI. 
",1 MESOS-5157,"Update webui for GPU metrics","After adding the GPU metrics and updating the resources JSON to include GPU information, the webui should be updated accordingly.",1 MESOS-5159,"Add test to verify error when requesting fractional GPUs","Fractional GPU requests should immediately cause a TASK_FAILED without ever launching the task.",1 MESOS-5160,"Make `network/cni` enabled as the default network isolator for `MesosContainerizer`.","Currently there are no default `network` isolators for `MesosContainerizer`. With the development of the `network/cni` isolator we have an interface to run Mesos on multitude of IP networks. Given that its based on an open standard (the CNI spec) which is gathering a lot of traction from vendors (calico, weave, coreOS) and already works on some default networks (bridge, ipvlan, macvlan) it makes sense to make it as the default network isolator. ",1 MESOS-5162,"Commit message hook behaves incorrectly when a message includes a ""*"".","If there is a ""\*"" in a commit message (there often is when we have bulleted lists), due to the current use of {{echo $LINE}}, the {{$LINE}} gets expanded with a ""*"" in it, which becomes a matcher in bash and therefore subsequently gets expanded into the list of files/directories in the current directory. In order to avoid this mess, we need to wrap such variables in quotes, like so: {{echo ""$LINE""}}.",2 MESOS-5164,"Add authorization to agent's /monitor/statistics endpoint.","Operators may want to enforce that only specific authorized users be able to view per-executor resource usage statistics. For 0.29 MVP, we can make this coarse-grained, and assume that only the operator or a operator-privileged monitoring service will be accessing the endpoint. 
For a future release, we can consider fine-grained authz that filters statistics like we plan to do for /tasks.",5 MESOS-5167,"Add tests for `network/cni` isolator","We need to add tests to verify the functionality of the `network/cni` isolator.",5 MESOS-5168,"Benchmark overhead of authorization based filtering.","When adding authorization based filtering as outlined in MESOS-4931 we need to be careful, especially for performance critical endpoints such as /state. We should ensure via a benchmark that performance does not degrade below an acceptable level.",3 MESOS-5169,"Introduce new Authorizer Actions for authorization based filtering of endpoints.","For authorization based endpoint filtering we need to introduce the authorizer actions outlined via MESOS-4932.",3 MESOS-5170,"Adapt json creation for authorization based endpoint filtering.","For authorization based endpoint filtering we need to adapt the json endpoint creation as discussed in MESOS-4931.",5 MESOS-5171,"Expose state/state.hpp to public headers","We want the Modules to be able to use the replicated log along with the APIs to communicate with ZooKeeper. This change would require us to expose at least the following headers: state/storage.hpp, and any additional files that state.hpp depends on (e.g., zookeeper/authentication.hpp).",3 MESOS-5172,"Registry puller cannot fetch blobs correctly from some private repos.","When the registry puller is pulling a private repository from some private registry (e.g., quay.io), errors may occur when fetching blobs, by which point the manifest of the repo has already been fetched correctly. The error message is `Unexpected HTTP response '400 Bad Request' when trying to download the blob`. 
This may arise from the logic of fetching blobs, or from an incorrectly formatted URI when requesting blobs.",3 MESOS-5173,"Allow master/agent to take multiple module manifest files","When loading multiple modules into master/agent, one has to merge all module metadata (library name, module name, parameters, etc.) into a single json file which is then passed on to the --modules flag. This quickly becomes cumbersome, especially if the modules are coming from different vendors/developers. An alternative would be to allow multiple invocations of the --modules flag that can then be passed on to the module manager. That way, each flag corresponds to just one module library and modules from that library. Another approach is to create a new flag (e.g., --modules-dir) that contains a path to a directory that would contain multiple json files. One can think of it as analogous to systemd units. The operator drops a new file into this directory and the file is automatically picked up by the master/agent module manager. Further, the naming scheme can also be inherited to prefix the filename with an ""NN_"" to signify load order.",3 MESOS-5174,"Update the balloon-framework to run on test clusters","There are several problems with the balloon framework that prevent it from being deployed (easily) on an actual cluster: * The framework accepts 100% of memory in an offer. This means the expected behavior (finish or OOM) is dependent on the offer size. * The framework assumes the {{balloon-executor}} binary is available on each agent. This is generally only true in the build environment or in single-agent test environments. * The framework does not specify CPUs with the executor. This is required by many isolators. * The executor's {{TASK_FINISHED}} logic path was untested and is flaky. * The framework has no metrics. * The framework only launches a single task and then exits. With this behavior, we can't have useful metrics. 
",3 MESOS-5178,"Add logic to validate for non-fractional GPU requests in the master","We should not put this logic directly into the 'Resources::validate()' function. The primary reason is that the existing 'Resources::validate()' function doesn't consider the semantics of any particular resource when performing its validation (it only makes sure that the fields in the 'Resource' protobuf message are correctly formed). Since a fractional 'gpus' resources is actually well-formed (and only semantically incorrect), we should push this validation logic up into the master. Moreover, the existing logic to construct a 'Resources' object from a 'RepeatedPtrField' silently drops any resources that don't pass 'Resources::validate()'. This means that if we were to push the non-fractional 'gpus' validation into 'Resources::validate()', the 'gpus' resources would just be silently dropped rather than causing a TASK_ERROR in the master. This is obviously *not* the desired behaviour.",2 MESOS-5179,"Enhance the error message for Duration flag.","Enhance the error message for https://github.com/apache/mesos/blob/4dfa91fc21f80204f5125b2e2f35c489f8fb41d8/3rdparty/libprocess/3rdparty/stout/include/stout/duration.hpp#L70 to list all of the supported duration unit.",1 MESOS-5180,"Scheduler driver does not detect disconnection with master and reregister.","The existing implementation of the scheduler driver does not re-register with the master under some network partition cases. When a scheduler registers with the master: 1) master links to the framework 2) framework links to the master It is possible for either of these links to break *without* the master changing. (Currently, the scheduler driver will only re-register if the master changes). If both links break or if just link (1) breaks, the master views the framework as {{inactive}} and {{disconnected}}. This means the framework will not receive any more events (such as offers) from the master until it re-registers. 
There is currently no way for the scheduler to detect a one-way link breakage. If link (2) breaks, it makes (almost) no difference to the scheduler. The scheduler usually uses the link to send messages to the master, but libprocess will create another socket if the persistent one is not available. To fix link breakages for (1+2) and (2), the scheduler driver should implement a `::exited` event handler for the master's {{pid}} and trigger a master (re-)detection upon a disconnection. This in turn should make the driver (re)-register with the master. The scheduler library already does this: https://github.com/apache/mesos/blob/master/src/scheduler/scheduler.cpp#L395 See the related issue MESOS-5181 for link (1) breakage.",3 MESOS-5181,"Master should reject calls from the scheduler driver if the scheduler is not connected.","When a scheduler registers, the master will create a link from master to scheduler. If this link breaks, the master will consider the scheduler {{inactive}} and mark it as {{disconnected}}. This causes a couple of problems: 1) Master does not send offers to {{inactive}} schedulers. But these schedulers might consider themselves ""registered"" in a one-way network partition scenario. 2) Any calls from the {{inactive}} scheduler are still accepted, which leaves the scheduler in a starved, but semi-functional state. 
See the related issue for more context: MESOS-5180 There should be an additional guard for registered, but {{inactive}} schedulers here: https://github.com/apache/mesos/blob/94f4f4ebb7d491ec6da1473b619600332981dd8e/src/master/master.cpp#L1977 The HTTP API already does this: https://github.com/apache/mesos/blob/94f4f4ebb7d491ec6da1473b619600332981dd8e/src/master/http.cpp#L459 Since the scheduler driver cannot return a 403, it may be necessary to return a {{Event::ERROR}} and force the scheduler to abort.",1 MESOS-5199,"The mesos-execute prints confusing message when launching tasks.","{code} root@mesos002:~/src/mesos/m2/mesos/build# src/mesos-execute --master=192.168.56.12:5050 --name=test --docker_image=ubuntu:14.04 --command=""ls /root"" I0413 07:28:03.833521 2295 scheduler.cpp:175] Version: 0.29.0 Subscribed with ID '3a1af11e-cf66-4ce2-826d-48b332977999-0001' Submitted task 'test' to agent '3a1af11e-cf66-4ce2-826d-48b332977999-S0' Received status update TASK_RUNNING for task 'test' source: SOURCE_EXECUTOR reason: REASON_COMMAND_EXECUTOR_FAILED <<< Received status update TASK_FINISHED for task 'test' message: 'Command exited with status 0' source: SOURCE_EXECUTOR reason: REASON_COMMAND_EXECUTOR_FAILED <<< root@mesos002:~/src/mesos/m2/mesos/build# {code}",1 MESOS-5212,"Allow any principal in ReservationInfo when HTTP authentication is off","Mesos currently provides no way for operators to pass their principal to HTTP endpoints when HTTP authentication is off. Since we enforce that {{ReservationInfo.principal}} be equal to the operator principal in requests to {{/reserve}}, this means that when HTTP authentication is disabled, the {{ReservationInfo.principal}} field cannot be set. 
To address this in the short-term, we should allow {{ReservationInfo.principal}} to hold any value when HTTP authentication is disabled.",1 MESOS-5213,"Operator endpoints should accept a principal without HTTP authentication","Mesos currently provides no way for operators to include their principal with HTTP endpoint requests when HTTP authentication is disabled. To remedy this, we should add optional {{principal}} fields to the relevant protobuf messages. When HTTP authentication is enabled, we can allow the user to leave this field empty and populate it with the principal from their HTTP Auth header.",3 MESOS-5214,"Populate FrameworkInfo.principal for authenticated frameworks","If a framework authenticates and then does not provide a {{principal}} in its {{FrameworkInfo}}, we currently allow this and leave {{FrameworkInfo.principal}} unset. Instead, we should populate {{FrameworkInfo.principal}} for them automatically in that case to ensure that the two principals are equal.",2 MESOS-5215,"Update the documentation for '/reserve' and '/create-volumes'","There are a couple of issues related to the {{principal}} field in {{DiskInfo}} and {{ReservationInfo}} (see linked JIRAs) that should be better documented. We need to help users understand the purpose of these fields and how they interact with the principal provided in the HTTP authentication header. See linked tickets for background.",1 MESOS-5216,"Document docker volume driver isolator.","Should include the following: 1. What features (driver options) are supported in docker volume driver isolator. 2. How to use docker volume driver isolator. *related agent flags introduction and usage. *isolator dependency clarification (e.g., filesystem/linux). *related driver daemon preprocess. 
*volumes pre-specified by users and volume cleanup.",5 MESOS-5221,"Add Documentation for Nvidia GPU support",https://reviews.apache.org/r/46220/,5 MESOS-5222,"Create a benchmark for scale testing HTTP frameworks","It would be good to add a benchmark for scale testing the HTTP frameworks wrt driver based frameworks. The benchmark can be as simple as trying to launch N tasks (parameterized) with the old/new API. We can then focus on fixing performance issues that we find as a result of this exercise.",3 MESOS-5227,"Implement HTTP Docker Executor that uses the Executor Library","Similar to what we did with the HTTP command executor in MESOS-3558 we should have a HTTP docker executor that can speak the v1 Executor API.",5 MESOS-5228,"Add tests for Capability API.","Add basic tests for the capability API.",3 MESOS-5232,"Add capability information to ContainerInfo protobuf message.","To enable support for capability as first class framework entity, we need to add capabilities related information to the ContainerInfo protobuf.",1 MESOS-5237,"The windows version of `os::access` has differing behavior than the POSIX version.","The POSIX version of {{os::access}} looks like this: {code} inline Try access(const std::string& path, int how) { if (::access(path.c_str(), how) < 0) { if (errno == EACCES) { return false; } else { return ErrnoError(); } } return true; } {code} Compare this to the Windows version of {{os::access}} which looks like this following: {code} inline Try access(const std::string& fileName, int how) { if (::_access(fileName.c_str(), how) != 0) { return ErrnoError(""access: Could not access path '"" + fileName + ""'""); } return true; } {code} As we can see, the case where {{errno}} is set to {{EACCES}} is handled differently between the 2 functions. We can actually consolidate the 2 functions by simply using the POSIX version. The challenge is that on POSIX, we should use {{::access}} and {{::_access}} on Windows. 
Note however, that this problem is already solved, as we have an implementation of {{::access}} for Windows in {{3rdparty/libprocess/3rdparty/stout/include/stout/windows.hpp}} which simply defers to {{::_access}}. Thus, I propose to simply consolidate the 2 implementations.",2 MESOS-5238,"CHECK failure in AppcProvisionerIntegrationTest.ROOT_SimpleLinuxImageTest","Observed on the Mesosphere internal CI: {noformat} [22:56:28]W: [Step 10/10] F0420 22:56:28.056788 629 containerizer.cpp:1634] Check failed: containers_.contains(containerId) {noformat} Complete test log will be attached as a file.",2 MESOS-5239,"Persistent volume DockerContainerizer support assumes proper mount propagation setup on the host.","We recently added persistent volume support in DockerContainerizer (MESOS-3413). To understand the problem, we first need to understand how persistent volumes are supported in DockerContainerizer. To support persistent volumes in DockerContainerizer, we bind mount persistent volumes under a container's sandbox ('container_path' has to be relative for persistent volumes). When the Docker container is launched, since we always add a volume (-v) for the sandbox, the persistent volumes will be bind mounted into the container as well (since Docker does a 'rbind'). The assumption that the above works is that the Docker daemon should see those persistent volume mounts that Mesos mounts on the host mount table. It's not a problem if Docker daemon itself is using the host mount namespace. However, on systemd enabled systems, Docker daemon is running in a separate mount namespace and all mounts in that mount namespace will be marked as slave mounts due to this [patch|https://github.com/docker/docker/commit/eb76cb2301fc883941bc4ca2d9ebc3a486ab8e0a]. So what that means is that: in order for it to work, the parent mount of agent's work_dir should be a shared mount when docker daemon starts. This is typically true on CentOS7, CoreOS as all mounts are shared mounts by default. 
However, this causes an issue with the 'filesystem/linux' isolator. To understand why, first I need to show you a typical problem when dealing with shared mounts. Let me explain that using the following commands on a CentOS7 machine: {noformat} [root@core-dev run]# cat /proc/self/mountinfo 24 60 0:19 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755 [root@core-dev run]# mkdir /run/netns [root@core-dev run]# mount --bind /run/netns /run/netns [root@core-dev run]# cat /proc/self/mountinfo 24 60 0:19 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755 121 24 0:19 /netns /run/netns rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755 [root@core-dev run]# ip netns add test [root@core-dev run]# cat /proc/self/mountinfo 24 60 0:19 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755 121 24 0:19 /netns /run/netns rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755 162 121 0:3 / /run/netns/test rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw 163 24 0:3 / /run/netns/test rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw {noformat} As you can see above, there're two entries (/run/netns/test) in the mount table (unexpected). This will confuse some systems sometimes. The reason is because when we create a self bind mount (/run/netns -> /run/netns), the mount will be put into the same shared mount peer group (shared:22) as its parent (/run). Then, when you create another mount underneath that (/run/netns/test), that mount operation will be propagated to all mounts in the same peer group (shared:22), resulting an unexpected additional mount being created. The reason we need to do a self bind mount in Mesos is that sometimes, we need to make sure some mounts are shared so that it does not get copied when a new mount namespace is created. However, on some systems, mounts are private by default (e.g., Ubuntu 14.04). 
In those cases, since we cannot change the system mounts, we have to do a self bind mount so that we can set mount propagation to shared. For instance, in filesystem/linux isolator, we do a self bind mount on agent's work_dir. To avoid the self bind mount pitfall mentioned above, in filesystem/linux isolator, after we created the mount, we do a make-slave + make-shared so that the mount is its own shared mount peer group. In that way, any mounts underneath it will not be propagated back. However, that operation will break the assumption that the persistent volume DockerContainerizer support makes. As a result, we're seeing problems with persistent volumes in DockerContainerizer when filesystem/linux isolator is turned on.",3 MESOS-5240,"Command executor may escalate after the task is reaped.","In command executor, {{escalated()}} may be scheduled before the task has been killed, i.e. {{reaped()}}, but called after. In this case {{escalated()}} should be a no-op.",1 MESOS-5243,"Remove '/system/stats.json' endpoint","The {{/system/stats.json}} endpoint was deprecated by MESOS-2058. This endpoint can now be removed.",1 MESOS-5249,"Update CMake files to reflect reorganized 3rdparty",NULL,2 MESOS-5250,"Move 3rdparty/libprocess/3rdparty/* to 3rdparty/",NULL,5 MESOS-5253,"Isolator cleanup should not be invoked if they are not prepared yet.","If the mesos containerizer destroys a container in PROVISIONING state, isolator cleanup is still called, which is incorrect because there is no isolator prepared yet. In this case, there is no need to clean up any isolator, call provisioner destroy directly.",2 MESOS-5254,"Add URI parsing function/library","The {{uri::Fetcher}} theoretically supports all URIs, per [RFC3986|http://tools.ietf.org/html/rfc3986]. To do this, we need a spec-compliant parser from string to URI. 
[uriparser|http://uriparser.sourceforge.net/] appears to fit the bill.",2 MESOS-5255,"Add GPUs to container resource consumption metrics.","Currently the usage callback in the Nvidia GPU isolator is unimplemented: {noformat} src/slave/containerizer/mesos/isolators/cgroups/devices/gpus/nvidia.cpp {noformat} It should use functionality from NVML to gather the current GPU usage and add it to a ResourceStatistics object. It is still an open question as to exactly what information we want to expose here (power, memory consumption, current load, etc.). Whatever we decide on should be standard across different GPU types, different GPU vendors, etc.",3 MESOS-5256,"Add support for per-containerizer resource enumeration","Currently the top level containerizer includes a static function for enumerating the resources available on a given agent. Ideally, this functionality should be the responsibility of individual containerizers (and specifically the responsibility of each isolator used to control access to those resources). Adding support for this will involve making the `Containerizer::resources()` function virtual instead of static and then implementing it on a per-containerizer basis. We should consider providing a default to make this easier in cases where there is only really one good way of enumerating a given set of resources.",3 MESOS-5257,"Add autodiscovery for GPU resources","Right now, the only way to enumerate the available GPUs on an agent is to use the `--nvidia_gpu_devices` flag and explicitly list them out. Instead, we should leverage NVML to autodiscover the GPUs that are available and only use this flag as a way to explicitly list out the GPUs you want to make available in order to restrict access to some of them.",3 MESOS-5258,"Turn the Nvidia GPU isolator into a module","The Nvidia GPU isolator has an external dependence on `libnvidia-ml.so`. 
As it currently stands, this forces *all* binaries that link with `libmesos.so` to also link with `libnvidia-ml.so` (including master, agents on machines without GPUs, scheduler, executors, etc.). By turning the Nvidia GPU isolator into a module, it will be loaded at runtime only when an agent has explicitly included the Nvidia GPU isolator in its `--isolation` flag.",5 MESOS-5259,"Refactor the mesos-fetcher binary to use the uri::Fetcher as a backend","This is an intermediate step for combining the {{mesos-fetcher}} binary and {{uri::Fetcher}}. The {{download}} method should be replaced with {{uri::Fetcher::fetch}}. https://github.com/apache/mesos/blob/653eca74f1080f5f55cd5092423506163e65d402/src/launcher/fetcher.cpp#L179 Combining the two will: * Attach the {{uri::Fetcher}} to the existing Fetcher caching logic. * Remove some code duplication for downloading URIs.",3 MESOS-5260,"Extend the uri::Fetcher::Plugin interface to include a ""fetchSize""","In order to replace the {{mesos-fetcher}} binary with the {{uri::Fetcher}}, each plugin must be able to determine/estimate the size of a download. This is used by the Fetcher cache when it creates cache entries and such. The logic for each of the four {{Fetcher::Plugin}}s can be taken and refactored from the existing fetcher. https://github.com/apache/mesos/blob/653eca74f1080f5f55cd5092423506163e65d402/src/slave/containerizer/fetcher.cpp#L267",2 MESOS-5261,"Combine the internal::slave::Fetcher class and mesos-fetcher binary","After [MESOS-5259], the {{mesos-fetcher}} will no longer need to be a separate binary and can be safely folded back into the agent process. (It was a separate binary because libcurl has synchronous/blocking calls.) 
This will likely mean: * A change to the {{fetch}} continuation chain: https://github.com/apache/mesos/blob/653eca74f1080f5f55cd5092423506163e65d402/src/slave/containerizer/fetcher.cpp#L315 * This protobuf can be deprecated (or just removed): https://github.com/apache/mesos/blob/653eca74f1080f5f55cd5092423506163e65d402/include/mesos/fetcher/fetcher.proto",3 MESOS-5263,"pivot_root is not available on ARM","When compiling on ARM, it throws an error. The current code logic in src/linux/fs.cpp is: {code} #ifdef __NR_pivot_root int ret = ::syscall(__NR_pivot_root, newRoot.c_str(), putOld.c_str()); #elif __x86_64__ // A workaround for systems that have an old glib but have a new // kernel. The magic number '155' is the syscall number for // 'pivot_root' on the x86_64 architecture, see // arch/x86/syscalls/syscall_64.tbl int ret = ::syscall(155, newRoot.c_str(), putOld.c_str()); #elif __powerpc__ || __ppc__ || __powerpc64__ || __ppc64__ // A workaround for powerpc. The magic number '203' is the syscall // number for 'pivot_root' on the powerpc architecture, see // https://w3challs.com/syscalls/?arch=powerpc_64 int ret = ::syscall(203, newRoot.c_str(), putOld.c_str()); #else #error ""pivot_root is not available"" #endif {code} A possible solution is to add the `unistd.h` header",1 MESOS-5265,"Update mesos-execute to support docker volume isolator.","The mesos-execute needs to be updated to support docker volume isolator.",3 MESOS-5266,"add test cases for docker volume driver",NULL,5 MESOS-5271,"Add alias support for Flags","Currently there is no support for a flag to have an alias. Such support would be useful to rename/deprecate a flag. For example, for MESOS-4386, we could let the flag have `--authenticate` name and a `--authenticate_frameworks` alias. The alias can be marked as deprecated (need to add support for this as well). This support will also be useful for slave/agent flag rename. See MESOS-3781 for details. 
",5 MESOS-5272,"Support docker image labels.","Docker image labels should be supported in unified containerizer, which can be used for applying custom metadata. Image labels are necessary for mesos features to support docker in unified containerizer (e.g., for mesos GPU device isolator).",3 MESOS-5273,"Need support for Authorization information via HELP.","We should add information about authentication to the help message and thereby endpoint documentation (similarly as MESOS-4934 has done for authentication).",3 MESOS-5275,"Add capabilities support for unified containerizer.","Add capabilities support for unified containerizer. Requirements: 1. Use the mesos capabilities API. 2. Frameworks be able to add capability requests for containers. 3. Agents be able to add maximum allowed capabilities for all containers launched. Design document: https://docs.google.com/document/d/1YiTift8TQla2vq3upQr7K-riQ_pQ-FKOCOsysQJROGc/edit#heading=h.rgfwelqrskmd ",5 MESOS-5277,"Need to add REMOVE semantics to the copy backend","Some Dockerfiles run the `rm` command to remove files from the base image using the ""RUN"" directive in the Dockerfile. An example can be found here: https://github.com/ngineered/nginx-php-fpm.git In the final rootfs the removed files should not be present. Presence of these files in the final image can make the container misbehave. For example, the nginx-php-fpm docker image that is referenced tries to remove the default nginx config and replaces it with its own config to point to a different HTML root. If the default nginx config is still present after the building the image, nginx will start pointing to a different HTML root than the one set in the Dockerfile. Currently the copy backend cannot handle removal of files from intermediate layers. This can cause issues with docker images built using a Dockerfile similar to the one listed here. Hence, we need to add REMOVE semantics to the copy backend. 
",5 MESOS-5286,"Add authorization to libprocess HTTP endpoints","Now that the libprocess-level HTTP endpoints have had authentication added to them in MESOS-4902, we can add authorization to them as well. As a first step, we can implement a ""coarse-grained"" approach, in which a principal is granted or denied access to a given endpoint. We will likely need to register an authorizer with libprocess.",5 MESOS-5294,"Status updates after a health check are incomplete or invalid","With command health checks enabled via marathon, mesos-dns will resolve the task correctly until the task is reported as ""healthy"". At that point, mesos-dns stops resolving the task correctly. -Digging through src/docker/executor.cpp, I found that in the {{taskHealthUpdated()}} function is attempting to copy the taskID to the new status instance with- {code}status.mutable_task_id()->CopyFrom(taskID);{code} -but other instances of status updates have a similar line- {code}status.mutable_task_id()->CopyFrom(taskID.get());{code} -My assumption is that this difference is causing the status update after a health check to not have a proper taskID, which in turn is causing an incorrect state.json output.- -I'll try to get a patch together soon.- UPDATE: None of the above assumption are correct. Something else is causing the issue.",1 MESOS-5296,"Split Resource and Inverse offer protobufs for V1 API","The protobufs for the V1 api regarding inverse offers initially re-used the existing offer / rescind / accept / decline messages for regular offers. 
We should split these out to be more explicit, and provide the ability to augment the messages with particulars to either resource or inverse offers.",5 MESOS-5297,"Add authorization to the master's ""/flags"" endpoint.","Coarse HTTP endpoint authorization using the {{GET_ENDPOINT_WITH_PATH}} ACL rule needs to be added to the ""/flags"" endpoint of the master.",3 MESOS-5301,"Add synchronous validation for all types of Calls.","Currently, we do a best effort validation for all calls sent to the master from the scheduler by invoking {{validation::scheduler::call::validate(call, principal)}}. This is a generic validation helper for all calls. However, for more fine grained validation for a particular call, we invoke the validation as part of the call handle itself. {code} Option validationError = roles::validate(frameworkInfo.role()); {code} This in turn makes all validations asynchronous i.e. the framework gets them as {{Event::ERROR}} events later. It would be good if such validations can be handled while processing the {{Call}} message itself synchronously.",5 MESOS-5302,"Consider adding an Executor Shim/Adapter for the new/old API","Currently, all the business logic for HTTP based command executor/driver based command executor lives in 2 different files. As more features are added/bugs are discovered in the executor itself, they need to be fixed in two places. It would be nice to have some kind of a shim/adapter that abstracts away the underlying library details from the executor. Hence, the executor can toggle between whether it wants to use the driver or the new API via an environment variable.",5 MESOS-5303,"Add capabilities support for mesos execute cli.","Add support for `user` and `capabilities` to execute cli. 
This will help in testing the `capabilities` feature for unified containerizer.",3 MESOS-5304,"/metrics/snapshot endpoint help disappeared on agent.","After https://github.com/apache/mesos/commit/066fc4bd0df6690a5e1a929d3836e307c1e22586 the help for the /metrics/snapshot endpoint on the agent doesn't appear anymore (Master endpoint help is unchanged).",1 MESOS-5307,"Sandbox mounts should not be in the host mount namespace.","Currently, if a container uses container image, we'll do a bind mount of its sandbox ( -> /mnt/mesos/sandbox) in the host mount namespace. However, doing the mounts in the host mount table is not ideal. That complicates both the cleanup path and the recovery path. Instead, we can do the sandbox bind mount in the container's mount namespace so that cleanup and recovery will be greatly simplified. We can setup mount propagation properly so that persistent volumes mounted at /xxx can be propagated into the container. Here is a simple proof of concept: Console 1: {noformat} vagrant@vagrant-ubuntu-trusty-64:~/tmp/mesos$ ll . 
total 12 drwxrwxr-x 3 vagrant vagrant 4096 Apr 25 16:05 ./ drwxrwxr-x 6 vagrant vagrant 4096 Apr 25 23:17 ../ drwxrwxr-x 5 vagrant vagrant 4096 Apr 25 23:17 slave/ vagrant@vagrant-ubuntu-trusty-64:~/tmp/mesos$ ll slave/ total 20 drwxrwxr-x 5 vagrant vagrant 4096 Apr 25 23:17 ./ drwxrwxr-x 3 vagrant vagrant 4096 Apr 25 16:05 ../ drwxrwxr-x 6 vagrant vagrant 4096 Apr 26 21:06 directory/ drwxr-xr-x 12 vagrant vagrant 4096 Apr 25 23:20 rootfs/ drwxrwxr-x 2 vagrant vagrant 4096 Apr 25 16:09 volume/ vagrant@vagrant-ubuntu-trusty-64:~/tmp/mesos$ sudo mount --bind slave/ slave/ vagrant@vagrant-ubuntu-trusty-64:~/tmp/mesos$ sudo mount --make-shared slave/ vagrant@vagrant-ubuntu-trusty-64:~/tmp/mesos$ cat /proc/self/mountinfo 50 22 8:1 /home/vagrant/tmp/mesos/slave /home/vagrant/tmp/mesos/slave rw,relatime shared:1 - ext4 /dev/disk/by-uuid/baf292e5-0bb6-4e58-8a71-5b912e0f09b6 rw,data=ordered {noformat} Console 2: {noformat} vagrant@vagrant-ubuntu-trusty-64:~/tmp/mesos$ cd slave/ vagrant@vagrant-ubuntu-trusty-64:~/tmp/mesos/slave$ sudo unshare -m /bin/bash root@vagrant-ubuntu-trusty-64:~/tmp/mesos/slave# sudo mount --make-rslave . 
root@vagrant-ubuntu-trusty-64:~/tmp/mesos/slave# cat /proc/self/mountinfo 124 63 8:1 /home/vagrant/tmp/mesos/slave /home/vagrant/tmp/mesos/slave rw,relatime master:1 - ext4 /dev/disk/by-uuid/baf292e5-0bb6-4e58-8a71-5b912e0f09b6 rw,data=ordered root@vagrant-ubuntu-trusty-64:~/tmp/mesos/slave# mount --rbind directory/ rootfs/mnt/mesos/sandbox/ root@vagrant-ubuntu-trusty-64:~/tmp/mesos/slave# mount --rbind rootfs/ rootfs/ root@vagrant-ubuntu-trusty-64:~/tmp/mesos/slave# mount -t proc proc rootfs/proc root@vagrant-ubuntu-trusty-64:~/tmp/mesos/slave# pivot_root rootfs rootfs/tmp/.rootfs root@vagrant-ubuntu-trusty-64:~/tmp/mesos/slave# cd / root@vagrant-ubuntu-trusty-64:/# cat /proc/self/mountinfo 126 61 8:1 /home/vagrant/tmp/mesos/slave/rootfs / rw,relatime master:1 - ext4 /dev/disk/by-uuid/baf292e5-0bb6-4e58-8a71-5b912e0f09b6 rw,data=ordered 127 126 8:1 /home/vagrant/tmp/mesos/slave/directory /mnt/mesos/sandbox rw,relatime master:1 - ext4 /dev/disk/by-uuid/baf292e5-0bb6-4e58-8a71-5b912e0f09b6 rw,data=ordered 128 126 0:3 / /proc rw,relatime - proc proc rw {noformat} Console 1: {noformat} agrant@vagrant-ubuntu-trusty-64:~/tmp/mesos$ cd slave/ vagrant@vagrant-ubuntu-trusty-64:~/tmp/mesos/slave$ sudo mount --bind volume/ directory/v1 vagrant@vagrant-ubuntu-trusty-64:~/tmp/mesos/slave$ cat /proc/self/mountinfo 50 22 8:1 /home/vagrant/tmp/mesos/slave /home/vagrant/tmp/mesos/slave rw,relatime shared:1 - ext4 /dev/disk/by-uuid/baf292e5-0bb6-4e58-8a71-5b912e0f09b6 rw,data=ordered 129 50 8:1 /home/vagrant/tmp/mesos/slave/volume /home/vagrant/tmp/mesos/slave/directory/v1 rw,relatime shared:1 - ext4 /dev/disk/by-uuid/baf292e5-0bb6-4e58-8a71-5b912e0f09b6 rw,data=ordered {noformat} Console 2: {noformat} root@vagrant-ubuntu-trusty-64:/# cat /proc/self/mountinfo 126 61 8:1 /home/vagrant/tmp/mesos/slave/rootfs / rw,relatime master:1 - ext4 /dev/disk/by-uuid/baf292e5-0bb6-4e58-8a71-5b912e0f09b6 rw,data=ordered 127 126 8:1 /home/vagrant/tmp/mesos/slave/directory /mnt/mesos/sandbox 
rw,relatime master:1 - ext4 /dev/disk/by-uuid/baf292e5-0bb6-4e58-8a71-5b912e0f09b6 rw,data=ordered 128 126 0:3 / /proc rw,relatime - proc proc rw 132 127 8:1 /home/vagrant/tmp/mesos/slave/volume /mnt/mesos/sandbox/v1 rw,relatime shared:4 master:1 - ext4 /dev/disk/by-uuid/baf292e5-0bb6-4e58-8a71-5b912e0f09b6 rw,data=ordered {noformat}",5 MESOS-5310,"Enable `network/cni` isolator to allow modifications and deletion of CNI config","Currently the `network/cni` isolator can only load the CNI configs at startup. This makes the CNI networks immutable. From an operational standpoint this can make deployments painful for operators. To make CNI more flexible the `network/cni` isolator should be able to load configs at run time. The proposal is to add an endpoint to the `network/cni` isolator, to which when the operator sends a PUT request the `network/cni` isolator will reload CNI configs. ",5 MESOS-5312,"Env `MESOS_SANDBOX` is not set properly for command tasks that changes rootfs.","This is in the context of Mesos containerizer (a.k.a., unified containerizer). 
I did a simple test: {noformat} sudo sbin/mesos-master --work_dir=/tmp/mesos/master sudo GLOG_v=1 sbin/mesos-slave --master=10.0.2.15:5050 --isolation=docker/runtime,filesystem/linux --work_dir=/tmp/mesos/slave/ --image_providers=docker --executor_environment_variables=""{}"" sudo bin/mesos-execute --master=10.0.2.15:5050 --name=test --docker_image=alpine --command=""env"" MESOS_EXECUTOR_ID=test SHLVL=1 MESOS_CHECKPOINT=0 MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD=5secs LIBPROCESS_PORT=0 MESOS_AGENT_ENDPOINT=10.0.2.15:5051 MESOS_SANDBOX=/tmp/mesos/slave/slaves/2d7e44bb-3282-4193-bdc4-eeab9e0943c2-S0/frameworks/1a1cad18-2d87-43dd-97b6-1dbf2d229061-0000/executors/test/runs/bb8dd72c-fb4c-426a-be18-51b0621339f6 MESOS_NATIVE_JAVA_LIBRARY=/home/vagrant/dist/mesos/lib/libmesos-0.29.0.so MESOS_FRAMEWORK_ID=1a1cad18-2d87-43dd-97b6-1dbf2d229061-0000 MESOS_SLAVE_ID=2d7e44bb-3282-4193-bdc4-eeab9e0943c2-S0 MESOS_NATIVE_LIBRARY=/home/vagrant/dist/mesos/lib/libmesos-0.29.0.so MESOS_DIRECTORY=/tmp/mesos/slave/slaves/2d7e44bb-3282-4193-bdc4-eeab9e0943c2-S0/frameworks/1a1cad18-2d87-43dd-97b6-1dbf2d229061-0000/executors/test/runs/bb8dd72c-fb4c-426a-be18-51b0621339f6 PWD=/mnt/mesos/sandbox MESOS_SLAVE_PID=slave(1)@10.0.2.15:5051 {noformat} `MESOS_SANDBOX` above should be `/mnt/mesos/sandbox`.",2 MESOS-5313,"Failed to set quota and update weight according to document","{code} root@mesos002:~/test# curl -d jsonMessageBody -X POST http://192.168.56.12:5050/quota Failed to parse set quota request JSON 'jsonMessageBody': syntax error at line 1 near: jsonMessageBodyroot@mesos002:~/test# cat jsonMessageBody { ""role"": ""role1"", ""guarantee"": [{ ""name"": ""cpus"", ""type"": ""SCALAR"", ""scalar"": { ""value"": 1 } }, { ""name"": ""mem"", ""type"": ""SCALAR"", ""scalar"": { ""value"": 128 } }] } root@mesos002:~/test# curl -d weight.json -X PUT http://192.168.56.12:5050/weights Failed to parse update weights request JSON ('weight.json'): syntax error at line 1 near: weight.js 
root@mesos002:~/test# cat weight.json [ { ""role"": ""role1"", ""weight"": 2.0 }, { ""role"": ""role2"", ""weight"": 3.5 } ] {code} The right command should add {{@}} before the quota json file {{jsonMessageBody}}.",1 MESOS-5316,"Authenticate the agent's '/containers' endpoint.","The {{/containers}} endpoint was recently added to the agent. Authentication should be enabled on this endpoint.",2 MESOS-5317,"Authorize the agent's '/containers' endpoint.","After the agent's {{/containers}} endpoint is authenticated, we should enable authorization as well.",2 MESOS-5318,"Make `os::close` always catch structured exceptions on Windows",NULL,2 MESOS-5335,"Add authorization to GET /weights.","We already authorize which http users can update weights for particular roles, but even knowing of the existence of these roles (let alone their weights) may be sensitive information. We should add authz around GET operations on /weights. Easy option: GET_ENDPOINT_WITH_PATH /weights - Pro: No new verb - Con: All or nothing Complex option: GET_WEIGHTS_WITH_ROLE - Pro: Filters contents based on roles the user is authorized to see - Con: More authorize calls (one per role in each /weights request)",3 MESOS-5336,"Add authorization to GET /quota.","We already authorize which http users can set/remove quota for particular roles, but even knowing of the existence of these roles (let alone their quotas) may be sensitive information. We should add authz around GET operations on /quota.",3 MESOS-5337,"Add Master Flag to enable fine-grained filtering of HTTP endpoints.","As the fine-grained filtering of endpoints can be rather expensive, we should create a master flag to enable/disable this feature.",1 MESOS-5338,"Add `user` to `Task` protobuf message.","The LocalAuthorizer is supposed to use the OS `user` under which tasks are running for authorization.
As the master keeps track of running and completed processes we need access to this information in Task in order to authorize such tasks.",1 MESOS-5339,"Create Tests for testing fine-grained HTTP endpoint filtering.",NULL,3 MESOS-5343,"Behavior of custom HTTP authenticators with disabled HTTP authentication is inconsistent between master and agent","When setting a custom authenticator with {{http_authenticators}} and also specifying {{authenticate_http=false}} currently agents refuse to start with {code} A custom HTTP authenticator was specified with the '--http_authenticators' flag, but HTTP authentication was not enabled via '--authenticate_http' {code} Masters on the other hand accept this setting. Having differing behavior between master and agents is confusing, and we should decide on whether we want to accept these settings or not, and make the implementations consistent. ",3 MESOS-5345,"Design doc for TASK_LOST_PENDING","The TASK_LOST task status describes two different situations: (a) the task was not launched because of an error (e.g., insufficient available resources), or (b) the master lost contact with a running task (e.g., due to a network partition); the master will kill the task when it can (e.g., when the network partition heals), but in the meantime the task may still be running. This has two problems: 1. Using the same task status for two fairly different situations is confusing. 2. In the partitioned-but-still-running case, frameworks have no easy way to determine when a task has truly terminated. To address these problems, we propose introducing a new task status, TASK_LOST_PENDING. 
If a framework opts into this behavior using a new capability, TASK_LOST would mean ""the task is definitely not running"", whereas TASK_LOST_PENDING would mean ""the task may or may not be running (we've lost contact with the agent), but the master will try to shut it down when possible.""",5 MESOS-5347,"Enhance the log message when launching mesos containerizer.","Log the launch flag which includes the executor command, pre-launch commands and other information when launching the mesos containerizer. ",2 MESOS-5348,"Enhance the log message when launching docker containerizer.","Log the launch flag which includes the executor command and other information when launching the docker containerizer.",2 MESOS-5350,"Add asynchronous hook for validating docker containerizer tasks","It is possible to plug in custom validation logic for the MesosContainerizer via an {{Isolator}} module, but the same is not true of the DockerContainerizer. Basic logic can be plugged into the DockerContainerizer via {{Hooks}}, but this has some notable differences compared to isolators: * Hooks are synchronous. * Modifications to tasks via Hooks have lower priority compared to the task itself. i.e. If both the {{TaskInfo}} and {{slaveExecutorEnvironmentDecorator}} define the same environment variable, the {{TaskInfo}} wins. * Hooks have no effect if they fail (short of segfaulting) i.e. The {{slavePreLaunchDockerHook}} has a return type of {{Try}}: https://github.com/apache/mesos/blob/628ccd23501078b04fb21eee85060a6226a80ef8/include/mesos/hook.hpp#L90 But the effect of returning an {{Error}} is a log message: https://github.com/apache/mesos/blob/628ccd23501078b04fb21eee85060a6226a80ef8/src/hook/manager.cpp#L227-L230 We should add a hook to the DockerContainerizer to narrow this gap. 
This new hook would: * Be called at roughly the same place as {{slavePreLaunchDockerHook}} https://github.com/apache/mesos/blob/628ccd23501078b04fb21eee85060a6226a80ef8/src/slave/containerizer/docker.cpp#L1022 * Return a {{Future}} and require splitting up {{DockerContainerizer::launch}}. * Prevent a task from launching if it returns a {{Failure}}.",5 MESOS-5353,"Use `Connection` abstraction to compare stale connections in scheduler library.","Previously, we had a bug in the {{Connection}} abstraction in libprocess that hindered the ability to pass it onto {{defer}} callbacks since it could sometimes lead to deadlock (MESOS-4658). Now that it is resolved, we might consider not using {{UUID}} objects for stale connection checks but directly using the {{Connection}} abstraction in the scheduler library.",3 MESOS-5356,"Add Windows support for StopWatch",NULL,2 MESOS-5359,"The scheduler library should have a delay before initiating a connection with master.","Currently, the scheduler library {{src/scheduler/scheduler.cpp}} does not have an artificially induced delay when trying to initially establish a connection with the master. In the event of a master failover or ZK disconnect, a large number of frameworks can get disconnected and thereby overwhelm the master with TCP SYN requests. On a large cluster with many agents, the master is already overwhelmed with handling connection requests from the agents. This compounds the issue further on the master.",3 MESOS-5360,"Set death signal for dvdcli subprocess in docker volume isolator.","If the slave crashes, we should kill the dvdcli subprocess. Otherwise, if the dvdcli subprocess gets stuck, it will not be cleaned up.",2 MESOS-5362,"Add authentication to example frameworks","Some example frameworks do not have the ability to authenticate with the master.
Adding authentication to the example frameworks that don't already have it implemented would allow us to use these frameworks for testing in authenticated/authorized scenarios.",2 MESOS-5365,"Introduce a timeout for docker volume driver mount/unmount operation.","'dvdcli' might hang indefinitely. We should introduce a timeout for both mount and unmount operations so that launch/cleanup are not blocked forever.",2 MESOS-5370,"Add deprecation support for Flags","MESOS-5271 adds support for a flag name to have an alias. This ticket captures the work needed to add deprecation support. The idea is for the caller to explicitly specify deprecation via `FlagsBase::add()` and get a list of deprecation warnings when doing `FlagsBase::load()`.",5 MESOS-5372,"Add random() to os:: namespace ","The function ""random()"" is not available on Windows. After this improvement the calls to ""os::random()"" will result in calls to ""::random()"" on POSIX and ""::rand()"" on Windows. ",1 MESOS-5373,"Remove `Zookeeper's` NTDDI_VERSION define","The Zookeeper client library defines NTDDI_VERSION to 0x0400 in ""winconfig.h"". While this API level is sufficient to compile the client library, Mesos has to use a newer API set. After this improvement the code will compile with the latest NTDDI_VERSION. ",2 MESOS-5374,"Add support for Console Ctrl handling in `slave.cpp`","Extract supporting code to handle POSIX signals in a separate header and add support for CTRL handler when running on Windows. ",3 MESOS-5375,"Implement stout/os/windows/kill.hpp","Implement equivalent functionality on Windows ",5 MESOS-5378,"Terminating a framework during master failover leads to orphaned tasks","Repro steps: 1) Setup: {code} bin/mesos-master.sh --work_dir=/tmp/master bin/mesos-slave.sh --work_dir=/tmp/slave --master=localhost:5050 src/mesos-execute --checkpoint --command=""sleep 1000"" --master=localhost:5050 --name=""test"" {code} 2) Kill all three from (1), in the order they were started.
3) Restart the master and agent. Do not restart the framework. Result) * The agent will reconnect to an orphaned task. * The Web UI will report no memory usage * {{curl localhost:5050/metrics/snapshot}} will say: {{""master/mem_used"": 128,}} Cause) When a framework registers with the master, it provides a {{failover_timeout}}, in case the framework disconnects. If the framework disconnects and does not reconnect within this {{failover_timeout}}, the master will kill all tasks belonging to the framework. However, the master does not persist this {{failover_timeout}} across master failover. The master will ""forget"" about a framework if: 1) The master dies before {{failover_timeout}} passes. 2) The framework dies while the master is dead. When the master comes back up, the agent will re-register. The agent will report the orphaned task(s). Because the master failed over, it does not know these tasks are orphans (i.e. it thinks the frameworks might re-register). Proposed solution) The master should save the {{FrameworkID}} and {{failover_timeout}} in the registry. Upon recovery, the master should resume the {{failover_timeout}} timers.",3 MESOS-5380,"Killing a queued task can cause the corresponding command executor to never terminate.","We observed this in our testing environment. Sequence of events: 1) A command task is queued since the executor has not registered yet. 2) The framework issues a killTask. 3) Since executor is in REGISTERING state, agent calls `statusUpdate(TASK_KILLED, UPID())` 4) `statusUpdate` now will call `containerizer->status()` before calling `executor->terminateTask(status.task_id(), status);` which will remove the queued task. (Introduced in this patch: https://reviews.apache.org/r/43258). 
5) Since the above is async, it's possible that the task is still in queued task when we trying to see if we need to kill unregistered executor in `killTask`: {code} // TODO(jieyu): Here, we kill the executor if it no longer has // any task to run and has not yet registered. This is a // workaround for those single task executors that do not have a // proper self terminating logic when they haven't received the // task within a timeout. if (executor->queuedTasks.empty()) { CHECK(executor->launchedTasks.empty()) << "" Unregistered executor '"" << executor->id << ""' has launched tasks""; LOG(WARNING) << ""Killing the unregistered executor "" << *executor << "" because it has no tasks""; executor->state = Executor::TERMINATING; containerizer->destroy(executor->containerId); } {code} 6) Consequently, the executor will never be terminated by Mesos. Attaching the relevant agent log: {noformat} May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:13.640527 1342 slave.cpp:1361] Got assigned task mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6 for framework a3ad8418-cb77-4705-b353-4b514ceca52c-0000 May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:13.641034 1342 slave.cpp:1480] Launching task mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6 for framework a3ad8418-cb77-4705-b353-4b514ceca52c-0000 May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:13.641440 1342 paths.cpp:528] Trying to chown '/var/lib/mesos/slave/slaves/a3ad8418-cb77-4705-b353-4b514ceca52c-S0/frameworks/a3ad8418-cb77-4705-b353-4b514ceca52c-0000/executors/mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6/runs/24762d43-2134-475e-b724-caa72110497a' to user 'root' May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:13.644664 1342 slave.cpp:5389] Launching executor mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6 of framework a3ad8418-cb77-4705-b353-4b514ceca52c-0000 with resources 
cpus(*):0.1; mem(*):32 in work directory '/var/lib/mesos/slave/slaves/a3ad8418-cb77-4705-b353-4b514ceca52c-S0/frameworks/a3ad8418-cb77-4705-b353-4b514ceca52c-0000/executors/mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6/runs/24762d43-2134-475e-b724-caa72110497a' May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:13.645195 1342 slave.cpp:1698] Queuing task 'mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6' for executor 'mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6' of framework a3ad8418-cb77-4705-b353-4b514ceca52c-0000 May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:13.645491 1338 containerizer.cpp:671] Starting container '24762d43-2134-475e-b724-caa72110497a' for executor 'mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6' of framework 'a3ad8418-cb77-4705-b353-4b514ceca52c-0000' May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:13.647897 1345 cpushare.cpp:389] Updated 'cpu.shares' to 1126 (cpus 1.1) for container 24762d43-2134-475e-b724-caa72110497a May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:13.648619 1345 cpushare.cpp:411] Updated 'cpu.cfs_period_us' to 100ms and 'cpu.cfs_quota_us' to 110ms (cpus 1.1) for container 24762d43-2134-475e-b724-caa72110497a May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:13.650180 1341 mem.cpp:602] Started listening for OOM events for container 24762d43-2134-475e-b724-caa72110497a May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:13.650718 1341 mem.cpp:722] Started listening on low memory pressure events for container 24762d43-2134-475e-b724-caa72110497a May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:13.651147 1341 mem.cpp:722] Started listening on medium memory pressure events for container 24762d43-2134-475e-b724-caa72110497a May 13 15:36:13 
ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:13.651599 1341 mem.cpp:722] Started listening on critical memory pressure events for container 24762d43-2134-475e-b724-caa72110497a May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:13.652015 1341 mem.cpp:353] Updated 'memory.soft_limit_in_bytes' to 160MB for container 24762d43-2134-475e-b724-caa72110497a May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:13.652719 1341 mem.cpp:388] Updated 'memory.limit_in_bytes' to 160MB for container 24762d43-2134-475e-b724-caa72110497a May 13 15:36:25 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:25.508930 1342 slave.cpp:1891] Asked to kill task mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6 of framework a3ad8418-cb77-4705-b353-4b514ceca52c-0000 May 13 15:36:25 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:25.509063 1342 slave.cpp:3048] Handling status update TASK_KILLED (UUID: f9d15955-6c9a-4a73-98c3-97c0128510ba) for task mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6 of framework a3ad8418-cb77-4705-b353-4b514ceca52c-0000 from @0.0.0.0:0 May 13 15:36:25 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:25.509702 1340 disk.cpp:169] Updating the disk resources for container 24762d43-2134-475e-b724-caa72110497a to cpus(*):0.1; mem(*):32 May 13 15:36:25 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:25.510298 1343 mem.cpp:353] Updated 'memory.soft_limit_in_bytes' to 32MB for container 24762d43-2134-475e-b724-caa72110497a May 13 15:36:25 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:25.510349 1341 cpushare.cpp:389] Updated 'cpu.shares' to 102 (cpus 0.1) for container 24762d43-2134-475e-b724-caa72110497a May 13 15:36:25 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:25.511102 1343 mem.cpp:388] Updated 'memory.limit_in_bytes' to 32MB for container 
24762d43-2134-475e-b724-caa72110497a May 13 15:36:25 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:25.511495 1341 cpushare.cpp:411] Updated 'cpu.cfs_period_us' to 100ms and 'cpu.cfs_quota_us' to 10ms (cpus 0.1) for container 24762d43-2134-475e-b724-caa72110497a May 13 15:36:25 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:25.511715 1341 status_update_manager.cpp:320] Received status update TASK_KILLED (UUID: f9d15955-6c9a-4a73-98c3-97c0128510ba) for task mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6 of framework a3ad8418-cb77-4705-b353-4b514ceca52c-0000 May 13 15:36:25 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:25.512032 1341 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_KILLED (UUID: f9d15955-6c9a-4a73-98c3-97c0128510ba) for task mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6 of framework a3ad8418-cb77-4705-b353-4b514ceca52c-0000 May 13 15:36:25 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:25.513849 1343 slave.cpp:3446] Forwarding the update TASK_KILLED (UUID: f9d15955-6c9a-4a73-98c3-97c0128510ba) for task mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6 of framework a3ad8418-cb77-4705-b353-4b514ceca52c-0000 to master@10.0.5.79:5050 May 13 15:36:25 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:25.528929 1344 status_update_manager.cpp:392] Received status update acknowledgement (UUID: f9d15955-6c9a-4a73-98c3-97c0128510ba) for task mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6 of framework a3ad8418-cb77-4705-b353-4b514ceca52c-0000 May 13 15:36:25 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:25.529002 1344 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_KILLED (UUID: f9d15955-6c9a-4a73-98c3-97c0128510ba) for task mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6 of framework a3ad8418-cb77-4705-b353-4b514ceca52c-0000 May 13 15:36:28 ip-10-0-2-74.us-west-2.compute.internal 
mesos-slave[1304]: I0513 15:36:28.199105 1345 isolator.cpp:469] Mounting docker volume mount point '//var/lib/rexray/volumes/jdef-test-125/data' to '/var/lib/mesos/slave/slaves/a3ad8418-cb77-4705-b353-4b514ceca52c-S0/frameworks/a3ad8418-cb77-4705-b353-4b514ceca52c-0000/executors/mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6/runs/24762d43-2134-475e-b724-caa72110497a/data' for container 24762d43-2134-475e-b724-caa72110497a May 13 15:36:28 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:28.207062 1338 containerizer.cpp:1184] Checkpointing executor's forked pid 5810 to '/var/lib/mesos/slave/meta/slaves/a3ad8418-cb77-4705-b353-4b514ceca52c-S0/frameworks/a3ad8418-cb77-4705-b353-4b514ceca52c-0000/executors/mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6/runs/24762d43-2134-475e-b724-caa72110497a/pids/forked.pid' May 13 15:36:28 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:28.832330 1338 slave.cpp:2689] Got registration for executor 'mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6' of framework a3ad8418-cb77-4705-b353-4b514ceca52c-0000 from executor(1)@10.0.2.74:46154 May 13 15:36:28 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:28.833149 1345 disk.cpp:169] Updating the disk resources for container 24762d43-2134-475e-b724-caa72110497a to cpus(*):0.1; mem(*):32 May 13 15:36:28 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:28.833804 1342 mem.cpp:353] Updated 'memory.soft_limit_in_bytes' to 32MB for container 24762d43-2134-475e-b724-caa72110497a May 13 15:36:28 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:28.833871 1340 cpushare.cpp:389] Updated 'cpu.shares' to 102 (cpus 0.1) for container 24762d43-2134-475e-b724-caa72110497a May 13 15:36:28 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: I0513 15:36:28.835160 1340 cpushare.cpp:411] Updated 'cpu.cfs_period_us' to 100ms and 'cpu.cfs_quota_us' to 10ms (cpus 0.1) for container 
24762d43-2134-475e-b724-caa72110497a May 13 16:58:30 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[30985]: 5804 'mesos-logrotate-logger --help=false --log_filename=/var/lib/mesos/slave/slaves/a3ad8418-cb77-4705-b353-4b514ceca52c-S0/frameworks/a3ad8418-cb77-4705-b353-4b514ceca52c-0000/executors/mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6/runs/24762d43-2134-475e-b724-caa72110497a/stdout --logrotate_options=rotate 9 --logrotate_path=logrotate --max_size=2MB ' May 13 16:58:30 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[30985]: 5809 'mesos-logrotate-logger --help=false --log_filename=/var/lib/mesos/slave/slaves/a3ad8418-cb77-4705-b353-4b514ceca52c-S0/frameworks/a3ad8418-cb77-4705-b353-4b514ceca52c-0000/executors/mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6/runs/24762d43-2134-475e-b724-caa72110497a/stderr --logrotate_options=rotate 9 --logrotate_path=logrotate --max_size=2MB ' May 13 16:58:30 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[30985]: 5804 'mesos-logrotate-logger --help=false --log_filename=/var/lib/mesos/slave/slaves/a3ad8418-cb77-4705-b353-4b514ceca52c-S0/frameworks/a3ad8418-cb77-4705-b353-4b514ceca52c-0000/executors/mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6/runs/24762d43-2134-475e-b724-caa72110497a/stdout --logrotate_options=rotate 9 --logrotate_path=logrotate --max_size=2MB ' May 13 16:58:30 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[30985]: 5809 'mesos-logrotate-logger --help=false --log_filename=/var/lib/mesos/slave/slaves/a3ad8418-cb77-4705-b353-4b514ceca52c-S0/frameworks/a3ad8418-cb77-4705-b353-4b514ceca52c-0000/executors/mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6/runs/24762d43-2134-475e-b724-caa72110497a/stderr --logrotate_options=rotate 9 --logrotate_path=logrotate --max_size=2MB ' May 13 16:58:30 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[30985]: I0513 16:58:30.374567 30993 slave.cpp:5498] Recovering executor 'mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6' of framework 
a3ad8418-cb77-4705-b353-4b514ceca52c-0000 May 13 16:58:30 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[30985]: I0513 16:58:30.420411 30990 status_update_manager.cpp:208] Recovering executor 'mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6' of framework a3ad8418-cb77-4705-b353-4b514ceca52c-0000 May 13 16:58:30 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[30985]: I0513 16:58:30.513164 30994 containerizer.cpp:467] Recovering container '24762d43-2134-475e-b724-caa72110497a' for executor 'mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6' of framework a3ad8418-cb77-4705-b353-4b514ceca52c-0000 May 13 16:58:30 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[30985]: I0513 16:58:30.533478 30988 mem.cpp:602] Started listening for OOM events for container 24762d43-2134-475e-b724-caa72110497a May 13 16:58:30 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[30985]: I0513 16:58:30.534553 30988 mem.cpp:722] Started listening on low memory pressure events for container 24762d43-2134-475e-b724-caa72110497a May 13 16:58:30 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[30985]: I0513 16:58:30.535269 30988 mem.cpp:722] Started listening on medium memory pressure events for container 24762d43-2134-475e-b724-caa72110497a May 13 16:58:30 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[30985]: I0513 16:58:30.536198 30988 mem.cpp:722] Started listening on critical memory pressure events for container 24762d43-2134-475e-b724-caa72110497a May 13 16:58:30 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[30985]: I0513 16:58:30.579385 30988 docker.cpp:859] Skipping recovery of executor 'mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6' of framework 'a3ad8418-cb77-4705-b353-4b514ceca52c-0000' because it was not launched from docker containerizer May 13 16:58:30 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[30985]: I0513 16:58:30.587158 30989 slave.cpp:4527] Sending reconnect request to executor 'mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6' of framework 
a3ad8418-cb77-4705-b353-4b514ceca52c-0000 at executor(1)@10.0.2.74:46154 May 13 16:58:30 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[30985]: I0513 16:58:30.588287 30990 slave.cpp:2838] Re-registering executor 'mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6' of framework a3ad8418-cb77-4705-b353-4b514ceca52c-0000 May 13 16:58:30 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[30985]: I0513 16:58:30.589736 30988 disk.cpp:169] Updating the disk resources for container 24762d43-2134-475e-b724-caa72110497a to cpus(*):0.1; mem(*):32 May 13 16:58:30 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[30985]: I0513 16:58:30.590117 30990 cpushare.cpp:389] Updated 'cpu.shares' to 102 (cpus 0.1) for container 24762d43-2134-475e-b724-caa72110497a May 13 16:58:30 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[30985]: I0513 16:58:30.591284 30990 cpushare.cpp:411] Updated 'cpu.cfs_period_us' to 100ms and 'cpu.cfs_quota_us' to 10ms (cpus 0.1) for container 24762d43-2134-475e-b724-caa72110497a May 13 16:58:30 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[30985]: I0513 16:58:30.595403 30992 mem.cpp:353] Updated 'memory.soft_limit_in_bytes' to 32MB for container 24762d43-2134-475e-b724-caa72110497a May 13 16:58:30 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[30985]: I0513 16:58:30.596102 30992 mem.cpp:388] Updated 'memory.limit_in_bytes' to 32MB for container 24762d43-2134-475e-b724-caa72110497a {noformat}",3 MESOS-5382,"Implement os::fsync",NULL,1 MESOS-5383,"Implement os::setHostname",NULL,1 MESOS-5386,"Add `HANDLE` overloads for functions that take a file descriptor",NULL,3 MESOS-5388,"MesosContainerizerLaunch flags execute arbitrary commands via shell","For example, the docker volume isolator's containerPath is appended (without sanitation) to a command that's executed in this manner. As such, it's possible to inject arbitrary shell commands to be executed by mesos. 
https://github.com/apache/mesos/blob/17260204c833c643adf3d8f36ad8a1a606ece809/src/slave/containerizer/mesos/launch.cpp#L206 Perhaps instead of strings these commands could/should be sent as string arrays that could be passed as argv arguments w/o shell interpretation?",3 MESOS-5389,"docker containerizer should prefix relative volume.container_path values with the path to the sandbox","The docker containerizer currently requires absolute paths for values of volume.container_path. This is inconsistent with the mesos containerizer, which requires a relative container_path. It makes for a confusing API, both at the Mesos level and at the Marathon level. Ideally the docker containerizer would allow a framework to specify a relative path for volume.container_path and in such cases automatically convert it to an absolute path by prepending the sandbox directory to it. /cc [~jieyu]",3 MESOS-5390,"v1 Executor Protos not included in maven jar","According to MESOS-4793, the Executor v1 HTTP API was released in Mesos 0.28.0; however, the corresponding protos are not included in the maven jar for version 0.28.0 or 0.28.1. Script to verify {code} wget https://repo.maven.apache.org/maven2/org/apache/mesos/mesos/0.28.1/mesos-0.28.1.jar && unzip -lf mesos-0.28.1.jar | grep ""v1\/executor"" | wc -l {code}",1 MESOS-5391,"Add support for controlling resource limits in Mesos containerizer.","Currently, we do not have the ability to control system resource limits.
Add support for: - Frameworks to specify resource limits - Operators to override default resource limits.",5 MESOS-5392,"Design doc for adding resource limits support for Mesos containerizer","This will be the design doc for MESOS-5391.",3 MESOS-5397,"Slave/Agent Rename Phase 1: Update terms in the website","The following files need to be updated: site/source/index.html.md ",1 MESOS-5398,"Rewrite os::read() to be friendlier to reading binary files","The existing read() implementation is based on calling getline() to read in chunks of data from a file. This is fine for text-based files, but is a little strange for binary files.",3 MESOS-5399,"Add utility for parsing ld.so.cache on linux.","The /etc/ld.so.cache file on linux contains a mapping of dynamic library names to their fully resolved paths for use by ld when linking. We should write a utility that knows how to parse this file so we can find the paths to these libraries as well. This is especially important for collecting libraries into a common location for supporting Nvidia GPUs in mesos.",5 MESOS-5400,"Add preliminary support for parsing ELF files in stout.","The upcoming Nvidia GPU support for docker containers in Mesos relies on consolidating all Nvidia shared libraries into a common location for injecting a volume into a container. As part of this, we need some preliminary parsing capabilities for ELF files to infer things about each shared library we are consolidating.",5 MESOS-5401,"Add ability to inject a Volume of Nvidia libraries/binaries into a docker-image container in mesos containerizer.","In order to support Nvidia GPUs with docker containers in Mesos, we need to be able to consolidate all Nvidia libraries into a common volume and inject that volume into the container. This tracks the support in the mesos containerizer. The docker containerizer support will be tracked separately.
More info on why this is necessary is available here: https://github.com/NVIDIA/nvidia-docker/",5 MESOS-5403,"Introduce ObjectApprover Interface to Authorizer.","As outlined here (https://docs.google.com/document/d/1FuS79P8uj5PIBycrBlkJSBKOtmeO8ezAuiNXxwIA3qA) we plan to add the option of retrieving a FilterObject from the Authorizer with the goal of allowing for efficient authorization of a large number of (potentially large) objects. ",5 MESOS-5404,"Allow `Task` to be authorized.","As we need to be able to authorize `Tasks` (e.g., for deciding whether to include them in the /state endpoint when applying authorization based filtering) we need to expose them to the authorizer. Secondly, we also need to include some additional information (`user` and `Env variables`) in order to provide the authorizer with meaningful information.",3 MESOS-5405,"Make fields in authorization::Request protobuf optional.","Currently, the {{authorization::Request}} protobuf declares {{subject}} and {{object}} as required fields. However, in the codebase we do not always set them, which leaves the message in an uninitialized state, for example: * https://github.com/apache/mesos/blob/0bfd6999ebb55ddd45e2c8566db17ab49bc1ffec/src/common/http.cpp#L603 * https://github.com/apache/mesos/blob/0bfd6999ebb55ddd45e2c8566db17ab49bc1ffec/src/master/http.cpp#L2057 I believe that the reason we don't see issues related to this is that we never send authz requests over the wire, i.e., never serialize/deserialize them. However, they are still invalid protobuf messages. Moreover, some external authorizers may serialize these messages. We can either ensure all required fields are set or make both {{subject}} and {{object}} fields optional. This will also require updating the local authorizer, which should properly handle the situation when these fields are absent. We may also want to notify authors of external authorizers to update their code accordingly.
It looks like no deprecation is necessary, mainly because we already—erroneously!—treat these fields as optional.",3 MESOS-5406,"Validate ACLs on creating an instance of local authorizer.","Some combinations of ACLs are not allowed, for example, specifying both {{SetQuota}} and {{UpdateQuota}}. We should capture such issues and error out early. This ticket aims to add as many validations as possible to a dedicated {{validate()}} routine, instead of having them implicitly in the codebase.",3 MESOS-5408,"Delete the /observe HTTP endpoint","The ""/observe"" endpoint was introduced a long time ago for supporting functionality that was never implemented. We should just kill this endpoint and associated code to avoid tech debt.",2 MESOS-5413,"`network/cni` isolator should skip the bind mounting of the CNI network information root directory if possible","Currently in the create() method `network/cni` isolator, for the CNI network information root directory (i.e., {{/var/run/mesos/isolators/network/cni}}), we do a self bind mount and make sure it is a shared mount of its own peer group. However, we should not do a self bind mount if the mount containing the CNI network information root directory is already a shared mount in its own share peer group, just like what we did for `filesystem/linux` isolator in [MESOS-5239 | https://issues.apache.org/jira/browse/MESOS-5239].",3 MESOS-5419,"Document all known client libraries for the Scheduler/Executor API","Previously during various community syncs, we had decided that we would only be supporting the C++ scheduler/executor library in the Mesos code base going forward. We should however, still document the client libraries available in various languages to drive adoption/have a recommended list for users to look up. 
This can be similar to the already existing frameworks doc: http://mesos.apache.org/documentation/latest/frameworks/ Other projects also seem to have been following a similar practice: https://docs.docker.com/engine/reference/api/remote_api_client_libraries/ https://github.com/kubernetes/kubernetes/blob/master/docs/devel/client-libraries.md",2 MESOS-5420,"Implement os::exists for processes","os::exists returns true if the process identified by the parameter is still running or was running and we are able to get information about it, such as the exit code. In Windows after obtaining a handle to the process it is possible to perform those operations. ",1 MESOS-5425,"Consider using IntervalSet for Port range resource math","Follow-up JIRA for comments raised in MESOS-3051 (see comments there). We should consider utilizing [{{IntervalSet}}|https://github.com/apache/mesos/blob/a0b798d2fac39445ce0545cfaf05a682cd393abe/3rdparty/stout/include/stout/interval.hpp] in [Port range resource math|https://github.com/apache/mesos/blob/a0b798d2fac39445ce0545cfaf05a682cd393abe/src/common/values.cpp#L143].",3 MESOS-5426,"Relax version compatibility requirement for some modules","Some module interfaces such as authenticatee, have not changed for a while and so we should be able to relax the version compatibility checks. This needs to be done on a case-by-case basis. I am also hoping, this change will also provide a framework for updating the version requirement for other modules as we go towards a stable module API. [cc: [~adam-mesos] [~tillt] ]",5 MESOS-5435,"Add default implementations to all Isolator virtual functions","Currently, all of the virtual functions in `mesos::slave::Isolator` are pure virtual (except status()). For many isolators, however, it doesn't make sense to implement all of these virtual functions. Each isolator has to provide its own default implementation of these functions even if they aren't really relying on them. 
This adds unnecessary extra code to many isolators that don't need them. Moreover, the `MesosIsolatorProcess` has the same problem for each of its virtual functions. We should provide defaults for these instead of making each and every isolator implement them even in cases when it doesn't make sense.",1 MESOS-5436,"GPU resource broke framework data table in webUI","In agent_framework.html and master/static/agent.html, we add {{GPUs (Used / Allocated)}} in table header. But we didn't add the corresponding column to the table body as well. On the other hand, we didn't provide statistics for gpus on monitor endpoints. To provide those data in webui, we first need to implement gpus statistics in monitor endpoints. ",1 MESOS-5437,"AppC appc_simple_discovery_uri_prefix is lost in configuration.md","AppC appc_simple_discovery_uri_prefix is lost in configuration.md",1 MESOS-5445,"Allow libprocess/stout to build without first doing `make` in 3rdparty.","After the 3rdparty reorg, libprocess/stout are unable to build their dependencies and so one has to do `make` in 3rdparty/ before building libprocess/stout.",2 MESOS-5450,"Make the SASL dependency optional.","Right now there is a hard dependency on SASL, which probably won't work well on Windows (at least) in the near future for our use cases. In the future, it would be nice to have a pluggable authentication layer.",2 MESOS-5452,"Agent modules should be initialized before all components except firewall.","On Mesos Agents Anonymous modules should not have any dependencies, by design, on any other Mesos components. This implies that Anonymous modules should be initialized before all other Mesos components other than `Firewall`. 
The dependency on `Firewall` is primarily to enforce any policies to secure endpoints that might be owned by the Anonymous module.",1 MESOS-5453,"CNI should not store subnet of address in NetworkInfo","When the CNI isolator executes the CNI plugin, that CNI plugin will return an IP Address and Subnet (192.168.0.1/32). Mesos should strip the subnet before storing the address in the Task.NetworkInfo.IPAddress. Reason being - most current mesos components are not expecting a subnet in the Task's NetworkInfo.IPAddress, and instead expect just the IP address. This can cause errors in those components, such as Mesos-DNS failing to return a NetworkInfo address (and instead defaulting to the next configured IPSource), and Marathon generating invalid links to tasks (as it includes /32 in the link)",2 MESOS-5456,"Master anonymous modules should be initialized before any other components.","Anonymous modules on the Master are by design supposed to be independent of any Mesos components. However, there might be a dependency in the reverse direction. For example, Anonymous modules might want to influence the behavior of Mesos components (say by generating configuration that might be consumed later by the components). The Anonymous modules on the Master therefore need to be initialized before other Mesos components. ",1 MESOS-5459,"Update RUN_TASK_WITH_USER to use additional metadata","Currently, the `authorization::Action` `RUN_TASK_WITH_USER` will pass the user as its `Object.value` string, but some authorizers may want to make authorization decisions based on additional task attributes, like role, resources, labels, container type, etc. We should create a new Action `RUN_TASK` that passes FrameworkInfo and TaskInfo in its Object, and the LocalAuthorizer's RunTaskWithUser ACL can be implemented using the user found in TaskInfo/FrameworkInfo. 
We may need to leave the old _WITH_USER action around, but it's arguable whether we should call the authorizer once for RUN_TASK and once for RUN_TASK_WITH_USER, or only use the new action and deprecate the old one?",5 MESOS-5469,"Remove hard-coded principals in `PersistentVolumeEndpointsTest.SlavesEndpointFullResources`","In the test {{PersistentVolumeEndpointsTest.SlavesEndpointFullResources}}, the value {{test-principal}} is hard-coded into the JSON strings expected in HTTP responses. It would be more durable to use {{DEFAULT_CREDENTIAL.principal()}} instead.",1 MESOS-5470,"Confirm errors in authorized persistent volume tests","The tests {{PersistentVolumeTest.BadACLDropCreateAndDestroy}} and {{PersistentVolumeTest.BadACLNoPrincipal}} check for a failed Destroy operation by confirming that the persistent volume is still contained in an offer received after the attempted operation. We should also explicitly check that the operation did not succeed due to failed authorization.",1 MESOS-5471,"Enable `Option` to handle string literals gracefully","In {{FlagsBase::add}}, MESOS-5064 begins making use of template function parameters like {{T2*}} for the default flag value rather than {{Option&}}. This is because in some places in the code base, we pass string literals for this argument. If an {{Option}} type is used, the compiler infers a {{char [x]}} type for {{T2}}, which breaks {{Option::getOrElse}}, which attempts to return that same type, since returning arrays is disallowed. To fix this, we could employ {{std::decay}}, which would convert a return type of {{char [x]}} into {{const char *}}.",2 MESOS-5531,"Re-enable style-check for stout.","After the 3rdparty reorg, the mesos-style checker stopped checking stout.",1 MESOS-5532,"Maven build is too verbose for batch builds","During a non-interactive (without terminal) Mesos build, maven generates several thousands of log lines when downloading artifacts. This often makes several web-based log viewers unresponsive. 
Further, these several thousand line long progress indicator logs don't provide any meaningful information either. From a user's point of view, just knowing that the artifact download succeeded/failed is often enough. We should be using the '--batch-mode' flag to disable these additional log lines.",1 MESOS-5537,"http v1 SUBSCRIBED scheduler event always has nil http_interval_seconds","I'm writing a controller in Go to monitor heartbeats. I'd like to use the interval as communicated by the master, which should be specified in the SUBSCRIBED event. But it's not. {code} 2016/06/03 18:34:04 {Type:SUBSCRIBED Subscribed:&Event_Subscribed{FrameworkID:&mesos.FrameworkID{Value:ffdb6d6e-0167-4fa2-98f9-2c3f8157fc25-0004,},HeartbeatIntervalSeconds:nil,} Offers:nil Rescind:nil Update:nil Message:nil Failure:nil Error:nil} {code} {code} $ dpkg -l |grep -e mesos ii mesos 0.28.0-2.0.16.ubuntu1404 amd64 Cluster resource manager with efficient resource isolation {code} I *am* seeing HEARTBEAT events. Just not seeing the interval specified in the SUBSCRIBED event.",1 MESOS-5549,"Document aufs provisioner backend.","We should update container-image.md with the newly supported backend.",2 MESOS-5550,"Remove Nvidia GPU Isolator's link-time dependence on `libnvidia-ml`","The current Nvidia GPU isolator has a dependence on `libnvidia-ml`, and as such, pulls a hard dependence on this library into `libmesos`. The consequence of this is that any process that relies on `libmesos` has to have `libnvidia-ml` available as well, even on machines where no GPUs are available. Since this library is not easily installable through standard package managers, having such a hard dependence can be burdensome. This ticket proposes to pull in `libnvidia-ml` as a run-time dependence instead of a link-time dependence. 
As such, only machines that actually have GPUs installed and would like to rely on this library need to have it installed.",2 MESOS-5551,"Move the Nvidia GPU isolator from `cgroups/devices/gpu/nvidia` to `gpu/nvidia`","Currently, the Nvidia GPU isolator lives in `src/slave/containerizers/mesos/isolators/cgroups/devices/gpu/nvidia`. However, in the future this isolator will do more than simply isolate GPUs using the cgroups devices subsystem (e.g. volume management for injecting machine specific Nvidia libraries into a container). For this reason, we should preemptively move this isolator up to `src/slave/containerizers/mesos/isolators/gpu/nvidia`. As part of this, we should update the string we pass to the `--isolator` agent flag to reflect this.",2 MESOS-5552,"Bundle NVML headers for Nvidia GPU support.","Currently, we rely on a script to install the Nvidia GDK as a build dependence for building Mesos with Nvidia GPU support. A previous ticket removed the Mesos build dependence on `libnvidia-ml` which comes as part of the GDK. This ticket proposes bundling the NVML headers with Mesos in order to completely remove the build dependence on the GDK. With this change it will be much simpler to configure and build with Nvidia GPU support. All that will be required is: {noformat} ../configure --enable-nvidia-gpu-support make -j {noformat} ",1 MESOS-5554,"Change major/minor device types for Nvidia GPUs to `unsigned int`","Currently, the GPU struct specifies the type of its `major` and `minor` fields as `dev_t`, which is actually a concatenation of both the major and minor device numbers accessible through the `major()` and `minor()` macros. 
These macros return an `unsigned int` when handed a `dev_t`, so it makes sense for these fields to be of that type instead.",1 MESOS-5555,"Always provide access to NVIDIA control devices within containers (if GPU isolation is enabled).","Currently, access to `/dev/nvidiactl` and `/dev/nvidia-uvm` is only granted to / revoked from a container as GPUs are added and removed from them. On some level, this makes sense because most jobs don't need access to these devices unless they are also using a GPU. However, there are cases when access to these files is appropriate, even when not making use of a GPU. Running `nvidia-smi` to control the global state of the underlying nvidia driver, for example. We should add `/dev/nvidiactl` and `/dev/nvidia-uvm` to the default whitelist of devices to include in every container when the `gpu/nvidia` isolator is enabled. This will allow a container to run standard nvidia driver tools (such as `nvidia-smi`) without failing with abnormal errors when no GPUs have been granted to it. As such, these tools will now report that no GPUs are installed instead of failing abnormally.",3 MESOS-5556,"Fix method of populating device entries for `/dev/nvidia-uvm`, etc.","Currently, the major/minor numbers of `/dev/nvidiactl` and `/dev/nvidia-uvm` are hard-coded. This causes problems for `/dev/nvidia-uvm` because its major number is part of the ""Experimental"" device range on Linux. Because this range is experimental, there is no guarantee which device number will be assigned to it on a given machine. We should use `os::stat::rdev()` to extract the major/minor numbers programmatically.",2 MESOS-5557,"Add `NvidiaGpuAllocator` component for cross-containerizer GPU allocation","We need some way of allocating GPUs from a centralized location to allow both the mesos containerizer and the docker containerizer to pull from a central pool. We propose to build a `NvidiaGpuAllocator` for this purpose. 
This component should also be overloaded to do resource enumeration of GPUs based on the agent flags. This keeps all code for enumerating GPUs and the resources they represent in a single centralized location.",5 MESOS-5558,"Update `Containerizer::resources()` to use the `NvidiaGpuAllocator`","With the introduction of the shared `NvidiaGpuAllocator` component, `Containerizer::resources()` should be updated to use it.",2 MESOS-5559,"Integrate the `NvidiaGpuAllocator` into the `NvidiaGpuIsolator`",NULL,3 MESOS-5561,"Need to remove references to ""messages/messages.hpp"" from `State` API","In order to expose the `State` API for using replicated log in Mesos modules it is necessary that the `State` API does not reference headers that are not exposed as part of the Mesos installation. Currently include/mesos/state/protobuf.hpp references src/messages/messages.hpp making the `State` API unusable in a module. We need to move the protobuf `serialize`/`deserialize` functions out of messages.hpp and move them to `stout/protobuf.hpp`. This will help us remove references to messages.hpp from the `State` API.",2 MESOS-5562,"Add class to share Nvidia-specific components between containerizers","Once we have an `NvidiaGPUAllocator` component, we need some way to share it across multiple containerizers. Moreover, we anticipate needing other Nvidia components to share across multiple containerizers as well (e.g. an `NvidiaVolumeManager` component). As such, we should add a wrapper class around these components to make it easily passable to each containerizer without having to continually add a bunch of parameters to the Containerizer interface.",2 MESOS-5563,"Rearrange Nvidia GPU files to cleanup semantics for header inclusion.","Currently, components outside of `src/slave/containerizers/mesos/isolators/gpu` have to protect their #includes for certain Nvidia header files with the ENABLE_NVIDIA_GPU_SUPPORT flag. Other headers strictly *could not* be wrapped in this flag. 
We need to clean up this header madness, by creating a common ""nvidia.hpp"" header that takes care of all the dependencies. All components outside of `src/slave/containerizers/mesos/isolators/gpu` should only need to #include this one header instead of managing everything themselves.",1 MESOS-5564,"Document common use cases of authorization","Our authorization documentation covers the existing functionality, but it doesn't provide a practical how-to guide to help users accomplish common authorized use cases. For example, a user recently reported that to gain full use of the web UI after upgrading to Mesos 1.0, six new ACL rules needed to be added: {{get_endpoints, view_frameworks, view_tasks, view_executors, access_sandboxes, and access_mesos_logs}}. Rather than expecting users to figure this out on their own, we should document the ACLs needed to accomplish a common goal like this. Similarly, authorizing a stateful framework to accomplish the actions it would usually be expected to perform would involve setting rules for {{register_frameworks, run_tasks, shutdown_frameworks, reserve_resources, unreserve_resources, create_volumes, and destroy_volumes}}.",1 MESOS-5570,"Improve CHANGELOG and upgrades.md","Currently we have a lot of data duplication between the CHANGELOG and upgrades.md. We should try to improve this and potentially make the CHANGELOG a markdown file as well. 
For inspiration see the Hadoop changelog: https://github.com/apache/hadoop/blob/2e1d0ff4e901b8313c8d71869735b94ed8bc40a0/hadoop-common-project/hadoop-common/src/site/markdown/release/1.2.0/CHANGES.1.2.0.md ",3 MESOS-5576,"Masters may drop the first message they send between masters after a network partition","We observed the following situation in a cluster of five masters: || Time || Master 1 || Master 2 || Master 3 || Master 4 || Master 5 || | 0 | Follower | Follower | Follower | Follower | Leader | | 1 | Follower | Follower | Follower | Follower || Partitioned from cluster by downing this VM's network || | 2 || Elected Leader by ZK | Voting | Voting | Voting | Suicides due to lost leadership | | 3 | Performs consensus | Replies to leader | Replies to leader | Replies to leader | Still down | | 4 | Performs writing | Acks to leader | Acks to leader | Acks to leader | Still down | | 5 | Leader | Follower | Follower | Follower | Still down | | 6 | Leader | Follower | Follower | Follower | Comes back up | | 7 | Leader | Follower | Follower | Follower | Follower | | 8 || Partitioned in the same way as Master 5 | Follower | Follower | Follower | Follower | | 9 | Suicides due to lost leadership || Elected Leader by ZK | Follower | Follower | Follower | | 10 | Still down | Performs consensus | Replies to leader | Replies to leader || Doesn't get the message! || | 11 | Still down | Performs writing | Acks to leader | Acks to leader || Acks to leader || | 12 | Still down | Leader | Follower | Follower | Follower | Master 2 sends a series of messages to the recently-restarted Master 5. The first message is dropped, but subsequent messages are not dropped. This appears to be due to a stale link between the masters. 
Before leader election, the replicated log actors create a network watcher, which adds links to masters that join the ZK group: https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/network.hpp#L157-L159 This link does not appear to break (Master 2 -> 5) when Master 5 goes down, perhaps due to how the network partition was induced (in the hypervisor layer, rather than in the VM itself). When Master 2 tries to send an {{PromiseRequest}} to Master 5, we do not observe the [expected log message|https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/replica.cpp#L493-L494] Instead, we see a log line in Master 2: {code} process.cpp:2040] Failed to shutdown socket with fd 27: Transport endpoint is not connected {code} The broken link is removed by the libprocess {{socket_manager}} and the following {{WriteRequest}} from Master 2 to Master 5 succeeds via a new socket.",5 MESOS-5577,"Modules using replicated log state API require zookeeper headers","The state API uses zookeeper client headers and hence the bundled zookeeper headers need to be installed during Mesos installation. ",1 MESOS-5578,"Support static address allocation in CNI","Currently a framework can't specify a static IP address for the container when using the network/cni isolator. The `ipaddress` field in the `NetworkInfo` protobuf was designed for this specific purpose but since the CNI spec does not specify a means to allocate an IP address to the container the `network/cni` isolator cannot honor this field even when it is filled in by the framework. Creating this ticket to act as a place holder to track this limitation. As and when the CNI spec allows us to specify a static IP address for the container, we can resolve this ticket. ",1 MESOS-5579,"Support static IP address allocation with `DockerContainerizer`","Docker run supports the `--ip` option to allocate a specific IPv4 address to the container. 
Also, the `NetworkInfo` protobuf has an `ipaddress` field that allows frameworks to specify an IP address for the container. The docker executor should therefore invoke the `docker run` command with the --ip option whenever the `ipaddress` field of the `NetworkInfo` is set, allowing frameworks to try and assign a static IP address for their services.",1 MESOS-5580,"Implement authn/authz for the network/cni isolator","Currently any framework can launch containers on any CNI network irrespective of its role and principal. We need to perform authn/authz in the network/cni isolator (or Master) to make sure that only roles/principals specified by the operator can launch containers on a given network. ",3 MESOS-5581,"Guarantee ordering between Isolators","Some isolators depend on other isolators. However, we currently do not have a generic method of expressing these dependencies. We special case the `filesystem/*` isolators to make sure that dependencies on them are satisfied, but no other dependencies can be expressed. Instead, we should use a vector to represent the pairing of isolator name to isolator creator function. This way, the relative dependencies between each isolator will be implicit in the ordering of the vector. Currently, a hashmap is used to hold this pairing, but this is inadequate because hashmaps are inherently unordered. The new implementation using a vector will ensure everything is processed in the order it is listed.",3 MESOS-5582,"Create a `cgroups/devices` isolator.","Currently, all the logic for the `cgroups/devices` isolator is bundled into the Nvidia GPU Isolator. We should abstract it out into its own component and remove the redundant logic from the Nvidia GPU Isolator. 
Assuming the guaranteed ordering between isolators from MESOS-5581, we can be sure that the dependency order between the `cgroups/devices` and `gpu/nvidia` isolators is met.",2 MESOS-5583,"Improve authorization documentation when setting permissive flag.","A common problem for users starting to use acls is that once they set `permissive = false` and do not add acls allowing common operations (e.g., register_framework), their Mesos cluster doesn't behave as expected. ",1 MESOS-5588,"Improve error handling when parsing acls.","During parsing of the authorizer errors are ignored. This can lead to undetected security issues. Consider the following acl with a typo (usr instead of user) {code} ""view_frameworks"": [ { ""principals"": { ""type"": ""ANY"" }, ""usr"": { ""type"": ""NONE"" } } ] {code} When the master is started with these flags it will interpret the acl in the following way which gives any principal access to any framework. {noformat} view_frameworks { principals { type: ANY } } {noformat}",5 MESOS-5592,"Pass NetworkInfo to CNI Plugins","Mesos has adopted the Container Network Interface as a simple means of networking Mesos tasks launched by the Unified Containerizer. The CNI specification covers a minimum feature set, granting the flexibility to add customized networking functionality in the form of agreements made between the orchestrator and CNI plugin. This proposal is to pass NetworkInfo.Labels to the CNI plugin by injecting it into the CNI network configuration json during plugin invocation. Design Doc on this change: https://docs.google.com/document/d/1rxruCCcJqpppsQxQrzTbHFVnnW6CgQ2oTieYAmwL284/edit?usp=sharing reviewboard: https://reviews.apache.org/r/48527/",3 MESOS-5597,"Document Mesos ""health check"" feature","We don't talk about this feature at all.",5 MESOS-5605,"Improve documentation for using persistent volumes. ","When using persistent volumes at arangoDB we ran into a few pitfalls. 
We should document them in order for others to avoid those issues.",2 MESOS-5609,"Put initial scaffolding in place for implementing SUBSCRIBE call on v1 Master API.","As discussed on MESOS-5498, this ticket is for tracking work to put the initial scaffolding in place for streaming task status update events to a client that has subscribed to the {{api/v1}} Operator API endpoint. Other events/support for snapshots would be done as part of MESOS-5498.",5 MESOS-5618,"Added a metric indicating if replicated log for the registrar has recovered or not.","This gives operator insight about the state of the replicated log for registrar. The operator needs to know when it is safe to move on to another master in the upgrade orchestration pipeline. ",3 MESOS-5629,"Agent segfaults after request to '/files/browse'","We observed a number of agent segfaults today on an internal testing cluster. Here is a log excerpt: {code} Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.522925 24830 status_update_manager.cpp:392] Received status update acknowledgement (UUID: e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 6d4248cd-2832-4152-b5d0-defbf36f6759-0000 Jun 16 17:12:28 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:28.523006 24830 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_RUNNING (UUID: e79ab0f4-2fa2-4df2-9b59-89b97a482167) for task datadog-monitor.804b138b-33e5-11e6-ac16-566ccbdde23e of framework 6d4248cd-2832-4152-b5d0-defbf36f6759-0000 Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: I0616 17:12:29.147181 24824 http.cpp:192] HTTP GET for /slave(1)/state from 10.10.0.87:33356 Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** Aborted at 1466097149 (unix time) try ""date -d @1466097149"" if you are using GNU date *** Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: PC: @ 0x7ff4d68b12a3 (unknown) Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: *** SIGSEGV (@0x0) received 
by PID 24818 (TID 0x7ff4d31ab700) from PID 0; stack trace: *** Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6431100 (unknown) Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d68b12a3 (unknown) Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7eced33 process::dispatch<>() Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7e7aad7 _ZNSt17_Function_handlerIFN7process6FutureIbEERK6OptionISsEEZN5mesos8internal5slave9Framework15recoverExecutorERKNSA_5state13ExecutorStateEEUlS6_E_E9_M_invokeERKSt9_Any_dataS6_ Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1752 mesos::internal::FilesProcess::authorize() Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd1bea mesos::internal::FilesProcess::browse() Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d7bd6e43 std::_Function_handler<>::_M_invoke() Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d85478cb _ZZZN7process11ProcessBase5visitERKNS_9HttpEventEENKUlRKNS_6FutureI6OptionINS_4http14authentication20AuthenticationResultEEEEE0_clESC_ENKUlRKNS4_IbEEE1_clESG_ Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551341 process::ProcessManager::resume() Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d8551647 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6909220 (unknown) Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d6429dc5 start_thread Jun 16 17:12:29 ip-10-10-0-87 mesos-slave[24818]: @ 0x7ff4d615728d __clone Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service: main process exited, code=killed, status=11/SEGV Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: Unit dcos-mesos-slave.service entered failed state. Jun 16 17:12:29 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service failed. Jun 16 17:12:34 ip-10-10-0-87 systemd[1]: dcos-mesos-slave.service holdoff time over, scheduling restart. 
{code} In every case, the stack trace indicates one of the {{/files/*}} endpoints; I observed this a number of times coming from {{browse()}}, and twice from {{read()}}. The agent was built from the 1.0.0-rc1 branch, with two cherry-picks applied: [this|https://reviews.apache.org/r/48563/] and [this|https://reviews.apache.org/r/48566/], which were done to repair a different [segfault issue|https://issues.apache.org/jira/browse/MESOS-5587] on the master and agent. Thanks go to [~bmahler] for digging into this a bit and discovering a possible cause [here|https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L5737-L5745], where use of {{defer()}} may be necessary to keep execution in the correct context.",3 MESOS-5630,"Change build to always enable Nvidia GPU support for Linux","See Summary",2 MESOS-5634,"Add Framework Capability for GPU_RESOURCES","Due to the scarce resource problem described in MESOS-5377, we plan to introduce a GPU_RESOURCES Framework capability. This capability will allow the Mesos allocator to make better decisions about which frameworks should receive resources from GPU capable machines. In essence, the allocator will ONLY allocate resources from GPU capable machines to frameworks that have this capability. This is necessary to prevent non-GPU workloads from filling up the GPU machines and preventing GPU workloads to run.",3 MESOS-5638,"Check all omissions of 'defer' for safety","When registering callbacks with {{.then}}, {{.onAny}}, etc., we sometimes omit {{defer()}} in cases where it's deemed safe; for example, when the callback uses no process state and thus could be executed in an arbitrary context. Because of recent bugs due to the unsafe omission of {{defer()}}, we should do a sweep of the codebase for all such occurrences and evaluate their safety. 
We should also consider using {{defer()}} consistently in all such cases, as our [documentation|https://github.com/apache/mesos/tree/master/3rdparty/libprocess#defer] recommends.",5 MESOS-5639,"Add documentation about metadata for CNI plugins.","We need to document the behavior implemented in MESOS-5592.",2 MESOS-5646,"Build `network/cni` isolator with `libnl` support","Currently, the `network/cni` isolator does not have the ability to collect network statistics for containers launched on a CNI network. We need to give the `network/cni` isolator the ability to query interfaces, route tables and statistics in the container's network namespace. To achieve this the `network/cni` isolator will need to talk `netlink`. To enable the `netlink` API we need the `network/cni` isolator to be built with libnl support. ",3 MESOS-5647,"Expose a statistics endpoint on the `network/cni` isolator.","We need a statistics endpoint in the `network/cni` isolator to expose metrics relating to a container's network traffic. 
On receiving a request for a given container the `network/cni` isolator could use NETLINK system calls to query the kernel for interface and routing statistics for a given container's network namespace.",5 MESOS-5649,"Build an example framework to consume GPUs","This framework should show how to build a GPU capable framework that can accept offers with GPUs and launch tasks that use them.",3 MESOS-5650,"UNRESERVE operation causes master to crash.","{{RESERVE}} operation may cause a master failure: {noformat} I0619 05:02:02.298602 11194 http.cpp:312] HTTP GET for /master/slaves from 172.17.0.4:49617 with User-Agent='python-requests/2.9.1' I0619 05:02:02.305542 11193 http.cpp:312] HTTP POST for /master/destroy-volumes from 172.17.0.4:49618 with User-Agent='python-requests/2.9.1' I0619 05:02:02.306731 11191 master.cpp:6560] Sending checkpointed resources mem(kafkatest-role, kafkatest-principal, {resource_id: 7408cc53-183c-48c2-a07f-7087806219f3}):256; cpus(kafkatest-role, kafkatest-principal, {resource_id: d7888099-db8f-4018-9109-f70fb1174f53}):1.5; mem(kafkatest-role, kafkatest-principal, {resource_id: b5dd90fc-2c12-4199-9fc4-cf9f918e332b}):2304; ports(kafkatest-role, kafkatest-principal, {resource_id: a0ee4e01-803f-4b71-950d-483caeb01a57}):[9305-9305, 11596-11596]; cpus(kafkatest-role, kafkatest-principal, {resource_id: 8cd72abb-7089-4220-bb90-46b70c9953ab}):0.5; disk(kafkatest-role, kafkatest-principal, {resource_id: ed06ec6e-2d15-4d0e-bbc4-95a942e58596})[]:11204 to slave a80ff9dd-e046-43ab-b763-28365b136f6b-S0 at slave(1)@10.0.0.5:5051 (10.0.0.5) I0619 05:02:02.311069 11189 http.cpp:312] HTTP POST for /master/destroy-volumes from 172.17.0.4:49619 with User-Agent='python-requests/2.9.1' I0619 05:02:02.312191 11187 master.cpp:6560] Sending checkpointed resources cpus(kafkatest-role, kafkatest-principal, {resource_id: f1ff4806-0c24-4d60-ad2b-b06462ee4081}):1.5; mem(kafkatest-role, kafkatest-principal, {resource_id: cb8dc92d-64f0-4007-8520-1f63625b98c0}):2304; 
ports(kafkatest-role, kafkatest-principal, {resource_id: 225b4172-be77-453a-a94f-8845edc3f09a}):[9692-9692, 11824-11824]; cpus(kafkatest-role, kafkatest-principal, {resource_id: 942e102a-ca63-480d-9853-9a39e2695ec9}):0.5; mem(kafkatest-role, kafkatest-principal, {resource_id: cad57f8c-27f5-484c-a3fb-e80da74f0813}):256; disk(kafkatest-role, kafkatest-principal, {resource_id: e6563e09-e284-4aaf-8d53-72056695de41})[]:11204 to slave 489aa72f-ae07-4383-a56f-6fe9346ace37-S7 at slave(1)@10.0.0.7:5051 (10.0.0.7) I0619 05:02:02.316118 11189 http.cpp:312] HTTP GET for /master/slaves from 172.17.0.4:49620 with User-Agent='python-requests/2.9.1' I0619 05:02:02.321527 11189 http.cpp:312] HTTP POST for /master/unreserve from 172.17.0.4:49621 with User-Agent='python-requests/2.9.1' I0619 05:02:02.323523 11193 master.cpp:6560] Sending checkpointed resources to slave a80ff9dd-e046-43ab-b763-28365b136f6b-S0 at slave(1)@10.0.0.5:5051 (10.0.0.5) I0619 05:02:02.327658 11191 http.cpp:312] HTTP POST for /master/unreserve from 172.17.0.4:49622 with User-Agent='python-requests/2.9.1' F0619 05:02:02.329208 11190 sorter.cpp:284] Check failed: total_.scalarQuantities.contains(oldSlaveQuantity) {noformat} Possible reasons: * Recent improvements in allocator (b4d746f) * Bug in bookkeeping during the previous {{UNRESERVE}} * Network partition that happened after {{RESERVE}} and before {{UNRESERVE}}",5 MESOS-5657,"Executors should not inherit environment variables from the agent.","Currently executors are inheriting environment variables from the slave in mesos containerizer. This is problematic for two reasons: 1. When we use docker images (such as `mongo`) in unified containerizer, duplicated environment variables inherited from the slave lead to initialization failures, because LANG and/or LC_* environment variables are not set correctly. 2. 
When we look at the environment variables of the executor tasks, pages of environment variables are listed, which is redundant and dangerous. For the reasons above, we propose no longer allowing executors to inherit environment variables from the slave. Instead, users should specify all environment variables they need by setting the slave flag `--executor_environment_variables` in JSON format.",3 MESOS-5659,"Design doc for TASK_UNREACHABLE","See MESOS-4049.",5 MESOS-5660,"ContainerizerTest.ROOT_CGROUPS_BalloonFramework fails because executor environment isn't inherited","A recent change forbids the executor from inheriting environment variables from the agent's environment. As a regression, this breaks {{ContainerizerTest.ROOT_CGROUPS_BalloonFramework}}.",2 MESOS-5661,"Use snake casing for flag names consistently","Historically, we have always used snake casing for the flag variables e.g., {{docker_config}} etc. However, there are some instances in our .cpp code where we define the flag name in the .cpp file in camel case e.g., {{modulesDir}} but still have the flag name as {{modules_dir}} when taking arguments from the user. It would be good to audit all such occurrences and consistently use snake casing in our .cpp/.hpp files everywhere.",1 MESOS-5663,"Remove hard dependence on libelf for Linux"," We recently added a hard dependency for `libelf` on Linux. This was in preparation for some upcoming Nvidia GPU support for injecting volumes into containers. Since this dependence is not actually necessary for the upcoming release, we should remove it for now, and rethink the best way to add it back in later (possibly as a runtime dependence instead of a linktime one).",1 MESOS-5664,"Invalid resources sent to '/reserve' are silently dropped","If an invalid resource is passed to the master's {{/reserve}} endpoint, it will be silently dropped and not cause an error. 
This can lead, for example, to a {{/reserve}} request containing a single invalid resource receiving a 200 OK response, despite the fact that no resources were reserved as a result of the request. This is due to the fact that the {{+=}} operator for {{Resources}} silently drops invalid resources, and this operator is used when parsing the resources in the HTTP request. This could be addressed by validating the resource objects one at a time as they are parsed.",1 MESOS-5666,"Deprecate camel case proto field in isolator ContainerConfig.","Currently there are extra ExecutorInfo and TaskInfo in isolator ContainerConfig, because a deprecation cycle is needed to deprecate camel cased proto field names. This JIRA is used for tracking this issue, which should address the TODO in isolator.proto.",2 MESOS-5667,"CniIsolatorTest.ROOT_INTERNET_CURL_LaunchCommandTask fails on CentOS 7.","{noformat} [22:41:54] : [Step 10/10] [ RUN ] CniIsolatorTest.ROOT_INTERNET_CURL_LaunchCommandTask [22:41:54]W: [Step 10/10] I0619 22:41:54.348641 30896 cluster.cpp:155] Creating default 'local' authorizer [22:41:54]W: [Step 10/10] I0619 22:41:54.353384 30896 leveldb.cpp:174] Opened db in 4.634552ms [22:41:54]W: [Step 10/10] I0619 22:41:54.354763 30896 leveldb.cpp:181] Compacted db in 1.360201ms [22:41:54]W: [Step 10/10] I0619 22:41:54.354784 30896 leveldb.cpp:196] Created db iterator in 3421ns [22:41:54]W: [Step 10/10] I0619 22:41:54.354790 30896 leveldb.cpp:202] Seeked to beginning of db in 633ns [22:41:54]W: [Step 10/10] I0619 22:41:54.354797 30896 leveldb.cpp:271] Iterated through 0 keys in the db in 401ns [22:41:54]W: [Step 10/10] I0619 22:41:54.354811 30896 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned [22:41:54]W: [Step 10/10] I0619 22:41:54.354990 30913 recover.cpp:451] Starting replica recovery [22:41:54]W: [Step 10/10] I0619 22:41:54.355123 30915 recover.cpp:477] Replica is in EMPTY status [22:41:54]W: [Step 10/10] I0619 22:41:54.355391 30915 
replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (18695)@172.30.2.105:40724 [22:41:54]W: [Step 10/10] I0619 22:41:54.355479 30912 recover.cpp:197] Received a recover response from a replica in EMPTY status [22:41:54]W: [Step 10/10] I0619 22:41:54.355581 30914 recover.cpp:568] Updating replica status to STARTING [22:41:54]W: [Step 10/10] I0619 22:41:54.356091 30910 master.cpp:382] Master 27c796db-6f98-4d61-96c0-f583f22787ff (ip-172-30-2-105.mesosphere.io) started on 172.30.2.105:40724 [22:41:54]W: [Step 10/10] I0619 22:41:54.356104 30910 master.cpp:384] Flags at startup: --acls="""" --agent_ping_timeout=""15secs"" --agent_reregister_timeout=""10mins"" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate_agents=""true"" --authenticate_frameworks=""true"" --authenticate_http=""true"" --authenticate_http_frameworks=""true"" --authenticators=""crammd5"" --authorizers=""local"" --credentials=""/tmp/KhgYrQ/credentials"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" --http_framework_authenticators=""basic"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_agent_ping_timeouts=""5"" --max_completed_frameworks=""50"" --max_completed_tasks_per_framework=""1000"" --quiet=""false"" --recovery_agent_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""100secs"" --registry_strict=""true"" --root_submissions=""true"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/usr/local/share/mesos/webui"" --work_dir=""/tmp/KhgYrQ/master"" --zk_session_timeout=""10secs"" [22:41:54]W: [Step 10/10] I0619 22:41:54.356237 30910 master.cpp:434] Master only allowing authenticated frameworks to register [22:41:54]W: [Step 10/10] I0619 22:41:54.356245 30910 master.cpp:448] Master only allowing authenticated agents to register [22:41:54]W: [Step 
10/10] I0619 22:41:54.356247 30910 master.cpp:461] Master only allowing authenticated HTTP frameworks to register [22:41:54]W: [Step 10/10] I0619 22:41:54.356251 30910 credentials.hpp:37] Loading credentials for authentication from '/tmp/KhgYrQ/credentials' [22:41:54]W: [Step 10/10] I0619 22:41:54.356351 30910 master.cpp:506] Using default 'crammd5' authenticator [22:41:54]W: [Step 10/10] I0619 22:41:54.356389 30910 master.cpp:578] Using default 'basic' HTTP authenticator [22:41:54]W: [Step 10/10] I0619 22:41:54.356439 30910 master.cpp:658] Using default 'basic' HTTP framework authenticator [22:41:54]W: [Step 10/10] I0619 22:41:54.356467 30910 master.cpp:705] Authorization enabled [22:41:54]W: [Step 10/10] I0619 22:41:54.356531 30913 whitelist_watcher.cpp:77] No whitelist given [22:41:54]W: [Step 10/10] I0619 22:41:54.356549 30912 hierarchical.cpp:142] Initialized hierarchical allocator process [22:41:54]W: [Step 10/10] I0619 22:41:54.356868 30916 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 1.232816ms [22:41:54]W: [Step 10/10] I0619 22:41:54.356884 30916 replica.cpp:320] Persisted replica status to STARTING [22:41:54]W: [Step 10/10] I0619 22:41:54.356945 30916 recover.cpp:477] Replica is in STARTING status [22:41:54]W: [Step 10/10] I0619 22:41:54.357100 30917 master.cpp:1969] The newly elected leader is master@172.30.2.105:40724 with id 27c796db-6f98-4d61-96c0-f583f22787ff [22:41:54]W: [Step 10/10] I0619 22:41:54.357115 30917 master.cpp:1982] Elected as the leading master! 
[22:41:54]W: [Step 10/10] I0619 22:41:54.357122 30917 master.cpp:1669] Recovering from registrar [22:41:54]W: [Step 10/10] I0619 22:41:54.357213 30910 registrar.cpp:332] Recovering registrar [22:41:54]W: [Step 10/10] I0619 22:41:54.357429 30913 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (18698)@172.30.2.105:40724 [22:41:54]W: [Step 10/10] I0619 22:41:54.357549 30914 recover.cpp:197] Received a recover response from a replica in STARTING status [22:41:54]W: [Step 10/10] I0619 22:41:54.357728 30913 recover.cpp:568] Updating replica status to VOTING [22:41:54]W: [Step 10/10] I0619 22:41:54.358937 30913 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 1.14792ms [22:41:54]W: [Step 10/10] I0619 22:41:54.358952 30913 replica.cpp:320] Persisted replica status to VOTING [22:41:54]W: [Step 10/10] I0619 22:41:54.358986 30913 recover.cpp:582] Successfully joined the Paxos group [22:41:54]W: [Step 10/10] I0619 22:41:54.359041 30913 recover.cpp:466] Recover process terminated [22:41:54]W: [Step 10/10] I0619 22:41:54.359180 30916 log.cpp:553] Attempting to start the writer [22:41:54]W: [Step 10/10] I0619 22:41:54.359578 30917 replica.cpp:493] Replica received implicit promise request from (18699)@172.30.2.105:40724 with proposal 1 [22:41:54]W: [Step 10/10] I0619 22:41:54.360752 30917 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 1.157449ms [22:41:54]W: [Step 10/10] I0619 22:41:54.360767 30917 replica.cpp:342] Persisted promised to 1 [22:41:54]W: [Step 10/10] I0619 22:41:54.360982 30914 coordinator.cpp:238] Coordinator attempting to fill missing positions [22:41:54]W: [Step 10/10] I0619 22:41:54.361426 30910 replica.cpp:388] Replica received explicit promise request from (18700)@172.30.2.105:40724 for position 0 with proposal 2 [22:41:54]W: [Step 10/10] I0619 22:41:54.362571 30910 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 1.124969ms [22:41:54]W: [Step 10/10] I0619 22:41:54.362587 30910 
replica.cpp:712] Persisted action at 0 [22:41:54]W: [Step 10/10] I0619 22:41:54.362999 30911 replica.cpp:537] Replica received write request for position 0 from (18701)@172.30.2.105:40724 [22:41:54]W: [Step 10/10] I0619 22:41:54.363030 30911 leveldb.cpp:436] Reading position from leveldb took 14967ns [22:41:54]W: [Step 10/10] I0619 22:41:54.364264 30911 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 1.214497ms [22:41:54]W: [Step 10/10] I0619 22:41:54.364279 30911 replica.cpp:712] Persisted action at 0 [22:41:54]W: [Step 10/10] I0619 22:41:54.364470 30910 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 [22:41:54]W: [Step 10/10] I0619 22:41:54.365622 30910 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 1.131398ms [22:41:54]W: [Step 10/10] I0619 22:41:54.365636 30910 replica.cpp:712] Persisted action at 0 [22:41:54]W: [Step 10/10] I0619 22:41:54.365643 30910 replica.cpp:697] Replica learned NOP action at position 0 [22:41:54]W: [Step 10/10] I0619 22:41:54.365769 30915 log.cpp:569] Writer started with ending position 0 [22:41:54]W: [Step 10/10] I0619 22:41:54.366080 30913 leveldb.cpp:436] Reading position from leveldb took 8794ns [22:41:54]W: [Step 10/10] I0619 22:41:54.366284 30915 registrar.cpp:365] Successfully fetched the registry (0B) in 9.053952ms [22:41:54]W: [Step 10/10] I0619 22:41:54.366315 30915 registrar.cpp:464] Applied 1 operations in 3436ns; attempting to update the 'registry' [22:41:54]W: [Step 10/10] I0619 22:41:54.366487 30911 log.cpp:577] Attempting to append 209 bytes to the log [22:41:54]W: [Step 10/10] I0619 22:41:54.366539 30917 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 [22:41:54]W: [Step 10/10] I0619 22:41:54.366839 30917 replica.cpp:537] Replica received write request for position 1 from (18702)@172.30.2.105:40724 [22:41:54]W: [Step 10/10] I0619 22:41:54.367966 30917 leveldb.cpp:341] Persisting action (228 bytes) to leveldb took 1.106053ms 
[22:41:54]W: [Step 10/10] I0619 22:41:54.367982 30917 replica.cpp:712] Persisted action at 1 [22:41:54]W: [Step 10/10] I0619 22:41:54.368201 30915 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 [22:41:54]W: [Step 10/10] I0619 22:41:54.371786 30915 leveldb.cpp:341] Persisting action (230 bytes) to leveldb took 3.566076ms [22:41:54]W: [Step 10/10] I0619 22:41:54.371803 30915 replica.cpp:712] Persisted action at 1 [22:41:54]W: [Step 10/10] I0619 22:41:54.371809 30915 replica.cpp:697] Replica learned APPEND action at position 1 [22:41:54]W: [Step 10/10] I0619 22:41:54.372032 30910 registrar.cpp:509] Successfully updated the 'registry' in 5.693952ms [22:41:54]W: [Step 10/10] I0619 22:41:54.372097 30910 registrar.cpp:395] Successfully recovered registrar [22:41:54]W: [Step 10/10] I0619 22:41:54.372107 30911 log.cpp:596] Attempting to truncate the log to 1 [22:41:54]W: [Step 10/10] I0619 22:41:54.372151 30910 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 [22:41:54]W: [Step 10/10] I0619 22:41:54.372218 30911 master.cpp:1777] Recovered 0 agents from the Registry (170B) ; allowing 10mins for agents to re-register [22:41:54]W: [Step 10/10] I0619 22:41:54.372242 30915 hierarchical.cpp:169] Skipping recovery of hierarchical allocator: nothing to recover [22:41:54]W: [Step 10/10] I0619 22:41:54.372467 30914 replica.cpp:537] Replica received write request for position 2 from (18703)@172.30.2.105:40724 [22:41:54]W: [Step 10/10] I0619 22:41:54.373693 30914 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 1.207676ms [22:41:54]W: [Step 10/10] I0619 22:41:54.373708 30914 replica.cpp:712] Persisted action at 2 [22:41:54]W: [Step 10/10] I0619 22:41:54.373920 30913 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 [22:41:54]W: [Step 10/10] I0619 22:41:54.375115 30913 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 1.17978ms [22:41:54]W: [Step 10/10] I0619 
22:41:54.375145 30913 leveldb.cpp:399] Deleting ~1 keys from leveldb took 14216ns [22:41:54]W: [Step 10/10] I0619 22:41:54.375154 30913 replica.cpp:712] Persisted action at 2 [22:41:54]W: [Step 10/10] I0619 22:41:54.375159 30913 replica.cpp:697] Replica learned TRUNCATE action at position 2 [22:41:54]W: [Step 10/10] I0619 22:41:54.383839 30896 containerizer.cpp:201] Using isolation: docker/runtime,filesystem/linux,network/cni [22:41:54]W: [Step 10/10] I0619 22:41:54.388789 30896 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher [22:41:54]W: [Step 10/10] E0619 22:41:54.393234 30896 shell.hpp:106] Command 'hadoop version 2>&1' failed; this is the output: [22:41:54]W: [Step 10/10] sh: hadoop: command not found [22:41:54]W: [Step 10/10] I0619 22:41:54.393265 30896 fetcher.cpp:62] Skipping URI fetcher plugin 'hadoop' as it could not be created: Failed to create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was either not found or exited with a non-zero exit status: 127 [22:41:54]W: [Step 10/10] I0619 22:41:54.393316 30896 registry_puller.cpp:111] Creating registry puller with docker registry 'https://registry-1.docker.io' [22:41:54]W: [Step 10/10] I0619 22:41:54.395668 30896 cluster.cpp:432] Creating default 'local' authorizer [22:41:54]W: [Step 10/10] I0619 22:41:54.396100 30914 slave.cpp:203] Agent started on 469)@172.30.2.105:40724 [22:41:54]W: [Step 10/10] I0619 22:41:54.396116 30914 slave.cpp:204] Flags at startup: --acls="""" --appc_simple_discovery_uri_prefix=""http://"" --appc_store_dir=""/tmp/mesos/store/appc"" --authenticate_http=""true"" --authenticatee=""crammd5"" --authorizer=""local"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" 
--credential=""/mnt/teamcity/temp/buildTmp/CniIsolatorTest_ROOT_INTERNET_CURL_LaunchCommandTask_GcX6XI/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_kill_orphans=""true"" --docker_registry=""https://registry-1.docker.io"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/KhgYrQ/store"" --docker_volume_checkpoint_dir=""/var/run/mesos/isolators/docker/volume"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/mnt/teamcity/temp/buildTmp/CniIsolatorTest_ROOT_INTERNET_CURL_LaunchCommandTask_GcX6XI/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" --http_command_executor=""false"" --http_credentials=""/mnt/teamcity/temp/buildTmp/CniIsolatorTest_ROOT_INTERNET_CURL_LaunchCommandTask_GcX6XI/http_credentials"" --image_providers=""docker"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""docker/runtime,filesystem/linux,network/cni"" --launcher_dir=""/mnt/teamcity/work/4240ba9ddd0997c3/build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --network_cni_config_dir=""/tmp/KhgYrQ/configs"" --network_cni_plugins_dir=""/tmp/KhgYrQ/plugins"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resources=""cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_enable_support=""true"" --systemd_runtime_directory=""/run/systemd/system"" 
--version=""false"" --work_dir=""/mnt/teamcity/temp/buildTmp/CniIsolatorTest_ROOT_INTERNET_CURL_LaunchCommandTask_GcX6XI"" [22:41:54]W: [Step 10/10] I0619 22:41:54.396380 30914 credentials.hpp:86] Loading credential for authentication from '/mnt/teamcity/temp/buildTmp/CniIsolatorTest_ROOT_INTERNET_CURL_LaunchCommandTask_GcX6XI/credential' [22:41:54]W: [Step 10/10] I0619 22:41:54.396495 30914 slave.cpp:341] Agent using credential for: test-principal [22:41:54]W: [Step 10/10] I0619 22:41:54.396509 30914 credentials.hpp:37] Loading credentials for authentication from '/mnt/teamcity/temp/buildTmp/CniIsolatorTest_ROOT_INTERNET_CURL_LaunchCommandTask_GcX6XI/http_credentials' [22:41:54]W: [Step 10/10] I0619 22:41:54.396586 30914 slave.cpp:393] Using default 'basic' HTTP authenticator [22:41:54]W: [Step 10/10] I0619 22:41:54.396698 30914 resources.cpp:572] Parsing resources as JSON failed: cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000] [22:41:54]W: [Step 10/10] Trying semicolon-delimited string format instead [22:41:54]W: [Step 10/10] I0619 22:41:54.396780 30896 sched.cpp:224] Version: 1.0.0 [22:41:54]W: [Step 10/10] I0619 22:41:54.396991 30914 slave.cpp:592] Agent resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] [22:41:54]W: [Step 10/10] I0619 22:41:54.397020 30914 slave.cpp:600] Agent attributes: [ ] [22:41:54]W: [Step 10/10] I0619 22:41:54.397029 30914 slave.cpp:605] Agent hostname: ip-172-30-2-105.mesosphere.io [22:41:54]W: [Step 10/10] I0619 22:41:54.397040 30916 sched.cpp:328] New master detected at master@172.30.2.105:40724 [22:41:54]W: [Step 10/10] I0619 22:41:54.397068 30916 sched.cpp:394] Authenticating with master master@172.30.2.105:40724 [22:41:54]W: [Step 10/10] I0619 22:41:54.397078 30916 sched.cpp:401] Using default CRAM-MD5 authenticatee [22:41:54]W: [Step 10/10] I0619 22:41:54.397188 30916 authenticatee.cpp:121] Creating new client SASL connection [22:41:54]W: [Step 10/10] I0619 22:41:54.397467 30914 state.cpp:57] Recovering 
state from '/mnt/teamcity/temp/buildTmp/CniIsolatorTest_ROOT_INTERNET_CURL_LaunchCommandTask_GcX6XI/meta' [22:41:54]W: [Step 10/10] I0619 22:41:54.397476 30912 master.cpp:5943] Authenticating scheduler-af10d6a3-1ebc-4377-b44d-8c0dfbffcb8e@172.30.2.105:40724 [22:41:54]W: [Step 10/10] I0619 22:41:54.397544 30913 authenticator.cpp:414] Starting authentication session for crammd5_authenticatee(953)@172.30.2.105:40724 [22:41:54]W: [Step 10/10] I0619 22:41:54.397614 30915 status_update_manager.cpp:200] Recovering status update manager [22:41:54]W: [Step 10/10] I0619 22:41:54.397668 30912 authenticator.cpp:98] Creating new server SASL connection [22:41:54]W: [Step 10/10] I0619 22:41:54.397709 30915 containerizer.cpp:514] Recovering containerizer [22:41:54]W: [Step 10/10] I0619 22:41:54.397869 30912 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5 [22:41:54]W: [Step 10/10] I0619 22:41:54.397886 30912 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5' [22:41:54]W: [Step 10/10] I0619 22:41:54.397927 30912 authenticator.cpp:204] Received SASL authentication start [22:41:54]W: [Step 10/10] I0619 22:41:54.397964 30912 authenticator.cpp:326] Authentication requires more steps [22:41:54]W: [Step 10/10] I0619 22:41:54.398000 30912 authenticatee.cpp:259] Received SASL authentication step [22:41:54]W: [Step 10/10] I0619 22:41:54.398052 30912 authenticator.cpp:232] Received SASL authentication step [22:41:54]W: [Step 10/10] I0619 22:41:54.398066 30912 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: 'ip-172-30-2-105.mesosphere.io' server FQDN: 'ip-172-30-2-105.mesosphere.io' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false [22:41:54]W: [Step 10/10] I0619 22:41:54.398073 30912 auxprop.cpp:179] Looking up auxiliary property '*userPassword' [22:41:54]W: [Step 10/10] I0619 22:41:54.398087 30912 auxprop.cpp:179] Looking up auxiliary property 
'*cmusaslsecretCRAM-MD5' [22:41:54]W: [Step 10/10] I0619 22:41:54.398098 30912 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: 'ip-172-30-2-105.mesosphere.io' server FQDN: 'ip-172-30-2-105.mesosphere.io' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true [22:41:54]W: [Step 10/10] I0619 22:41:54.398103 30912 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true [22:41:54]W: [Step 10/10] I0619 22:41:54.398108 30912 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true [22:41:54]W: [Step 10/10] I0619 22:41:54.398116 30912 authenticator.cpp:318] Authentication success [22:41:54]W: [Step 10/10] I0619 22:41:54.398162 30914 authenticatee.cpp:299] Authentication success [22:41:54]W: [Step 10/10] I0619 22:41:54.398181 30913 authenticator.cpp:432] Authentication session cleanup for crammd5_authenticatee(953)@172.30.2.105:40724 [22:41:54]W: [Step 10/10] I0619 22:41:54.398200 30912 master.cpp:5973] Successfully authenticated principal 'test-principal' at scheduler-af10d6a3-1ebc-4377-b44d-8c0dfbffcb8e@172.30.2.105:40724 [22:41:54]W: [Step 10/10] I0619 22:41:54.398270 30914 sched.cpp:484] Successfully authenticated with master master@172.30.2.105:40724 [22:41:54]W: [Step 10/10] I0619 22:41:54.398280 30914 sched.cpp:800] Sending SUBSCRIBE call to m...",2 MESOS-5668,"Add CGROUP namespace to linux ns helper.","Since Linux kernel 4.6, a CGROUP namespace has been added. We need to support the handle for the cgroup namespace of the process. 
This also relates to two test failures on Ubuntu 16: {noformat} [22:41:26] : [Step 10/10] [ RUN ] NsTest.ROOT_setns [22:41:26] : [Step 10/10] ../../src/tests/containerizer/ns_tests.cpp:75: Failure [22:41:26] : [Step 10/10] nstype: Unknown namespace 'cgroup' [22:41:26] : [Step 10/10] [ FAILED ] NsTest.ROOT_setns (1 ms) {noformat} {noformat} [22:41:26] : [Step 10/10] [ RUN ] NsTest.ROOT_getns [22:41:26] : [Step 10/10] ../../src/tests/containerizer/ns_tests.cpp:160: Failure [22:41:26] : [Step 10/10] nstype: Unknown namespace 'cgroup' [22:41:26] : [Step 10/10] [ FAILED ] NsTest.ROOT_getns (0 ms) {noformat}",3 MESOS-5669,"CNI isolator should not return failure if /etc/hostname does not exist on host.","/etc/hostname may not necessarily exist on every system (e.g., CentOS 6). Currently, the CNI isolator just returns a failure if it does not exist on the host, because the isolator needs to mount it into the container. This is fine for /etc/hosts and /etc/resolv.conf, but we should make an exception for /etc/hostname, because the hostname may still be accessible even if /etc/hostname doesn't exist. 
This issue relates to 3 failure tests: {noformat} [22:45:21] : [Step 10/10] [ RUN ] CniIsolatorTest.ROOT_INTERNET_CURL_LaunchCommandTask [22:45:21]W: [Step 10/10] I0619 22:45:21.647611 24647 cluster.cpp:155] Creating default 'local' authorizer [22:45:21]W: [Step 10/10] I0619 22:45:21.655230 24647 leveldb.cpp:174] Opened db in 7.510408ms [22:45:21]W: [Step 10/10] I0619 22:45:21.657680 24647 leveldb.cpp:181] Compacted db in 2.427309ms [22:45:21]W: [Step 10/10] I0619 22:45:21.657702 24647 leveldb.cpp:196] Created db iterator in 6209ns [22:45:21]W: [Step 10/10] I0619 22:45:21.657709 24647 leveldb.cpp:202] Seeked to beginning of db in 692ns [22:45:21]W: [Step 10/10] I0619 22:45:21.657713 24647 leveldb.cpp:271] Iterated through 0 keys in the db in 431ns [22:45:21]W: [Step 10/10] I0619 22:45:21.657727 24647 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned [22:45:21]W: [Step 10/10] I0619 22:45:21.657888 24662 recover.cpp:451] Starting replica recovery [22:45:21]W: [Step 10/10] I0619 22:45:21.658051 24668 recover.cpp:477] Replica is in EMPTY status [22:45:21]W: [Step 10/10] I0619 22:45:21.658495 24664 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (18401)@172.30.2.247:42024 [22:45:21]W: [Step 10/10] I0619 22:45:21.658583 24662 recover.cpp:197] Received a recover response from a replica in EMPTY status [22:45:21]W: [Step 10/10] I0619 22:45:21.658687 24664 recover.cpp:568] Updating replica status to STARTING [22:45:21]W: [Step 10/10] I0619 22:45:21.659111 24664 master.cpp:382] Master 9a4a353b-91c5-43b9-8c37-19245c37758c (ip-172-30-2-247.mesosphere.io) started on 172.30.2.247:42024 [22:45:21]W: [Step 10/10] I0619 22:45:21.659126 24664 master.cpp:384] Flags at startup: --acls="""" --agent_ping_timeout=""15secs"" --agent_reregister_timeout=""10mins"" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate_agents=""true"" --authenticate_frameworks=""true"" 
--authenticate_http=""true"" --authenticate_http_frameworks=""true"" --authenticators=""crammd5"" --authorizers=""local"" --credentials=""/tmp/l8346Z/credentials"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" --http_framework_authenticators=""basic"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_agent_ping_timeouts=""5"" --max_completed_frameworks=""50"" --max_completed_tasks_per_framework=""1000"" --quiet=""false"" --recovery_agent_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""100secs"" --registry_strict=""true"" --root_submissions=""true"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/usr/local/share/mesos/webui"" --work_dir=""/tmp/l8346Z/master"" --zk_session_timeout=""10secs"" [22:45:21]W: [Step 10/10] I0619 22:45:21.659267 24664 master.cpp:434] Master only allowing authenticated frameworks to register [22:45:21]W: [Step 10/10] I0619 22:45:21.659276 24664 master.cpp:448] Master only allowing authenticated agents to register [22:45:21]W: [Step 10/10] I0619 22:45:21.659278 24664 master.cpp:461] Master only allowing authenticated HTTP frameworks to register [22:45:21]W: [Step 10/10] I0619 22:45:21.659282 24664 credentials.hpp:37] Loading credentials for authentication from '/tmp/l8346Z/credentials' [22:45:21]W: [Step 10/10] I0619 22:45:21.659375 24664 master.cpp:506] Using default 'crammd5' authenticator [22:45:21]W: [Step 10/10] I0619 22:45:21.659415 24664 master.cpp:578] Using default 'basic' HTTP authenticator [22:45:21]W: [Step 10/10] I0619 22:45:21.659495 24664 master.cpp:658] Using default 'basic' HTTP framework authenticator [22:45:21]W: [Step 10/10] I0619 22:45:21.659569 24664 master.cpp:705] Authorization enabled [22:45:21]W: [Step 10/10] I0619 22:45:21.659684 24666 hierarchical.cpp:142] Initialized hierarchical allocator process [22:45:21]W: [Step 
10/10] I0619 22:45:21.659696 24665 whitelist_watcher.cpp:77] No whitelist given [22:45:21]W: [Step 10/10] I0619 22:45:21.660269 24666 master.cpp:1969] The newly elected leader is master@172.30.2.247:42024 with id 9a4a353b-91c5-43b9-8c37-19245c37758c [22:45:21]W: [Step 10/10] I0619 22:45:21.660281 24666 master.cpp:1982] Elected as the leading master! [22:45:21]W: [Step 10/10] I0619 22:45:21.660290 24666 master.cpp:1669] Recovering from registrar [22:45:21]W: [Step 10/10] I0619 22:45:21.660342 24662 registrar.cpp:332] Recovering registrar [22:45:21]W: [Step 10/10] I0619 22:45:21.661232 24669 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 2.48585ms [22:45:21]W: [Step 10/10] I0619 22:45:21.661254 24669 replica.cpp:320] Persisted replica status to STARTING [22:45:21]W: [Step 10/10] I0619 22:45:21.661326 24669 recover.cpp:477] Replica is in STARTING status [22:45:21]W: [Step 10/10] I0619 22:45:21.661667 24668 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (18404)@172.30.2.247:42024 [22:45:21]W: [Step 10/10] I0619 22:45:21.661758 24665 recover.cpp:197] Received a recover response from a replica in STARTING status [22:45:21]W: [Step 10/10] I0619 22:45:21.661893 24664 recover.cpp:568] Updating replica status to VOTING [22:45:21]W: [Step 10/10] I0619 22:45:21.663851 24664 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 1.915617ms [22:45:21]W: [Step 10/10] I0619 22:45:21.663866 24664 replica.cpp:320] Persisted replica status to VOTING [22:45:21]W: [Step 10/10] I0619 22:45:21.663899 24664 recover.cpp:582] Successfully joined the Paxos group [22:45:21]W: [Step 10/10] I0619 22:45:21.663944 24664 recover.cpp:466] Recover process terminated [22:45:21]W: [Step 10/10] I0619 22:45:21.664088 24668 log.cpp:553] Attempting to start the writer [22:45:21]W: [Step 10/10] I0619 22:45:21.664556 24668 replica.cpp:493] Replica received implicit promise request from (18405)@172.30.2.247:42024 with proposal 1 [22:45:21]W: 
[Step 10/10] I0619 22:45:21.666551 24668 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 1.971938ms [22:45:21]W: [Step 10/10] I0619 22:45:21.666566 24668 replica.cpp:342] Persisted promised to 1 [22:45:21]W: [Step 10/10] I0619 22:45:21.666767 24667 coordinator.cpp:238] Coordinator attempting to fill missing positions [22:45:21]W: [Step 10/10] I0619 22:45:21.667230 24668 replica.cpp:388] Replica received explicit promise request from (18406)@172.30.2.247:42024 for position 0 with proposal 2 [22:45:21]W: [Step 10/10] I0619 22:45:21.669271 24668 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 2.02399ms [22:45:21]W: [Step 10/10] I0619 22:45:21.669287 24668 replica.cpp:712] Persisted action at 0 [22:45:21]W: [Step 10/10] I0619 22:45:21.669656 24669 replica.cpp:537] Replica received write request for position 0 from (18407)@172.30.2.247:42024 [22:45:21]W: [Step 10/10] I0619 22:45:21.669680 24669 leveldb.cpp:436] Reading position from leveldb took 10808ns [22:45:21]W: [Step 10/10] I0619 22:45:21.671674 24669 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 1.977316ms [22:45:21]W: [Step 10/10] I0619 22:45:21.671689 24669 replica.cpp:712] Persisted action at 0 [22:45:21]W: [Step 10/10] I0619 22:45:21.671907 24665 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 [22:45:21]W: [Step 10/10] I0619 22:45:21.673920 24665 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 1.991274ms [22:45:21]W: [Step 10/10] I0619 22:45:21.673935 24665 replica.cpp:712] Persisted action at 0 [22:45:21]W: [Step 10/10] I0619 22:45:21.673941 24665 replica.cpp:697] Replica learned NOP action at position 0 [22:45:21]W: [Step 10/10] I0619 22:45:21.674190 24665 log.cpp:569] Writer started with ending position 0 [22:45:21]W: [Step 10/10] I0619 22:45:21.674489 24663 leveldb.cpp:436] Reading position from leveldb took 9059ns [22:45:21]W: [Step 10/10] I0619 22:45:21.674718 24663 registrar.cpp:365] Successfully fetched the 
registry (0B) in 14.355968ms [22:45:21]W: [Step 10/10] I0619 22:45:21.674747 24663 registrar.cpp:464] Applied 1 operations in 3070ns; attempting to update the 'registry' [22:45:21]W: [Step 10/10] I0619 22:45:21.674935 24665 log.cpp:577] Attempting to append 209 bytes to the log [22:45:21]W: [Step 10/10] I0619 22:45:21.674978 24665 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 [22:45:21]W: [Step 10/10] I0619 22:45:21.675242 24666 replica.cpp:537] Replica received write request for position 1 from (18408)@172.30.2.247:42024 [22:45:21]W: [Step 10/10] I0619 22:45:21.677088 24666 leveldb.cpp:341] Persisting action (228 bytes) to leveldb took 1.823904ms [22:45:21]W: [Step 10/10] I0619 22:45:21.677103 24666 replica.cpp:712] Persisted action at 1 [22:45:21]W: [Step 10/10] I0619 22:45:21.677299 24667 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 [22:45:21]W: [Step 10/10] I0619 22:45:21.679270 24667 leveldb.cpp:341] Persisting action (230 bytes) to leveldb took 1.952303ms [22:45:21]W: [Step 10/10] I0619 22:45:21.679286 24667 replica.cpp:712] Persisted action at 1 [22:45:21]W: [Step 10/10] I0619 22:45:21.679291 24667 replica.cpp:697] Replica learned APPEND action at position 1 [22:45:21]W: [Step 10/10] I0619 22:45:21.679481 24663 registrar.cpp:509] Successfully updated the 'registry' in 4.715264ms [22:45:21]W: [Step 10/10] I0619 22:45:21.679503 24666 log.cpp:596] Attempting to truncate the log to 1 [22:45:21]W: [Step 10/10] I0619 22:45:21.679560 24667 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 [22:45:21]W: [Step 10/10] I0619 22:45:21.679581 24663 registrar.cpp:395] Successfully recovered registrar [22:45:21]W: [Step 10/10] I0619 22:45:21.679745 24664 master.cpp:1777] Recovered 0 agents from the Registry (170B) ; allowing 10mins for agents to re-register [22:45:21]W: [Step 10/10] I0619 22:45:21.679774 24662 hierarchical.cpp:169] Skipping recovery of hierarchical 
allocator: nothing to recover [22:45:21]W: [Step 10/10] I0619 22:45:21.679986 24662 replica.cpp:537] Replica received write request for position 2 from (18409)@172.30.2.247:42024 [22:45:21]W: [Step 10/10] I0619 22:45:21.681895 24662 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 1.891877ms [22:45:21]W: [Step 10/10] I0619 22:45:21.681910 24662 replica.cpp:712] Persisted action at 2 [22:45:21]W: [Step 10/10] I0619 22:45:21.682160 24666 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 [22:45:21]W: [Step 10/10] I0619 22:45:21.684331 24666 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 2.153217ms [22:45:21]W: [Step 10/10] I0619 22:45:21.684375 24666 leveldb.cpp:399] Deleting ~1 keys from leveldb took 26973ns [22:45:21]W: [Step 10/10] I0619 22:45:21.684383 24666 replica.cpp:712] Persisted action at 2 [22:45:21]W: [Step 10/10] I0619 22:45:21.684389 24666 replica.cpp:697] Replica learned TRUNCATE action at position 2 [22:45:21]W: [Step 10/10] I0619 22:45:21.691529 24647 containerizer.cpp:201] Using isolation: docker/runtime,filesystem/linux,network/cni [22:45:21]W: [Step 10/10] I0619 22:45:21.694491 24647 linux_launcher.cpp:101] Using /cgroup/freezer as the freezer hierarchy for the Linux launcher [22:45:21]W: [Step 10/10] E0619 22:45:21.699741 24647 shell.hpp:106] Command 'hadoop version 2>&1' failed; this is the output: [22:45:21]W: [Step 10/10] sh: hadoop: command not found [22:45:21]W: [Step 10/10] I0619 22:45:21.699769 24647 fetcher.cpp:62] Skipping URI fetcher plugin 'hadoop' as it could not be created: Failed to create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was either not found or exited with a non-zero exit status: 127 [22:45:21]W: [Step 10/10] I0619 22:45:21.699823 24647 registry_puller.cpp:111] Creating registry puller with docker registry 'https://registry-1.docker.io' [22:45:21]W: [Step 10/10] I0619 22:45:21.700865 24647 linux.cpp:146] Bind mounting 
'/mnt/teamcity/temp/buildTmp/CniIsolatorTest_ROOT_INTERNET_CURL_LaunchCommandTask_CVAWpG' and making it a shared mount [22:45:21]W: [Step 10/10] I0619 22:45:21.707801 24647 cni.cpp:286] Bind mounting '/var/run/mesos/isolators/network/cni' and making it a shared mount [22:45:21]W: [Step 10/10] I0619 22:45:21.714337 24647 cluster.cpp:432] Creating default 'local' authorizer [22:45:21]W: [Step 10/10] I0619 22:45:21.714825 24668 slave.cpp:203] Agent started on 468)@172.30.2.247:42024 [22:45:21]W: [Step 10/10] I0619 22:45:21.714839 24668 slave.cpp:204] Flags at startup: --acls="""" --appc_simple_discovery_uri_prefix=""http://"" --appc_store_dir=""/tmp/mesos/store/appc"" --authenticate_http=""true"" --authenticatee=""crammd5"" --authorizer=""local"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --credential=""/mnt/teamcity/temp/buildTmp/CniIsolatorTest_ROOT_INTERNET_CURL_LaunchCommandTask_CVAWpG/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_kill_orphans=""true"" --docker_registry=""https://registry-1.docker.io"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/l8346Z/store"" --docker_volume_checkpoint_dir=""/var/run/mesos/isolators/docker/volume"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/mnt/teamcity/temp/buildTmp/CniIsolatorTest_ROOT_INTERNET_CURL_LaunchCommandTask_CVAWpG/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" --http_command_executor=""false"" 
--http_credentials=""/mnt/teamcity/temp/buildTmp/CniIsolatorTest_ROOT_INTERNET_CURL_LaunchCommandTask_CVAWpG/http_credentials"" --image_providers=""docker"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""docker/runtime,filesystem/linux,network/cni"" --launcher_dir=""/mnt/teamcity/work/4240ba9ddd0997c3/build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --network_cni_config_dir=""/tmp/l8346Z/configs"" --network_cni_plugins_dir=""/tmp/l8346Z/plugins"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resources=""cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_enable_support=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" --work_dir=""/mnt/teamcity/temp/buildTmp/CniIsolatorTest_ROOT_INTERNET_CURL_LaunchCommandTask_CVAWpG"" [22:45:21]W: [Step 10/10] I0619 22:45:21.715116 24668 credentials.hpp:86] Loading credential for authentication from '/mnt/teamcity/temp/buildTmp/CniIsolatorTest_ROOT_INTERNET_CURL_LaunchCommandTask_CVAWpG/credential' [22:45:21]W: [Step 10/10] I0619 22:45:21.715195 24668 slave.cpp:341] Agent using credential for: test-principal [22:45:21]W: [Step 10/10] I0619 22:45:21.715214 24668 credentials.hpp:37] Loading credentials for authentication from '/mnt/teamcity/temp/buildTmp/CniIsolatorTest_ROOT_INTERNET_CURL_LaunchCommandTask_CVAWpG/http_credentials' [22:45:21]W: [Step 10/10] I0619 22:45:21.715296 24668 slave.cpp:393] Using default 'basic' HTTP authenticator [22:45:21]W: [Step 10/10] I0619 22:45:21.715400 24668 resources.cpp:572] Parsing resources as JSON failed: cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000] {noformat} {noformat} 
[22:45:38] : [Step 10/10] [ RUN ] CniIsolatorTest.ROOT_VerifyCheckpointedInfo [22:45:38]W: [Step 10/10] I0619 22:45:38.459836 24647 cluster.cpp:155] Creating default 'local' authorizer [22:45:38]W: [Step 10/10] I0619 22:45:38.470319 24647 leveldb.cpp:174] Opened db in 10.34226ms [22:45:38]W: [Step 10/10] I0619 22:45:38.472771 24647 leveldb.cpp:181] Compacted db in 2.403554ms [22:45:38]W: [Step 10/10] I0619 22:45:38.472795 24647 leveldb.cpp:196] Created db iterator in 4446ns [22:45:38]W: [Step 10/10] I0619 22:45:38.472801 24647 leveldb.cpp:202] Seeked to beginning of db in 810ns [22:45:38]W: [Step 10/10] I0619 22:45:38.472806 24647 leveldb.cpp:271] Iterated through 0 keys in the db in 393ns [22:45:38]W: [Step 10/10] I0619 22:45:38.472822 24647 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned [22:45:38]W: [Step 10/10] I0619 22:45:38.473093 24665 recover.cpp:451] Starting replica recovery [22:45:38]W: [Step 10/10] I0619 22:45:38.473260 24663 recover.cpp:477] Replica is in EMPTY status [22:45:38]W: [Step 10/10] I0619 22:45:38.473647 24663 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (18464)@172.30.2.247:42024 [22:45:38]W: [Step 10/10] I0619 22:45:38.473752 24665 recover.cpp:197] Received a recover response from a replica in EMPTY status [22:45:38]W: [Step 10/10] I0619 22:45:38.473896 24667 recover.cpp:568] Updating replica status to STARTING [22:45:38]W: [Step 10/10] I0619 22:45:38.474319 24663 master.cpp:382] Master 64f1f7ac-e810-4fb1-b549-6e29fc62622b (ip-172-30-2-247.mesosphere.io) started on 172.30.2.247:42024 [22:45:38]W: [Step 10/10] I0619 22:45:38.474329 24663 master.cpp:384] Flags at startup: --acls="""" --agent_ping_timeout=""15secs"" --agent_reregister_timeout=""10mins"" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate_agents=""true"" --authenticate_frameworks=""true"" --authenticate_http=""true"" --authenticate_http_frameworks=""true"" 
--authenticators=""crammd5"" --authorizers=""local"" --credentials=""/tmp/qJWqSY/credentials"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" --http_framework_authenticators=""basic"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_agent_ping_timeouts=""5"" --max_completed_frameworks=""50"" --max_completed_tasks_per_framework=""1000"" --quiet=""false"" --recovery_agent_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""100secs"" --registry_strict=""true"" --root_submissions=""true"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/usr/local/share/mesos/webui"" --work_dir=""/tmp/qJWqSY/master"" --zk_session_timeout=""10secs"" [22:45:38]W: [Step 10/10] I0619 22:45:38.474452 24663 master.cpp:434] Master only allowing authenticated frameworks to register [22:45:38]W: [Step 10/10] I0619 22:45:38.474457 24663 master.cpp:448] Master only allowing authenticated agents to register [22:45:38]W: [Step 10/10] I0619 22:45:38.474459 24663 master.cpp:461] Master only allowing authenticated HTTP frameworks to register [22:45:38]W: [Step 10/10] I0619 22:45:38.474463 24663 credentials.hpp:37] Loading credentials for authentication from '/tmp/qJWqSY/credentials' [22:45:38]W: [Step 10/10] I0619 22:45:38.474551 24663 master.cpp:506] Using default 'crammd5' authenticator [22:45:38]W: [Step 10/10] I0619 22:45:38.474598 24663 master.cpp:578] Using default 'basic' HTTP authenticator [22:45:38]W: [Step 10/10] I0619 22:45:38.474643 24663 master.cpp:658] Using default 'basic' HTTP framework authenticator [22:45:38]W: [Step 10/10] I0619 22:45:38.474674 2466...",3 MESOS-5670,"MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery is flaky.","{noformat} [03:36:29] : [Step 10/10] [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery [03:36:29]W: [Step 10/10] I0618 03:36:29.461802 2797 cluster.cpp:155] 
Creating default 'local' authorizer [03:36:29]W: [Step 10/10] I0618 03:36:29.469468 2797 leveldb.cpp:174] Opened db in 7.527163ms [03:36:29]W: [Step 10/10] I0618 03:36:29.470188 2797 leveldb.cpp:181] Compacted db in 699544ns [03:36:29]W: [Step 10/10] I0618 03:36:29.470206 2797 leveldb.cpp:196] Created db iterator in 4293ns [03:36:29]W: [Step 10/10] I0618 03:36:29.470211 2797 leveldb.cpp:202] Seeked to beginning of db in 535ns [03:36:29]W: [Step 10/10] I0618 03:36:29.470216 2797 leveldb.cpp:271] Iterated through 0 keys in the db in 321ns [03:36:29]W: [Step 10/10] I0618 03:36:29.470230 2797 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned [03:36:29]W: [Step 10/10] I0618 03:36:29.470510 2815 recover.cpp:451] Starting replica recovery [03:36:29]W: [Step 10/10] I0618 03:36:29.470592 2817 recover.cpp:477] Replica is in EMPTY status [03:36:29]W: [Step 10/10] I0618 03:36:29.471029 2813 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (19800)@172.30.2.29:37328 [03:36:29]W: [Step 10/10] I0618 03:36:29.471139 2816 recover.cpp:197] Received a recover response from a replica in EMPTY status [03:36:29]W: [Step 10/10] I0618 03:36:29.471271 2818 recover.cpp:568] Updating replica status to STARTING [03:36:29]W: [Step 10/10] I0618 03:36:29.471606 2811 master.cpp:382] Master 6d44b7c1-ac0b-4409-97df-a53fa2e39d09 (ip-172-30-2-29.mesosphere.io) started on 172.30.2.29:37328 [03:36:29]W: [Step 10/10] I0618 03:36:29.471619 2811 master.cpp:384] Flags at startup: --acls="""" --agent_ping_timeout=""15secs"" --agent_reregister_timeout=""10mins"" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate_agents=""true"" --authenticate_frameworks=""true"" --authenticate_http=""true"" --authenticate_http_frameworks=""true"" --authenticators=""crammd5"" --authorizers=""local"" --credentials=""/tmp/baXWq5/credentials"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" 
--http_authenticators=""basic"" --http_framework_authenticators=""basic"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_agent_ping_timeouts=""5"" --max_completed_frameworks=""50"" --max_completed_tasks_per_framework=""1000"" --quiet=""false"" --recovery_agent_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""100secs"" --registry_strict=""true"" --root_submissions=""true"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/usr/local/share/mesos/webui"" --work_dir=""/tmp/baXWq5/master"" --zk_session_timeout=""10secs"" [03:36:29]W: [Step 10/10] I0618 03:36:29.471745 2811 master.cpp:434] Master only allowing authenticated frameworks to register [03:36:29]W: [Step 10/10] I0618 03:36:29.471753 2811 master.cpp:448] Master only allowing authenticated agents to register [03:36:29]W: [Step 10/10] I0618 03:36:29.471757 2811 master.cpp:461] Master only allowing authenticated HTTP frameworks to register [03:36:29]W: [Step 10/10] I0618 03:36:29.471761 2811 credentials.hpp:37] Loading credentials for authentication from '/tmp/baXWq5/credentials' [03:36:29]W: [Step 10/10] I0618 03:36:29.471829 2811 master.cpp:506] Using default 'crammd5' authenticator [03:36:29]W: [Step 10/10] I0618 03:36:29.471868 2811 master.cpp:578] Using default 'basic' HTTP authenticator [03:36:29]W: [Step 10/10] I0618 03:36:29.471941 2811 master.cpp:658] Using default 'basic' HTTP framework authenticator [03:36:29]W: [Step 10/10] I0618 03:36:29.471977 2811 master.cpp:705] Authorization enabled [03:36:29]W: [Step 10/10] I0618 03:36:29.472034 2817 hierarchical.cpp:142] Initialized hierarchical allocator process [03:36:29]W: [Step 10/10] I0618 03:36:29.472038 2814 whitelist_watcher.cpp:77] No whitelist given [03:36:29]W: [Step 10/10] I0618 03:36:29.472506 2811 master.cpp:1969] The newly elected leader is master@172.30.2.29:37328 with id 
6d44b7c1-ac0b-4409-97df-a53fa2e39d09 [03:36:29]W: [Step 10/10] I0618 03:36:29.472522 2811 master.cpp:1982] Elected as the leading master! [03:36:29]W: [Step 10/10] I0618 03:36:29.472527 2811 master.cpp:1669] Recovering from registrar [03:36:29]W: [Step 10/10] I0618 03:36:29.472573 2812 registrar.cpp:332] Recovering registrar [03:36:29]W: [Step 10/10] I0618 03:36:29.473511 2816 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 2.195002ms [03:36:29]W: [Step 10/10] I0618 03:36:29.473527 2816 replica.cpp:320] Persisted replica status to STARTING [03:36:29]W: [Step 10/10] I0618 03:36:29.473578 2816 recover.cpp:477] Replica is in STARTING status [03:36:29]W: [Step 10/10] I0618 03:36:29.473877 2815 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (19803)@172.30.2.29:37328 [03:36:29]W: [Step 10/10] I0618 03:36:29.473989 2814 recover.cpp:197] Received a recover response from a replica in STARTING status [03:36:29]W: [Step 10/10] I0618 03:36:29.474126 2817 recover.cpp:568] Updating replica status to VOTING [03:36:29]W: [Step 10/10] I0618 03:36:29.474735 2811 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 547332ns [03:36:29]W: [Step 10/10] I0618 03:36:29.474748 2811 replica.cpp:320] Persisted replica status to VOTING [03:36:29]W: [Step 10/10] I0618 03:36:29.474783 2811 recover.cpp:582] Successfully joined the Paxos group [03:36:29]W: [Step 10/10] I0618 03:36:29.474829 2811 recover.cpp:466] Recover process terminated [03:36:29]W: [Step 10/10] I0618 03:36:29.474969 2818 log.cpp:553] Attempting to start the writer [03:36:29]W: [Step 10/10] I0618 03:36:29.475361 2811 replica.cpp:493] Replica received implicit promise request from (19804)@172.30.2.29:37328 with proposal 1 [03:36:29]W: [Step 10/10] I0618 03:36:29.475944 2811 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 559444ns [03:36:29]W: [Step 10/10] I0618 03:36:29.475956 2811 replica.cpp:342] Persisted promised to 1 [03:36:29]W: [Step 10/10] 
I0618 03:36:29.476215 2815 coordinator.cpp:238] Coordinator attempting to fill missing positions [03:36:29]W: [Step 10/10] I0618 03:36:29.476660 2816 replica.cpp:388] Replica received explicit promise request from (19805)@172.30.2.29:37328 for position 0 with proposal 2 [03:36:29]W: [Step 10/10] I0618 03:36:29.477262 2816 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 584333ns [03:36:29]W: [Step 10/10] I0618 03:36:29.477273 2816 replica.cpp:712] Persisted action at 0 [03:36:29]W: [Step 10/10] I0618 03:36:29.477699 2815 replica.cpp:537] Replica received write request for position 0 from (19806)@172.30.2.29:37328 [03:36:29]W: [Step 10/10] I0618 03:36:29.477726 2815 leveldb.cpp:436] Reading position from leveldb took 8842ns [03:36:29]W: [Step 10/10] I0618 03:36:29.478277 2815 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 537361ns [03:36:29]W: [Step 10/10] I0618 03:36:29.478291 2815 replica.cpp:712] Persisted action at 0 [03:36:29]W: [Step 10/10] I0618 03:36:29.478569 2811 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 [03:36:29]W: [Step 10/10] I0618 03:36:29.479132 2811 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 545208ns [03:36:29]W: [Step 10/10] I0618 03:36:29.479146 2811 replica.cpp:712] Persisted action at 0 [03:36:29]W: [Step 10/10] I0618 03:36:29.479152 2811 replica.cpp:697] Replica learned NOP action at position 0 [03:36:29]W: [Step 10/10] I0618 03:36:29.479317 2814 log.cpp:569] Writer started with ending position 0 [03:36:29]W: [Step 10/10] I0618 03:36:29.479568 2811 leveldb.cpp:436] Reading position from leveldb took 8325ns [03:36:29]W: [Step 10/10] I0618 03:36:29.479786 2814 registrar.cpp:365] Successfully fetched the registry (0B) in 7.192064ms [03:36:29]W: [Step 10/10] I0618 03:36:29.479822 2814 registrar.cpp:464] Applied 1 operations in 3018ns; attempting to update the 'registry' [03:36:29]W: [Step 10/10] I0618 03:36:29.479995 2818 log.cpp:577] Attempting to append 205 bytes 
to the log [03:36:29]W: [Step 10/10] I0618 03:36:29.480044 2818 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 [03:36:29]W: [Step 10/10] I0618 03:36:29.480309 2811 replica.cpp:537] Replica received write request for position 1 from (19807)@172.30.2.29:37328 [03:36:29]W: [Step 10/10] I0618 03:36:29.480928 2811 leveldb.cpp:341] Persisting action (224 bytes) to leveldb took 596433ns [03:36:29]W: [Step 10/10] I0618 03:36:29.480942 2811 replica.cpp:712] Persisted action at 1 [03:36:29]W: [Step 10/10] I0618 03:36:29.481148 2815 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 [03:36:29]W: [Step 10/10] I0618 03:36:29.481710 2815 leveldb.cpp:341] Persisting action (226 bytes) to leveldb took 545656ns [03:36:29]W: [Step 10/10] I0618 03:36:29.481722 2815 replica.cpp:712] Persisted action at 1 [03:36:29]W: [Step 10/10] I0618 03:36:29.481727 2815 replica.cpp:697] Replica learned APPEND action at position 1 [03:36:29]W: [Step 10/10] I0618 03:36:29.481958 2816 registrar.cpp:509] Successfully updated the 'registry' in 2.119168ms [03:36:29]W: [Step 10/10] I0618 03:36:29.482014 2816 registrar.cpp:395] Successfully recovered registrar [03:36:29]W: [Step 10/10] I0618 03:36:29.482045 2817 log.cpp:596] Attempting to truncate the log to 1 [03:36:29]W: [Step 10/10] I0618 03:36:29.482117 2817 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 [03:36:29]W: [Step 10/10] I0618 03:36:29.482166 2816 master.cpp:1777] Recovered 0 agents from the Registry (166B) ; allowing 10mins for agents to re-register [03:36:29]W: [Step 10/10] I0618 03:36:29.482177 2817 hierarchical.cpp:169] Skipping recovery of hierarchical allocator: nothing to recover [03:36:29]W: [Step 10/10] I0618 03:36:29.482404 2817 replica.cpp:537] Replica received write request for position 2 from (19808)@172.30.2.29:37328 [03:36:29]W: [Step 10/10] I0618 03:36:29.482975 2817 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 
552763ns [03:36:29]W: [Step 10/10] I0618 03:36:29.482986 2817 replica.cpp:712] Persisted action at 2 [03:36:29]W: [Step 10/10] I0618 03:36:29.483301 2813 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 [03:36:29]W: [Step 10/10] I0618 03:36:29.483870 2813 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 547529ns [03:36:29]W: [Step 10/10] I0618 03:36:29.483896 2813 leveldb.cpp:399] Deleting ~1 keys from leveldb took 12161ns [03:36:29]W: [Step 10/10] I0618 03:36:29.483904 2813 replica.cpp:712] Persisted action at 2 [03:36:29]W: [Step 10/10] I0618 03:36:29.483911 2813 replica.cpp:697] Replica learned TRUNCATE action at position 2 [03:36:29]W: [Step 10/10] I0618 03:36:29.492995 2797 containerizer.cpp:201] Using isolation: cgroups/mem,filesystem/posix,network/cni [03:36:29]W: [Step 10/10] I0618 03:36:29.496548 2797 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher [03:36:29]W: [Step 10/10] I0618 03:36:29.503572 2797 cluster.cpp:432] Creating default 'local' authorizer [03:36:29]W: [Step 10/10] I0618 03:36:29.503936 2817 slave.cpp:203] Agent started on 488)@172.30.2.29:37328 [03:36:29]W: [Step 10/10] I0618 03:36:29.503952 2817 slave.cpp:204] Flags at startup: --acls="""" --appc_simple_discovery_uri_prefix=""http://"" --appc_store_dir=""/tmp/mesos/store/appc"" --authenticate_http=""true"" --authenticatee=""crammd5"" --authorizer=""local"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos_test_ecfecccd-6714-4ec7-b5eb-a3071b772617"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --credential=""/mnt/teamcity/temp/buildTmp/MemoryPressureMesosTest_CGROUPS_ROOT_SlaveRecovery_MBzwwL/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_kill_orphans=""true"" 
--docker_registry=""https://registry-1.docker.io"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --docker_volume_checkpoint_dir=""/var/run/mesos/isolators/docker/volume"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/mnt/teamcity/temp/buildTmp/MemoryPressureMesosTest_CGROUPS_ROOT_SlaveRecovery_MBzwwL/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" --http_command_executor=""false"" --http_credentials=""/mnt/teamcity/temp/buildTmp/MemoryPressureMesosTest_CGROUPS_ROOT_SlaveRecovery_MBzwwL/http_credentials"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""cgroups/mem"" --launcher_dir=""/mnt/teamcity/work/4240ba9ddd0997c3/build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resources=""cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_enable_support=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" --work_dir=""/mnt/teamcity/temp/buildTmp/MemoryPressureMesosTest_CGROUPS_ROOT_SlaveRecovery_MBzwwL"" [03:36:29]W: [Step 10/10] I0618 03:36:29.504148 2817 credentials.hpp:86] Loading credential for authentication from '/mnt/teamcity/temp/buildTmp/MemoryPressureMesosTest_CGROUPS_ROOT_SlaveRecovery_MBzwwL/credential' [03:36:29]W: [Step 10/10] I0618 03:36:29.504189 2817 
slave.cpp:341] Agent using credential for: test-principal [03:36:29]W: [Step 10/10] I0618 03:36:29.504199 2817 credentials.hpp:37] Loading credentials for authentication from '/mnt/teamcity/temp/buildTmp/MemoryPressureMesosTest_CGROUPS_ROOT_SlaveRecovery_MBzwwL/http_credentials' [03:36:29]W: [Step 10/10] I0618 03:36:29.504245 2817 slave.cpp:393] Using default 'basic' HTTP authenticator [03:36:29]W: [Step 10/10] I0618 03:36:29.504410 2797 sched.cpp:224] Version: 1.0.0 [03:36:29]W: [Step 10/10] I0618 03:36:29.504416 2817 resources.cpp:572] Parsing resources as JSON failed: cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000] [03:36:29]W: [Step 10/10] Trying semicolon-delimited string format instead [03:36:29]W: [Step 10/10] I0618 03:36:29.504580 2818 sched.cpp:328] New master detected at master@172.30.2.29:37328 [03:36:29]W: [Step 10/10] I0618 03:36:29.504613 2818 sched.cpp:394] Authenticating with master master@172.30.2.29:37328 [03:36:29]W: [Step 10/10] I0618 03:36:29.504622 2818 sched.cpp:401] Using default CRAM-MD5 authenticatee [03:36:29]W: [Step 10/10] I0618 03:36:29.504649 2817 slave.cpp:592] Agent resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] [03:36:29]W: [Step 10/10] I0618 03:36:29.504673 2817 slave.cpp:600] Agent attributes: [ ] [03:36:29]W: [Step 10/10] I0618 03:36:29.504678 2817 slave.cpp:605] Agent hostname: ip-172-30-2-29.mesosphere.io [03:36:29]W: [Step 10/10] I0618 03:36:29.504703 2816 authenticatee.cpp:121] Creating new client SASL connection [03:36:29]W: [Step 10/10] I0618 03:36:29.504830 2818 master.cpp:5943] Authenticating scheduler-3e992438-052b-45f0-af6a-851091145739@172.30.2.29:37328 [03:36:29]W: [Step 10/10] I0618 03:36:29.504887 2816 authenticator.cpp:414] Starting authentication session for crammd5_authenticatee(991)@172.30.2.29:37328 [03:36:29]W: [Step 10/10] I0618 03:36:29.504982 2811 authenticator.cpp:98] Creating new server SASL connection [03:36:29]W: [Step 10/10] I0618 03:36:29.505004 2816 state.cpp:57] 
Recovering state from '/mnt/teamcity/temp/buildTmp/MemoryPressureMesosTest_CGROUPS_ROOT_SlaveRecovery_MBzwwL/meta' [03:36:29]W: [Step 10/10] I0618 03:36:29.505105 2813 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5 [03:36:29]W: [Step 10/10] I0618 03:36:29.505131 2813 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5' [03:36:29]W: [Step 10/10] I0618 03:36:29.505138 2818 status_update_manager.cpp:200] Recovering status update manager [03:36:29]W: [Step 10/10] I0618 03:36:29.505167 2813 authenticator.cpp:204] Received SASL authentication start [03:36:29]W: [Step 10/10] I0618 03:36:29.505200 2813 authenticator.cpp:326] Authentication requires more steps [03:36:29]W: [Step 10/10] I0618 03:36:29.505200 2814 containerizer.cpp:514] Recovering containerizer [03:36:29]W: [Step 10/10] I0618 03:36:29.505241 2813 authenticatee.cpp:259] Received SASL authentication step [03:36:29]W: [Step 10/10] I0618 03:36:29.505300 2812 authenticator.cpp:232] Received SASL authentication step [03:36:29]W: [Step 10/10] I0618 03:36:29.505317 2812 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: 'ip-172-30-2-29.mesosphere.io' server FQDN: 'ip-172-30-2-29.mesosphere.io' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false [03:36:29]W: [Step 10/10] I0618 03:36:29.505323 2812 auxprop.cpp:179] Looking up auxiliary property '*userPassword' [03:36:29]W: [Step 10/10] I0618 03:36:29.505331 2812 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' [03:36:29]W: [Step 10/10] I0618 03:36:29.505337 2812 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: 'ip-172-30-2-29.mesosphere.io' server FQDN: 'ip-172-30-2-29.mesosphere.io' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true [03:36:29]W: [Step 10/10] I0618 03:36:29.505342 2812 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since 
SASL_AUXPROP_AUTHZID == true [03:36:29]W: [Step 10/10] I0618 03:36:29.505347 2812 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true [03:36:29]W: [Step 10/10] I0618 03:36:29.505355 2812 authenticator.cpp:318] Authentication success [03:36:29]W: [Step 10/10] I0618 03:36:29.505399 2813 authenticatee.cpp:299] Authentication success [03:36:29]W: [Step 10/10] I0618 03:36:29.505421 2811 authenticator.cpp:432] Authentication session cleanup for crammd5_authenticatee(991)@172.30.2.29:37328 [03:36:29]W: [Step 10/10] I0618 03:36:29.505436 2812 master.cpp:5973] Successfully authenticated principal 'test-principal' at scheduler-3e992438-052b-45f0-af6a-851091145739@172.30.2.29:37328 [03:36:29]W: [Step 10/10] I0618 03:36:29.505534 2816 sched.cpp:484] Successfully authenticated with master master@172.30.2.29:37328 [03:36:29]W: [Step 10/10] I0618 03:36:29.505553 2816 sched.cpp:800] Sending SUBSCRIBE call to master@172.30.2.29:37328 [03:36:29]W: [Step 10/10] I0618 03:36:29.505591 2816 sched.cpp:833] Will retry registration in 11.319315ms if necessary [03:36:29]W: [Step 10/10] I0618 03:36:29.505672 2815 master.cpp:2539] Received SUBSCRIBE call for framework 'default' at scheduler-3e992438-052b-45f0-af6a-851091145739@172.30.2.29:37328 [03:36:29]W: [Step 10/10] I0618 03:36:29.505702 2815 master.cpp:2008] Authorizing framework principal 'test-principal' to receive offers for role '*' [03:36:29]W: [Step 10/10] I0618 03:36:29.505854 2818 master.cpp:2615] Subscribing framework default with checkpointing enabled and capabilities [ ] [03:36:29]W: [Step 10/10] I0618 03:36:29.506031 2818 sched.cpp:723] Framework registered with 6d44b7c1-ac0b-4409-97df-a53fa2e39d09-0000 [03:36:29]W: [...",2 MESOS-5671,"MemoryPressureMesosTest.CGROUPS_ROOT_Statistics is flaky.","{noformat} [00:48:29] : [Step 10/10] [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics [00:48:29]W: [Step 10/10] 1+0 records in [00:48:29]W: [Step 10/10] 1+0 records out 
[00:48:29]W: [Step 10/10] 1048576 bytes (1.0 MB) copied, 0.000517638 s, 2.0 GB/s [00:48:30]W: [Step 10/10] I0617 00:48:30.000998 25413 cluster.cpp:155] Creating default 'local' authorizer [00:48:30]W: [Step 10/10] I0617 00:48:30.020459 25413 leveldb.cpp:174] Opened db in 19.338463ms [00:48:30]W: [Step 10/10] I0617 00:48:30.022897 25413 leveldb.cpp:181] Compacted db in 2.416906ms [00:48:30]W: [Step 10/10] I0617 00:48:30.022919 25413 leveldb.cpp:196] Created db iterator in 4037ns [00:48:30]W: [Step 10/10] I0617 00:48:30.022927 25413 leveldb.cpp:202] Seeked to beginning of db in 769ns [00:48:30]W: [Step 10/10] I0617 00:48:30.022932 25413 leveldb.cpp:271] Iterated through 0 keys in the db in 390ns [00:48:30]W: [Step 10/10] I0617 00:48:30.022944 25413 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned [00:48:30]W: [Step 10/10] I0617 00:48:30.023272 25432 recover.cpp:451] Starting replica recovery [00:48:30]W: [Step 10/10] I0617 00:48:30.023425 25434 recover.cpp:477] Replica is in EMPTY status [00:48:30]W: [Step 10/10] I0617 00:48:30.023748 25434 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (19361)@172.30.2.56:53790 [00:48:30]W: [Step 10/10] I0617 00:48:30.023849 25429 recover.cpp:197] Received a recover response from a replica in EMPTY status [00:48:30]W: [Step 10/10] I0617 00:48:30.024019 25435 recover.cpp:568] Updating replica status to STARTING [00:48:30]W: [Step 10/10] I0617 00:48:30.024338 25432 master.cpp:382] Master 0e92ffa4-4f26-4cea-84d3-9c67612de1bd (ip-172-30-2-56.mesosphere.io) started on 172.30.2.56:53790 [00:48:30]W: [Step 10/10] I0617 00:48:30.024348 25432 master.cpp:384] Flags at startup: --acls="""" --agent_ping_timeout=""15secs"" --agent_reregister_timeout=""10mins"" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate_agents=""true"" --authenticate_frameworks=""true"" --authenticate_http=""true"" --authenticate_http_frameworks=""true"" 
--authenticators=""crammd5"" --authorizers=""local"" --credentials=""/tmp/jBjY5p/credentials"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" --http_framework_authenticators=""basic"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_agent_ping_timeouts=""5"" --max_completed_frameworks=""50"" --max_completed_tasks_per_framework=""1000"" --quiet=""false"" --recovery_agent_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""100secs"" --registry_strict=""true"" --root_submissions=""true"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/usr/local/share/mesos/webui"" --work_dir=""/tmp/jBjY5p/master"" --zk_session_timeout=""10secs"" [00:48:30]W: [Step 10/10] I0617 00:48:30.024502 25432 master.cpp:434] Master only allowing authenticated frameworks to register [00:48:30]W: [Step 10/10] I0617 00:48:30.024508 25432 master.cpp:448] Master only allowing authenticated agents to register [00:48:30]W: [Step 10/10] I0617 00:48:30.024513 25432 master.cpp:461] Master only allowing authenticated HTTP frameworks to register [00:48:30]W: [Step 10/10] I0617 00:48:30.024516 25432 credentials.hpp:37] Loading credentials for authentication from '/tmp/jBjY5p/credentials' [00:48:30]W: [Step 10/10] I0617 00:48:30.024603 25432 master.cpp:506] Using default 'crammd5' authenticator [00:48:30]W: [Step 10/10] I0617 00:48:30.024644 25432 master.cpp:578] Using default 'basic' HTTP authenticator [00:48:30]W: [Step 10/10] I0617 00:48:30.024701 25432 master.cpp:658] Using default 'basic' HTTP framework authenticator [00:48:30]W: [Step 10/10] I0617 00:48:30.024770 25432 master.cpp:705] Authorization enabled [00:48:30]W: [Step 10/10] I0617 00:48:30.024883 25435 whitelist_watcher.cpp:77] No whitelist given [00:48:30]W: [Step 10/10] I0617 00:48:30.024885 25434 hierarchical.cpp:142] Initialized hierarchical 
allocator process [00:48:30]W: [Step 10/10] I0617 00:48:30.025539 25433 master.cpp:1969] The newly elected leader is master@172.30.2.56:53790 with id 0e92ffa4-4f26-4cea-84d3-9c67612de1bd [00:48:30]W: [Step 10/10] I0617 00:48:30.025555 25433 master.cpp:1982] Elected as the leading master! [00:48:30]W: [Step 10/10] I0617 00:48:30.025560 25433 master.cpp:1669] Recovering from registrar [00:48:30]W: [Step 10/10] I0617 00:48:30.025611 25432 registrar.cpp:332] Recovering registrar [00:48:30]W: [Step 10/10] I0617 00:48:30.026397 25431 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 2.288187ms [00:48:30]W: [Step 10/10] I0617 00:48:30.026438 25431 replica.cpp:320] Persisted replica status to STARTING [00:48:30]W: [Step 10/10] I0617 00:48:30.026486 25431 recover.cpp:477] Replica is in STARTING status [00:48:30]W: [Step 10/10] I0617 00:48:30.026793 25432 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (19364)@172.30.2.56:53790 [00:48:30]W: [Step 10/10] I0617 00:48:30.026897 25429 recover.cpp:197] Received a recover response from a replica in STARTING status [00:48:30]W: [Step 10/10] I0617 00:48:30.027031 25428 recover.cpp:568] Updating replica status to VOTING [00:48:30]W: [Step 10/10] I0617 00:48:30.028960 25432 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 1.874668ms [00:48:30]W: [Step 10/10] I0617 00:48:30.028975 25432 replica.cpp:320] Persisted replica status to VOTING [00:48:30]W: [Step 10/10] I0617 00:48:30.029007 25432 recover.cpp:582] Successfully joined the Paxos group [00:48:30]W: [Step 10/10] I0617 00:48:30.029047 25432 recover.cpp:466] Recover process terminated [00:48:30]W: [Step 10/10] I0617 00:48:30.029209 25430 log.cpp:553] Attempting to start the writer [00:48:30]W: [Step 10/10] I0617 00:48:30.029614 25429 replica.cpp:493] Replica received implicit promise request from (19365)@172.30.2.56:53790 with proposal 1 [00:48:30]W: [Step 10/10] I0617 00:48:30.031486 25429 leveldb.cpp:304] 
Persisting metadata (8 bytes) to leveldb took 1.850474ms [00:48:30]W: [Step 10/10] I0617 00:48:30.031502 25429 replica.cpp:342] Persisted promised to 1 [00:48:30]W: [Step 10/10] I0617 00:48:30.031726 25431 coordinator.cpp:238] Coordinator attempting to fill missing positions [00:48:30]W: [Step 10/10] I0617 00:48:30.032245 25428 replica.cpp:388] Replica received explicit promise request from (19366)@172.30.2.56:53790 for position 0 with proposal 2 [00:48:30]W: [Step 10/10] I0617 00:48:30.034101 25428 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 1.831441ms [00:48:30]W: [Step 10/10] I0617 00:48:30.034117 25428 replica.cpp:712] Persisted action at 0 [00:48:30]W: [Step 10/10] I0617 00:48:30.034561 25433 replica.cpp:537] Replica received write request for position 0 from (19367)@172.30.2.56:53790 [00:48:30]W: [Step 10/10] I0617 00:48:30.034589 25433 leveldb.cpp:436] Reading position from leveldb took 10586ns [00:48:30]W: [Step 10/10] I0617 00:48:30.036419 25433 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 1.817267ms [00:48:30]W: [Step 10/10] I0617 00:48:30.036434 25433 replica.cpp:712] Persisted action at 0 [00:48:30]W: [Step 10/10] I0617 00:48:30.036679 25429 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 [00:48:30]W: [Step 10/10] I0617 00:48:30.038661 25429 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 1.96521ms [00:48:30]W: [Step 10/10] I0617 00:48:30.038677 25429 replica.cpp:712] Persisted action at 0 [00:48:30]W: [Step 10/10] I0617 00:48:30.038682 25429 replica.cpp:697] Replica learned NOP action at position 0 [00:48:30]W: [Step 10/10] I0617 00:48:30.038839 25435 log.cpp:569] Writer started with ending position 0 [00:48:30]W: [Step 10/10] I0617 00:48:30.039198 25433 leveldb.cpp:436] Reading position from leveldb took 10572ns [00:48:30]W: [Step 10/10] I0617 00:48:30.039412 25433 registrar.cpp:365] Successfully fetched the registry (0B) in 13.778944ms [00:48:30]W: [Step 10/10] I0617 
00:48:30.039448 25433 registrar.cpp:464] Applied 1 operations in 4778ns; attempting to update the 'registry' [00:48:30]W: [Step 10/10] I0617 00:48:30.039643 25428 log.cpp:577] Attempting to append 205 bytes to the log [00:48:30]W: [Step 10/10] I0617 00:48:30.039696 25432 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 [00:48:30]W: [Step 10/10] I0617 00:48:30.039945 25430 replica.cpp:537] Replica received write request for position 1 from (19368)@172.30.2.56:53790 [00:48:30]W: [Step 10/10] I0617 00:48:30.041738 25430 leveldb.cpp:341] Persisting action (224 bytes) to leveldb took 1.771112ms [00:48:30]W: [Step 10/10] I0617 00:48:30.041754 25430 replica.cpp:712] Persisted action at 1 [00:48:30]W: [Step 10/10] I0617 00:48:30.041977 25432 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 [00:48:30]W: [Step 10/10] I0617 00:48:30.043805 25432 leveldb.cpp:341] Persisting action (226 bytes) to leveldb took 1.810425ms [00:48:30]W: [Step 10/10] I0617 00:48:30.043820 25432 replica.cpp:712] Persisted action at 1 [00:48:30]W: [Step 10/10] I0617 00:48:30.043825 25432 replica.cpp:697] Replica learned APPEND action at position 1 [00:48:30]W: [Step 10/10] I0617 00:48:30.044040 25430 registrar.cpp:509] Successfully updated the 'registry' in 4.556032ms [00:48:30]W: [Step 10/10] I0617 00:48:30.044100 25430 registrar.cpp:395] Successfully recovered registrar [00:48:30]W: [Step 10/10] I0617 00:48:30.044124 25428 log.cpp:596] Attempting to truncate the log to 1 [00:48:30]W: [Step 10/10] I0617 00:48:30.044215 25431 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 [00:48:30]W: [Step 10/10] I0617 00:48:30.044244 25430 master.cpp:1777] Recovered 0 agents from the Registry (166B) ; allowing 10mins for agents to re-register [00:48:30]W: [Step 10/10] I0617 00:48:30.044317 25433 hierarchical.cpp:169] Skipping recovery of hierarchical allocator: nothing to recover [00:48:30]W: [Step 10/10] I0617 
00:48:30.044497 25433 replica.cpp:537] Replica received write request for position 2 from (19369)@172.30.2.56:53790 [00:48:30]W: [Step 10/10] I0617 00:48:30.046368 25433 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 1.851883ms [00:48:30]W: [Step 10/10] I0617 00:48:30.046383 25433 replica.cpp:712] Persisted action at 2 [00:48:30]W: [Step 10/10] I0617 00:48:30.046583 25430 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 [00:48:30]W: [Step 10/10] I0617 00:48:30.048426 25430 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 1.821628ms [00:48:30]W: [Step 10/10] I0617 00:48:30.048455 25430 leveldb.cpp:399] Deleting ~1 keys from leveldb took 14283ns [00:48:30]W: [Step 10/10] I0617 00:48:30.048463 25430 replica.cpp:712] Persisted action at 2 [00:48:30]W: [Step 10/10] I0617 00:48:30.048468 25430 replica.cpp:697] Replica learned TRUNCATE action at position 2 [00:48:30]W: [Step 10/10] I0617 00:48:30.055145 25413 containerizer.cpp:203] Using isolation: cgroups/mem,filesystem/posix,network/cni [00:48:30]W: [Step 10/10] I0617 00:48:30.058349 25413 linux_launcher.cpp:101] Using /cgroup/freezer as the freezer hierarchy for the Linux launcher [00:48:30]W: [Step 10/10] I0617 00:48:30.069301 25413 cluster.cpp:432] Creating default 'local' authorizer [00:48:30]W: [Step 10/10] I0617 00:48:30.069707 25431 slave.cpp:203] Agent started on 485)@172.30.2.56:53790 [00:48:30]W: [Step 10/10] I0617 00:48:30.069718 25431 slave.cpp:204] Flags at startup: --acls="""" --appc_simple_discovery_uri_prefix=""http://"" --appc_store_dir=""/tmp/mesos/store/appc"" --authenticate_http=""true"" --authenticatee=""crammd5"" --authorizer=""local"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos_test_d7ff4961-cb6d-4d51-bb21-10129a5c5572"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" 
--credential=""/mnt/teamcity/temp/buildTmp/MemoryPressureMesosTest_CGROUPS_ROOT_Statistics_AF5X0p/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_kill_orphans=""true"" --docker_registry=""https://registry-1.docker.io"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --docker_volume_checkpoint_dir=""/var/run/mesos/isolators/docker/volume"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/mnt/teamcity/temp/buildTmp/MemoryPressureMesosTest_CGROUPS_ROOT_Statistics_AF5X0p/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" --http_command_executor=""false"" --http_credentials=""/mnt/teamcity/temp/buildTmp/MemoryPressureMesosTest_CGROUPS_ROOT_Statistics_AF5X0p/http_credentials"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""cgroups/mem"" --launcher_dir=""/mnt/teamcity/work/4240ba9ddd0997c3/build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resources=""cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_enable_support=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" --work_dir=""/mnt/teamcity/temp/buildTmp/MemoryPressureMesosTest_CGROUPS_ROOT_Statistics_AF5X0p"" [00:48:30]W: [Step 10/10] I0617 00:48:30.069916 25431 
credentials.hpp:86] Loading credential for authentication from '/mnt/teamcity/temp/buildTmp/MemoryPressureMesosTest_CGROUPS_ROOT_Statistics_AF5X0p/credential' [00:48:30]W: [Step 10/10] I0617 00:48:30.069967 25431 slave.cpp:341] Agent using credential for: test-principal [00:48:30]W: [Step 10/10] I0617 00:48:30.069984 25431 credentials.hpp:37] Loading credentials for authentication from '/mnt/teamcity/temp/buildTmp/MemoryPressureMesosTest_CGROUPS_ROOT_Statistics_AF5X0p/http_credentials' [00:48:30]W: [Step 10/10] I0617 00:48:30.070050 25431 slave.cpp:393] Using default 'basic' HTTP authenticator [00:48:30]W: [Step 10/10] I0617 00:48:30.070127 25431 resources.cpp:572] Parsing resources as JSON failed: cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000] [00:48:30]W: [Step 10/10] Trying semicolon-delimited string format instead [00:48:30]W: [Step 10/10] I0617 00:48:30.070282 25431 slave.cpp:592] Agent resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] [00:48:30]W: [Step 10/10] I0617 00:48:30.070309 25431 slave.cpp:600] Agent attributes: [ ] [00:48:30]W: [Step 10/10] I0617 00:48:30.070314 25431 slave.cpp:605] Agent hostname: ip-172-30-2-56.mesosphere.io [00:48:30]W: [Step 10/10] I0617 00:48:30.070484 25413 sched.cpp:224] Version: 1.0.0 [00:48:30]W: [Step 10/10] I0617 00:48:30.070667 25433 sched.cpp:328] New master detected at master@172.30.2.56:53790 [00:48:30]W: [Step 10/10] I0617 00:48:30.070711 25429 state.cpp:57] Recovering state from '/mnt/teamcity/temp/buildTmp/MemoryPressureMesosTest_CGROUPS_ROOT_Statistics_AF5X0p/meta' [00:48:30]W: [Step 10/10] I0617 00:48:30.070749 25433 sched.cpp:394] Authenticating with master master@172.30.2.56:53790 [00:48:30]W: [Step 10/10] I0617 00:48:30.070758 25433 sched.cpp:401] Using default CRAM-MD5 authenticatee [00:48:30]W: [Step 10/10] I0617 00:48:30.070793 25430 status_update_manager.cpp:200] Recovering status update manager [00:48:30]W: [Step 10/10] I0617 00:48:30.070904 25432 authenticatee.cpp:121] Creating 
new client SASL connection [00:48:30]W: [Step 10/10] I0617 00:48:30.070914 25430 containerizer.cpp:518] Recovering containerizer [00:48:30]W: [Step 10/10] I0617 00:48:30.071049 25432 master.cpp:5943] Authenticating scheduler-21f8a988-6288-4ec1-9d6a-b66ae746896a@172.30.2.56:53790 [00:48:30]W: [Step 10/10] I0617 00:48:30.071105 25428 authenticator.cpp:414] Starting authentication session for crammd5_authenticatee(984)@172.30.2.56:53790 [00:48:30]W: [Step 10/10] I0617 00:48:30.071164 25434 authenticator.cpp:98] Creating new server SASL connection [00:48:30]W: [Step 10/10] I0617 00:48:30.071241 25434 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5 [00:48:30]W: [Step 10/10] I0617 00:48:30.071254 25434 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5' [00:48:30]W: [Step 10/10] I0617 00:48:30.071292 25434 authenticator.cpp:204] Received SASL authentication start [00:48:30]W: [Step 10/10] I0617 00:48:30.071336 25434 authenticator.cpp:326] Authentication requires more steps [00:48:30]W: [Step 10/10] I0617 00:48:30.071374 25434 authenticatee.cpp:259] Received SASL authentication step [00:48:30]W: [Step 10/10] I0617 00:48:30.071553 25434 authenticator.cpp:232] Received SASL authentication step [00:48:30]W: [Step 10/10] I0617 00:48:30.071574 25434 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: 'ip-172-30-2-56' server FQDN: 'ip-172-30-2-56' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false [00:48:30]W: [Step 10/10] I0617 00:48:30.071586 25434 auxprop.cpp:179] Looking up auxiliary property '*userPassword' [00:48:30]W: [Step 10/10] I0617 00:48:30.071594 25434 auxprop.cpp:179] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' [00:48:30]W: [Step 10/10] I0617 00:48:30.071604 25434 auxprop.cpp:107] Request to lookup properties for user: 'test-principal' realm: 'ip-172-30-2-56' server FQDN: 'ip-172-30-2-56' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true [00:48:30]W: [Step 10/10] 
I0617 00:48:30.071615 25434 auxprop.cpp:129] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true [00:48:30]W: [Step 10/10] I0617 00:48:30.071619 25434 auxprop.cpp:129] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true [00:48:30]W: [Step 10/10] I0617 00:48:30.071630 25434 authenticator.cpp:318] Authentication success [00:48:30]W: [Step 10/10] I0617 00:48:30.071684 25428 authenticator.cpp:432] Authentication session cleanup for crammd5_authenticatee(984)@172.30.2.56:53790 [00:48:30]W: [Step 10/10] I0617 00:48:30.071687 25431 authenticatee.cpp:299] Authentication success [00:48:30]W: [Step 10/10] I0617 00:48:30.071704 25434 master.cpp:5973] Successfully authenticated principal 'test-principal' at scheduler-21f8a988-6288-4ec1-9d6a-b66ae746896a@172.30.2.56:53790 [00:48:30]W: [Step 10/10] I0617 00:48:30.071826 25431 sched.cpp:484] Successfully authenticated with master master@172.30.2.56:53790 [00:48:30]W: [Step 10/10] I0617 00:48:30.071841 25431 sched.cpp:800] Sending SUBSCRIBE call to master@172.30.2.56:53790 [00:48:30]W: [Step 10/10] I0617 00:48:30.071954 25431 sched.cpp:833] Will retry registration in 731.385085ms if necessary [00:48:30]W: [Step 10/10] I0617 00:48:30.071996 25434 master.cpp:2539] Received SUBSCRIBE call for framework 'default' at scheduler-21f8a988-6288-4ec1-9d6a-b66ae746896a@172.30.2.56:53790 [00:48:30]W: [Step 10/10] I0617 00:48:30.072013 25434 master.cpp:2008] Authorizing framework principal 'test-principal' to receive offers for role '*' [00:48:30]W: [Step 10/10] I0617 00:48:30.072180 25430 master.cpp:2615] Subscribing framework default with checkpointing disabled and capabilities [ ] [00:48:30]W: [Step 10/10] I0617 00:48:30.072305 25429 hierarchical.cpp:264] Added framework 0e92ffa4-4f26-4cea-84d3-9c67612de1bd-000...",2 MESOS-5673,"Port mapping isolator may cause segfault if it bind mount root does not exist.","A check is needed for port mapping isolator for its bind mount root. 
Otherwise, non-existed port-mapping bind mount root may cause segmentation fault for some cases. Here is the test log: {noformat} [00:57:42] : [Step 10/10] [----------] 11 tests from PortMappingIsolatorTest [00:57:42] : [Step 10/10] [ RUN ] PortMappingIsolatorTest.ROOT_NC_ContainerToContainerTCP [00:57:42]W: [Step 10/10] I0604 00:57:42.723029 24841 port_mapping_tests.cpp:229] Using eth0 as the public interface [00:57:42]W: [Step 10/10] I0604 00:57:42.723348 24841 port_mapping_tests.cpp:237] Using lo as the loopback interface [00:57:42]W: [Step 10/10] I0604 00:57:42.735090 24841 resources.cpp:572] Parsing resources as JSON failed: cpus:2;mem:1024;disk:1024;ephemeral_ports:[30001-30999];ports:[31000-32000] [00:57:42]W: [Step 10/10] Trying semicolon-delimited string format instead [00:57:42]W: [Step 10/10] I0604 00:57:42.736006 24841 port_mapping.cpp:1557] Using eth0 as the public interface [00:57:42]W: [Step 10/10] I0604 00:57:42.736331 24841 port_mapping.cpp:1582] Using lo as the loopback interface [00:57:42]W: [Step 10/10] I0604 00:57:42.737501 24841 port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh3 = '1024' [00:57:42]W: [Step 10/10] I0604 00:57:42.737545 24841 port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh1 = '128' [00:57:42]W: [Step 10/10] I0604 00:57:42.737578 24841 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_wmem = '4096 16384 4194304' [00:57:42]W: [Step 10/10] I0604 00:57:42.737608 24841 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_synack_retries = '5' [00:57:42]W: [Step 10/10] I0604 00:57:42.737637 24841 port_mapping.cpp:1869] /proc/sys/net/core/rmem_max = '212992' [00:57:42]W: [Step 10/10] I0604 00:57:42.737666 24841 port_mapping.cpp:1869] /proc/sys/net/core/somaxconn = '128' [00:57:42]W: [Step 10/10] I0604 00:57:42.737694 24841 port_mapping.cpp:1869] /proc/sys/net/core/wmem_max = '212992' [00:57:42]W: [Step 10/10] I0604 00:57:42.737720 24841 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_rmem = '4096 87380 6291456' 
[00:57:42]W: [Step 10/10] I0604 00:57:42.737746 24841 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_time = '7200' [00:57:42]W: [Step 10/10] I0604 00:57:42.737772 24841 port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh2 = '512' [00:57:42]W: [Step 10/10] I0604 00:57:42.737798 24841 port_mapping.cpp:1869] /proc/sys/net/core/netdev_max_backlog = '1000' [00:57:42]W: [Step 10/10] I0604 00:57:42.737828 24841 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_intvl = '75' [00:57:42]W: [Step 10/10] I0604 00:57:42.737854 24841 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_probes = '9' [00:57:42]W: [Step 10/10] I0604 00:57:42.737879 24841 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_max_syn_backlog = '512' [00:57:42]W: [Step 10/10] I0604 00:57:42.737905 24841 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_retries2 = '15' [00:57:42]W: [Step 10/10] F0604 00:57:42.737968 24841 port_mapping_tests.cpp:448] CHECK_SOME(isolator): Failed to get realpath for bind mount root '/var/run/netns': Not found [00:57:42]W: [Step 10/10] *** Check failure stack trace: *** [00:57:42]W: [Step 10/10] @ 0x7f8bd52583d2 google::LogMessage::Fail() [00:57:42]W: [Step 10/10] @ 0x7f8bd525832b google::LogMessage::SendToLog() [00:57:42]W: [Step 10/10] @ 0x7f8bd5257d21 google::LogMessage::Flush() [00:57:42]W: [Step 10/10] @ 0x7f8bd525ab92 google::LogMessageFatal::~LogMessageFatal() [00:57:42]W: [Step 10/10] @ 0xa62171 _CheckFatal::~_CheckFatal() [00:57:42]W: [Step 10/10] @ 0x1931b17 mesos::internal::tests::PortMappingIsolatorTest_ROOT_NC_ContainerToContainerTCP_Test::TestBody() [00:57:42]W: [Step 10/10] @ 0x19e17b6 testing::internal::HandleSehExceptionsInMethodIfSupported<>() [00:57:42]W: [Step 10/10] @ 0x19dc864 testing::internal::HandleExceptionsInMethodIfSupported<>() [00:57:42]W: [Step 10/10] @ 0x19bd2ae testing::Test::Run() [00:57:42]W: [Step 10/10] @ 0x19bda66 testing::TestInfo::Run() [00:57:42]W: [Step 10/10] @ 0x19be0b7 testing::TestCase::Run() [00:57:42]W: 
[Step 10/10] @ 0x19c4bf5 testing::internal::UnitTestImpl::RunAllTests() [00:57:42]W: [Step 10/10] @ 0x19e247d testing::internal::HandleSehExceptionsInMethodIfSupported<>() [00:57:42]W: [Step 10/10] @ 0x19dd3a4 testing::internal::HandleExceptionsInMethodIfSupported<>() [00:57:42]W: [Step 10/10] @ 0x19c38d1 testing::UnitTest::Run() [00:57:42]W: [Step 10/10] @ 0xfd28cb RUN_ALL_TESTS() [00:57:42]W: [Step 10/10] @ 0xfd24b1 main [00:57:42]W: [Step 10/10] @ 0x7f8bceb89580 __libc_start_main [00:57:42]W: [Step 10/10] @ 0xa607c9 _start [00:57:43]W: [Step 10/10] /mnt/teamcity/temp/agentTmp/custom_script659125926639545396: line 3: 24841 Aborted (core dumped) GLOG_v=1 ./bin/mesos-tests.sh --verbose --gtest_filter=""$GTEST_FILTER"" [00:57:43]W: [Step 10/10] Process exited with code 134 {noformat}",3 MESOS-5674,"Port mapping isolator may fail in 'isolate' method.","The port mapping isolator may return a failure in its isolate method if a symlink to the network namespace handle using that ContainerId already exists. We should overwrite the symlink if it exists. 
This affects a couple test failures: {noformat} PortMappingIsolatorTest.ROOT_TooManyContainers PortMappingIsolatorTest.ROOT_ContainerARPExternal PortMappingIsolatorTest.ROOT_ContainerCMPInternal PortMappingIsolatorTest.ROOT_NC_HostToContainerTCP {noformat} Here is an example failure test log: {noformat} [00:28:37] : [Step 10/10] [ RUN ] PortMappingIsolatorTest.ROOT_TooManyContainers [00:28:37]W: [Step 10/10] I0606 00:28:37.046444 24846 port_mapping_tests.cpp:229] Using eth0 as the public interface [00:28:37]W: [Step 10/10] I0606 00:28:37.046728 24846 port_mapping_tests.cpp:237] Using lo as the loopback interface [00:28:37]W: [Step 10/10] I0606 00:28:37.058758 24846 resources.cpp:572] Parsing resources as JSON failed: cpus:2;mem:1024;disk:1024;ephemeral_ports:[30001-30999];ports:[31000-32000] [00:28:37]W: [Step 10/10] Trying semicolon-delimited string format instead [00:28:37]W: [Step 10/10] I0606 00:28:37.059711 24846 port_mapping.cpp:1557] Using eth0 as the public interface [00:28:37]W: [Step 10/10] I0606 00:28:37.059998 24846 port_mapping.cpp:1582] Using lo as the loopback interface [00:28:37]W: [Step 10/10] I0606 00:28:37.061126 24846 port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh3 = '1024' [00:28:37]W: [Step 10/10] I0606 00:28:37.061172 24846 port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh1 = '128' [00:28:37]W: [Step 10/10] I0606 00:28:37.061206 24846 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_wmem = '4096 16384 4194304' [00:28:37]W: [Step 10/10] I0606 00:28:37.061256 24846 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_synack_retries = '5' [00:28:37]W: [Step 10/10] I0606 00:28:37.061297 24846 port_mapping.cpp:1869] /proc/sys/net/core/rmem_max = '212992' [00:28:37]W: [Step 10/10] I0606 00:28:37.061331 24846 port_mapping.cpp:1869] /proc/sys/net/core/somaxconn = '128' [00:28:37]W: [Step 10/10] I0606 00:28:37.061360 24846 port_mapping.cpp:1869] /proc/sys/net/core/wmem_max = '212992' [00:28:37]W: [Step 10/10] I0606 
00:28:37.061390 24846 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_rmem = '4096 87380 6291456' [00:28:37]W: [Step 10/10] I0606 00:28:37.061419 24846 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_time = '7200' [00:28:37]W: [Step 10/10] I0606 00:28:37.061450 24846 port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh2 = '512' [00:28:37]W: [Step 10/10] I0606 00:28:37.061480 24846 port_mapping.cpp:1869] /proc/sys/net/core/netdev_max_backlog = '1000' [00:28:37]W: [Step 10/10] I0606 00:28:37.061511 24846 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_intvl = '75' [00:28:37]W: [Step 10/10] I0606 00:28:37.061540 24846 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_probes = '9' [00:28:37]W: [Step 10/10] I0606 00:28:37.061569 24846 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_max_syn_backlog = '512' [00:28:37]W: [Step 10/10] I0606 00:28:37.061599 24846 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_retries2 = '15' [00:28:37]W: [Step 10/10] I0606 00:28:37.069964 24846 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher [00:28:37]W: [Step 10/10] I0606 00:28:37.070144 24846 resources.cpp:572] Parsing resources as JSON failed: ports:[31000-31499] [00:28:37]W: [Step 10/10] Trying semicolon-delimited string format instead [00:28:37]W: [Step 10/10] I0606 00:28:37.070677 24867 port_mapping.cpp:2512] Using non-ephemeral ports {[31000,31500)} and ephemeral ports [30208,30720) for container container1 of executor '' [00:28:37]W: [Step 10/10] I0606 00:28:37.071688 24846 linux_launcher.cpp:281] Cloning child process with flags = CLONE_NEWNS | CLONE_NEWNET [00:28:37]W: [Step 10/10] I0606 00:28:37.084079 24863 port_mapping.cpp:2576] Bind mounted '/proc/11997/ns/net' to '/run/netns/11997' for container container1 [00:28:37] : [Step 10/10] ../../src/tests/containerizer/port_mapping_tests.cpp:1438: Failure [00:28:37] : [Step 10/10] (isolator.get()->isolate(containerId1, pid.get())).failure(): Failed to 
symlink the network namespace handle '/var/run/mesos/netns/container1' -> '/run/netns/11997': File exists [00:28:37] : [Step 10/10] [ FAILED ] PortMappingIsolatorTest.ROOT_TooManyContainers (57 ms) {noformat}",3 MESOS-5677,"Provide doc examples for dynamic reservation/persistent volumes","Users have found it difficult to make use of the dynamic reservation and persistent volume features. The API governing use of these features is a bit complicated, and this leads to users having trouble forming correct requests for reservations, volume creation, etc. Providing multiple examples of reserve/unreserve/create/destroy requests would make it much easier for users to get started.",3 MESOS-5679,"Example frameworks should allow setting failover timeout","The example frameworks do not currently set a framework failover timeout when they register with the master. This means that when these frameworks are used in prolonged testing scenarios, small network outages can lead to flapping frameworks. We should either set the failover timeout to a reasonable value in the example frameworks, or add command-line flags that allow the timeout to be set.",2 MESOS-5684,"Master captures `this` when creating authorization callback","When exposing its log file, the master currently installs an authorization callback for the log file which captures the master's {{this}} pointer. Such captures have previously caused bugs (MESOS-5629), and this one should be fixed as well. The callback should be dispatched to the master process, and it should be dispatched via the {{self()}} PID.",1 MESOS-5685,"The /files/download endpoint's authorization can be compromised","If a forward slash is appended to the path of a file a user wishes to download via {{/files/download}}, the authorization logic for that path will be bypassed and the file will be downloaded regardless of permissions. 
This is because we store the authorization callbacks for these paths in a map which is keyed by the path name, so a request to {{/master/log/}} fails to find the callback which is installed for {{/master/log}}. When the master fails to find the callback, it assumes authorization is not required for that path and authorizes the action. Consider the following excerpt: {code} gmann@gmac:~/src/mesos/build⚡ http GET http://127.0.0.1:5050/files/download\?path\=/master/log -a foo:bar HTTP/1.1 403 Forbidden Content-Length: 0 Date: Wed, 22 Jun 2016 21:28:53 GMT gmann@gmac:~/src/mesos/build⚡ http GET http://127.0.0.1:5050/files/download\?path\=/master/log/ -a foo:bar HTTP/1.1 200 OK Content-Disposition: attachment; filename=mesos-master.gmac.gmann.log.INFO.20160622-142843.65615 Content-Length: 14432 Content-Type: application/octet-stream Date: Wed, 22 Jun 2016 21:28:56 GMT Log file created at: 2016/06/22 14:28:43 Running on machine: gmac Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg I0622 14:28:43.476925 2080764672 logging.cpp:194] INFO level logging started! 
I0622 14:28:43.477522 2080764672 main.cpp:367] Using 'HierarchicalDRF' allocator I0622 14:28:43.480650 2080764672 leveldb.cpp:174] Opened db in 2961us I0622 14:28:43.481046 2080764672 leveldb.cpp:181] Compacted db in 372us I0622 14:28:43.481078 2080764672 leveldb.cpp:196] Created db iterator in 13us I0622 14:28:43.481096 2080764672 leveldb.cpp:202] Seeked to beginning of db in 9us I0622 14:28:43.481111 2080764672 leveldb.cpp:271] Iterated through 0 keys in the db in 8us I0622 14:28:43.481165 2080764672 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0622 14:28:43.481967 219914240 recover.cpp:451] Starting replica recovery I0622 14:28:43.482193 219914240 recover.cpp:477] Replica is in EMPTY status I0622 14:28:43.482589 2080764672 main.cpp:488] Creating default 'local' authorizer I0622 14:28:43.482719 2080764672 main.cpp:545] Starting Mesos master I0622 14:28:43.483085 218841088 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (4)@127.0.0.1:5050 I0622 14:28:43.487284 218304512 recover.cpp:197] Received a recover response from a replica in EMPTY status I0622 14:28:43.487694 219914240 recover.cpp:568] Updating replica status to STARTING {code} We could consider disallowing paths which end in trailing slashes.",2 MESOS-5691,"SSL downgrade support will leak sockets in CLOSE_WAIT status","Repro steps: 1) Start a master: {code} bin/mesos-master.sh --work_dir=/tmp/master {code} 2) Start an agent with SSL and downgrade enabled: {code} # Taken from http://mesos.apache.org/documentation/latest/ssl/ openssl genrsa -des3 -f4 -passout pass:some_password -out key.pem 4096 openssl req -new -x509 -passin pass:some_password -days 365 -key key.pem -out cert.pem SSL_KEY_FILE=key.pem SSL_CERT_FILE=cert.pem SSL_ENABLED=true SSL_SUPPORT_DOWNGRADE=true sudo -E bin/mesos-agent.sh --master=localhost:5050 --work_dir=/tmp/agent {code} 3) Start a framework that launches lots of executors, one after another: {code} 
sudo src/balloon-framework --master=localhost:5050 --task_memory=64mb --task_memory_usage_limit=256mb --long_running {code} 4) Check FDs, repeatedly {code} sudo lsof -i | grep mesos | grep CLOSE_WAIT | wc -l {code} The number of sockets in {{CLOSE_WAIT}} will increase linearly with the number of launched executors.",5 MESOS-5697,"Support file volume in mesos containerizer.","Currently in mesos containerizer, the host_path volume (to be bind mounted from a host path) specified in ContainerInfo can only be a directory. We should also support the volume type as a file.",3 MESOS-5698,"Quota sorter not updated for resource changes at agent.","Consider this sequence of events: 1. Slave connects, with 128MB of disk. 2. Master offers resources at slave to framework 3. Framework creates a dynamic reservation for 1MB and a persistent volume of the same size on the slave's resources. => This invokes {{Master::apply}}, which invokes {{allocator->updateAllocation}}, which invokes {{Sorter::update()}} on the framework sorter and role sorter. If the framework's role has a configured quota, it also invokes {{update}} on the quota role sorter -- in this case, the framework's role has no quota, so the quota role sorter is *not* updated. => {{DRFSorter::update}} updates the *total* resources at a given slave, among updating other state. New total resources will be 127MB of unreserved disk and 1MB of reserved disk with a volume. Note that the quota role sorter still thinks the slave has 128MB of unreserved disk. 4. The slave is removed from the cluster. {{HierarchicalAllocatorProcess::removeSlave}} invokes: {code} roleSorter->remove(slaveId, slaves[slaveId].total); quotaRoleSorter->remove(slaveId, slaves[slaveId].total.nonRevocable()); {code} {{slaves\[slaveId\].total.nonRevocable()}} is 127MB of unreserved disk and 1MB of reserved disk with a volume. 
When we remove this from the quota role sorter, we're left with total resources on the reserved slave of 1MB of unreserved disk, since that is the result of subtracting <127MB unreserved, 1MB reserved+volume> from <128MB unreserved>. The implications of this can't be good: at minimum, we're leaking resources for removed slaves in the quota role sorter. We're also introducing an inconsistency between {{total_.resources\[slaveId\]}} and {{total_.scalarQuantities}}, since the latter has already stripped out volume/reservation information.",5 MESOS-5699,"Create new documentation for Mesos networking.","With the introduction of CNI and support for Docker user-defined networks, there are quite a few options within Mesos for IP-per-container solutions for container networking. We therefore need to rewrite the networking documentation for Mesos, highlighting all the networking support that Mesos provides for orchestrating containers on IP networks.",1 MESOS-5704,"Fine-grained authorization on /frameworks","Even if ACLs were defined for the actions VIEW_FRAMEWORKS, VIEW_EXECUTORS and VIEW_TASKS, the data these actions were supposed to protect could still be leaked through the master's /frameworks endpoint, since it didn't enable any authorization mechanism.",3 MESOS-5705,"ZK credential is exposed in /flags and /state","Mesos allows zk credentials to be embedded in the zk url, but exposes these credentials in the /flags and /state endpoints. Even though /state is authorized, it only filters out frameworks/tasks, so the top-level flags are shown to any authenticated user. 
""zk"": ""zk://dcos_mesos_master:my_secret_password@127.0.0.1:2181/mesos"", We need to find some way to hide this data, or even add a first-class VIEW_FLAGS acl that applies to any endpoint that exposes flags.",5 MESOS-5706,"GET_ENDPOINT_WITH_PATH authz doesn't make sense for /flags","The master or agent flags are exposed in /state as well as /flags, so any user who wants to disable/control access to the flags likely intends to control access to flags no matter what endpoint exposes them. As such, /flags is a poor candidate for GET_ENDPOINT_WITH_PATH authz, since we care more about protecting the flag data than the specific endpoint path. We should remove the GET_ENDPOINT authz from master and agent /flags until we can come up with a better solution, perhaps a first-class VIEW_FLAGS acl.",2 MESOS-5707,"LocalAuthorizer should error if passed a GET_ENDPOINT ACL with an unhandled path","Since GET_ENDPOINT_WITH_PATH doesn't (yet) work with any arbitrary path, we should a) validate --acls and error if GET_ENDPOINT_WITH_PATH has a path object that doesn't match an endpoint that uses this authz strategy. b) document exactly which endpoints support GET_ENDPOINT_WITH_PATH",3 MESOS-5708,"Add authz to /files/debug","The /files/debug endpoint exposes the attached master/agent log paths and every attached sandbox path, which includes the frameworkId and executorId. Even if sandboxes are protected, we still don't want to expose this information to unauthorized users.",3 MESOS-5709,"Authorization for /roles","The /roles endpoint exposes the list of all roles and their weights, as well as the list of all frameworkIds registered with each role. This is a superset of the information exposed on GET /weights, which we already protect. We should protect the data in /roles the same way. - Should we reuse VIEW_FRAMEWORK with role (from /state)? 
- Should we add a new VIEW_ROLE and adapt GET_WEIGHTS to use it?",3 MESOS-5710,"The /logging/toggle endpoint accepts requests with any http method","Any of a GET, POST, PUT, or DELETE to `/logging/toggle?level=INFO&duration=5mins` will set the log level and return 200. To be consistent with REST-like syntax, DELETE, GET, and even POST are wrong and should return a MethodNotAllowed. Once this endpoint no longer accepts GET, it is no longer appropriate to use the GET_ENDPOINT acl here. Instead we could create a new PUT_ENDPOINT_WITH_PATH acl (which hopefully ignores query params), or add a first-class TOGGLE_LOGGING acl.",3 MESOS-5711,"Update AUTHORIZATION strings in endpoint help","The endpoint help macros support AUTHENTICATION and AUTHORIZATION sections. We added AUTHORIZATION help for some of the newer endpoints, but not the previously authenticated endpoints. Authorization endpoints needing help string updates: Master::Http::CREATE_VOLUMES_HELP Master::Http::DESTROY_VOLUMES_HELP Master::Http::RESERVE_HELP Master::Http::STATE_HELP Master::Http::STATESUMMARY_HELP Master::Http::TEARDOWN_HELP Master::Http::TASKS_HELP Master::Http::UNRESERVE_HELP Slave::Http::STATE_HELP",2 MESOS-5712,"Document exactly what is handled by GET_ENDPOINTS_WITH_PATH acl","Users may expect that the GET_ENDPOINT_WITH_PATH acl can be used with any Mesos endpoint, but that is not (yet) the case. We should clearly document the list of applicable endpoints, in authorization.md and probably even upgrades.md.",1 MESOS-5713,"Add a __sockets__ diagnostic endpoint to libprocess.","Libprocess exposes an endpoint {{/__processes__}}, which displays some info on the existing actors and messages queued up on each. It would be nice to inspect the state of libprocess's {{SocketManager}} too. This could be an endpoint like {{/__sockets__}} that exposes information like: * Inbound FDs: type and source * Outbound FDs: type and source * Temporary and persistent sockets * Linkers and linkees. 
* Outgoing messages and their associated socket",3 MESOS-5716,"Document docker private registry with authentication support in Unified Containerizer.","Add documentation for docker private registry with authentication support in unified containerizer. This is the basic support for docker private registry.",3 MESOS-5717,"Can't autodiscovery GPU resources without '--enable-nvidia-gpu-support' and '--nvidia_gpu_devices' flags","Prerequisite: In MESOS\-5257 ""By default, with no '\-\-nvidia_gpu_devices' flag or `gpus` resources flag, the new auto-discovery will simply enumerate all of the GPUs on the system"" and in MESOS\-5630 ""removes this flag(\-\-enable-nvidia-gpu-support) and enables this support for all builds on Linux."" So, I ran '../configure' without any flags and started the agent without '\-\-resources' or '\-\-nvidia_gpu_devices', but it could not discover any GPU resources. Starting the agent with '\-\-resources' and '\-\-nvidia_gpu_devices' does not work either. I'm sure the NVIDIA GPUs on my machines are OK, because everything works well when I pass '\-\-enable-nvidia-gpu-support' to './configure' and start the agents with '\-\-resources' and '\-\-nvidia_gpu_devices'.",2 MESOS-5723,"SSL-enabled libprocess will leak incoming links to forks","Encountered two different buggy behaviors that can be tracked down to the same underlying problem. Repro #1 (non-crashy): (1) Start a master. Doesn't matter if SSL is enabled or not. (2) Start an agent, with SSL enabled. Downgrade support has the same problem. The master/agent {{link}} to one another. (3) Run a sleep task. Keep this alive. If you inspect FDs at this point, you'll notice the task has inherited the {{link}} FD (master -> agent). (4) Restart the agent. Due to (3), the master's {{link}} stays open. (5) Check master's logs for the agent's re-registration message. (6) Check the agent's logs for re-registration. The message will not appear. The master is actually using the old {{link}} which is not connected to the agent. 
---- Repro #2 (crashy): (1) Start a master. Doesn't matter if SSL is enabled or not. (2) Start an agent, with SSL enabled. Downgrade support has the same problem. (3) Run ~100 sleep tasks one after the other and keep them all alive. Each task links back to the agent. Due to an FD leak, each task will inherit the incoming links from all other actors... (4) At some point, the agent will run out of FDs and kernel panic. ---- It appears that the SSL socket {{accept}} call is missing {{os::nonblock}} and {{os::cloexec}} calls: https://github.com/apache/mesos/blob/4b91d936f50885b6a66277e26ea3c32fe942cf1a/3rdparty/libprocess/src/libevent_ssl_socket.cpp#L794-L806 For reference, here's {{poll}} socket's {{accept}}: https://github.com/apache/mesos/blob/4b91d936f50885b6a66277e26ea3c32fe942cf1a/3rdparty/libprocess/src/poll_socket.cpp#L53-L75 ",2 MESOS-5727,"Command executor health check does not work when the task specifies container image.","Since we launch the task after pivot_root, we no longer have access to the mesos-health-check binary. The solution is to refactor health check to be a library (libprocess) so that it does not depend on the underlying filesystem. One note here is that we should strive to keep both the command executor and the task in the same mount namespace so that Mesos CLI tooling does not need to find the mount namespace for the task. It just needs to find the corresponding pid for the executor.",5 MESOS-5729,"Consider allowing the libprocess caller an option to not set CLOEXEC on libprocess sockets","Both implementations of libprocess's {{Socket}} interface will set the {{CLOEXEC}} option on all new sockets (incoming or outgoing). This assumption is pervasive across Mesos, but since libprocess aims to be a general-purpose library, the caller should be able to *not* {{CLOEXEC}} sockets when desired. 
See TODOs added here: https://reviews.apache.org/r/49281/",3 MESOS-5740,"Consider adding `relink` functionality to libprocess","Currently we don't have the {{relink}} functionality in libprocess. i.e. A way to create a new persistent connection between actors, even if a connection already exists. This can benefit us in a couple of ways: - The application may have more information on the state of a connection than libprocess does, as libprocess only checks if the connection is alive or not. For example, a linkee may accept a connection, then fork, pass the connection to a child, and subsequently exit. As the connection is still active, libprocess may not detect the exit. - Sometimes, the {{ExitedEvent}} might be delayed or might be dropped due to the remote instance being unavailable (e.g., partition, network intermediaries not sending RST's etc). ",3 MESOS-5742,"When start an agent with `--resources`, the GPU resource can be fractional","So far, the GPU resource is not fractional, only integer values are allowed. But when starting agents with {{\-\-resources='gpu:1.2'}}, it can also work without any warning or error. And in the webui the GPU resource is `1.2`.",1 MESOS-5748,"Potential segfault in `link` and `send` when linking to a remote process","There is a race in the SocketManager, between a remote {{link}} and disconnection of the underlying socket. We potentially segfault here: https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1512 {{\*socket}} dereferences the shared pointer underpinning the {{Socket*}} object. 
However, the code above this line actually has ownership of the pointer: https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1494-L1499 If the socket dies during the link, the {{ignore_recv_data}} may delete the Socket underneath {{link}}: https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1399-L1411 ---- The same race exists for {{send}}. This race was discovered while running a new test in repetition: https://reviews.apache.org/r/49175/ On OSX, I hit the race consistently every 500-800 repetitions: {code} 3rdparty/libprocess/libprocess-tests --gtest_filter=""ProcessRemoteLinkTest.RemoteLink"" --gtest_break_on_failure --gtest_repeat=1000 {code}",2 MESOS-5753,"Command executor should use `mesos-containerizer launch` to launch user task.","Currently, command executor and `mesos-containerizer launch` share a lot of the logic. Command executor should in fact, just use `mesos-containerizer launch` to launch the user task. Potentially, `mesos-containerizer launch` can be also used by custom executor to launch user tasks.",8 MESOS-5754,"CommandInfo.user not honored in docker containerizer","Repro by creating a framework that starts a task with CommandInfo.user set, and observe that the dockerized executor is still running as the default (e.g. root). 
cc [~kaysoky]",3 MESOS-5755,"NVML headers are not installed as part of 3rdparty install with --enable-install-module-dependencies","Review: https://reviews.apache.org/r/49480/ ",2 MESOS-5759,"ProcessRemoteLinkTest.RemoteUseStaleLink and RemoteStaleLinkRelink are flaky","{{ProcessRemoteLinkTest.RemoteUseStaleLink}} and {{ProcessRemoteLinkTest.RemoteStaleLinkRelink}} are failing occasionally with the error: {code} [ RUN ] ProcessRemoteLinkTest.RemoteStaleLinkRelink WARNING: Logging before InitGoogleLogging() is written to STDERR I0630 07:42:34.661110 18888 process.cpp:1066] libprocess is initialized on 172.17.0.2:56294 with 16 worker threads E0630 07:42:34.666393 18765 process.cpp:2104] Failed to shutdown socket with fd 7: Transport endpoint is not connected /mesos/3rdparty/libprocess/src/tests/process_tests.cpp:1059: Failure Value of: exitedPid.isPending() Actual: false Expected: true [ FAILED ] ProcessRemoteLinkTest.RemoteStaleLinkRelink (56 ms) {code} There appears to be a race between establishing a socket connection and the test calling {{::shutdown}} on the socket. Under some circumstances, the {{::shutdown}} may actually result in failing the future in {{SocketManager::link_connect}} error and thereby trigger {{SocketManager::close}}.",1 MESOS-5761,"Improve the logic of orphan tasks","Right now, a task is called orphaned if an agent re-registers with it but the corresponding framework information is not known to the master. This happens immediately after a master failover. It would be great if the master knew the information about the framework even after a failover, irrespective of whether a framework re-registers, so that we don't have orphan tasks. Getting rid of orphan tasks will make the task authorization story easy (see MESOS-5757).",5 MESOS-5765,"Add 'systemGetDriverVersion' to NVML abstraction.","This command returns a string representing the version of the underlying Nvidia drivers installed on a host. 
It will be used by the upcoming {{NvidiaVolume}} component.",2 MESOS-5766,"Missing License Information for Bundled NVML headers","See Summary",1 MESOS-5767,"Add ELFIO as bundled Dependency to Mesos","ELFIO is a header-only replacement for parsing ELF binaries. Previously we were using libelf, which introduced both a new build-time dependency and a runtime dependency, even though we only really needed this library when operating on machines that have GPUs. By using this header-only library and bundling it with Mesos, we can remove this external dependency altogether.",2 MESOS-5768,"Reimplement the stout ELF abstraction in terms of ELFIO","With the introduction of the new bundled ELFIO library, we need to reimplement our stout ELF abstraction in terms of it. As part of this, we need to update the tests that use it (i.e. ldcache_test.cpp)",2 MESOS-5769,"Add get_abi_version() to ELF abstraction in stout","This function allows us to inspect the {{.note.ABI-tag}} section of an ELF binary to determine the ABI version of the executable / library. This is needed for checking some of the logic in building up an NvidiaVolume for injection into a container. ",2 MESOS-5779,"Allow Docker v1 ImageManifests to be parsed from the output of `docker inspect`"," The `docker::spec::v1::ImageManifest` protobuf implements the official v1 image manifest specification found at: https://github.com/docker/docker/blob/master/image/spec/v1.md The field names in this spec are all written in snake_case as are the field names of the JSON representing the image manifest when reading it from disk (for example after performing a `docker save`). As such, the protobuf for ImageManifest also provides these fields in snake_case. Unfortunately, the `docker inspect` command also provides a method of retrieving the JSON for an image manifest, with one major caveat -- it represents all of its top level keys in CamelCase. 
To allow both representations to be parsed in the same way, we should intercept the incoming JSON from either source (disk or `docker inspect`) and convert it to a canonical snake_case representation.",3 MESOS-5782,"Renamed 'commands' to 'pre_exec_commands' in ContainerLaunchInfo.","Currently the 'commands' field in isolator.proto ContainerLaunchInfo is somewhat confusing. It is a command (any script or shell command) executed before launch. We should rename 'commands' to 'pre_exec_commands' in ContainerLaunchInfo and add comments.",2 MESOS-5787,"Add ability to set framework capabilities in 'mesos-execute'","For now, we want to add this so that we can run {{mesos-execute}} against agents that offer GPU resources. In the future, as we add more framework capabilities, this functionality will become more generally useful.",2 MESOS-5788,"Consider adding a Java Scheduler Shim/Adapter for the new/old API.","Currently, for existing Java-based frameworks, moving to try out the new API can be cumbersome. This change intends to introduce a shim/adapter interface that makes this easier by allowing developers to toggle between the old/new API (driver/new scheduler library) implementation via an environment variable. This would allow framework developers to transition their older frameworks to the new API rather seamlessly. This would look similar to the work done for the executor shim for C++ (command/docker executor). 
",8 MESOS-5792,"Add mesos tests to CMake (make check)","Provide CMakeLists.txt and configuration files to build mesos tests using CMake.",8 MESOS-5793,"Add ability to inject Nvidia devices into a container",NULL,3 MESOS-5802,"SlaveAuthorizerTest/0.ViewFlags is flaky.","{noformat} [15:24:47] : [Step 10/10] [ RUN ] SlaveAuthorizerTest/0.ViewFlags [15:24:47]W: [Step 10/10] I0707 15:24:47.025609 25322 containerizer.cpp:196] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni [15:24:47]W: [Step 10/10] I0707 15:24:47.030421 25322 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher [15:24:47]W: [Step 10/10] I0707 15:24:47.032060 25339 slave.cpp:205] Agent started on 335)@172.30.2.7:43076 [15:24:47]W: [Step 10/10] I0707 15:24:47.032078 25339 slave.cpp:206] Flags at startup: --acls="""" --appc_simple_discovery_uri_prefix=""http://"" --appc_store_dir=""/tmp/mesos/store/appc"" --authenticate_http=""true"" --authenticatee=""crammd5"" --authentication_backoff_factor=""1secs"" --authorizer=""local"" --cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --credential=""/mnt/teamcity/temp/buildTmp/SlaveAuthorizerTest_0_ViewFlags_OsJb5C/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_kill_orphans=""true"" --docker_registry=""https://registry-1.docker.io"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --docker_volume_checkpoint_dir=""/var/run/mesos/isolators/docker/volume"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" 
--fetcher_cache_dir=""/mnt/teamcity/temp/buildTmp/SlaveAuthorizerTest_0_ViewFlags_OsJb5C/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""true"" --hostname_lookup=""true"" --http_authenticators=""basic"" --http_command_executor=""false"" --http_credentials=""/mnt/teamcity/temp/buildTmp/SlaveAuthorizerTest_0_ViewFlags_OsJb5C/http_credentials"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" --launcher_dir=""/mnt/teamcity/work/4240ba9ddd0997c3/build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resources=""cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_enable_support=""true"" --systemd_runtime_directory=""/run/systemd/system"" --version=""false"" --work_dir=""/mnt/teamcity/temp/buildTmp/SlaveAuthorizerTest_0_ViewFlags_OsJb5C"" --xfs_project_range=""[5000-10000]"" [15:24:47]W: [Step 10/10] I0707 15:24:47.032306 25339 credentials.hpp:86] Loading credential for authentication from '/mnt/teamcity/temp/buildTmp/SlaveAuthorizerTest_0_ViewFlags_OsJb5C/credential' [15:24:47]W: [Step 10/10] I0707 15:24:47.032424 25339 slave.cpp:343] Agent using credential for: test-principal [15:24:47]W: [Step 10/10] I0707 15:24:47.032441 25339 credentials.hpp:37] Loading credentials for authentication from '/mnt/teamcity/temp/buildTmp/SlaveAuthorizerTest_0_ViewFlags_OsJb5C/http_credentials' [15:24:47]W: [Step 10/10] I0707 15:24:47.032528 25339 slave.cpp:395] Using default 'basic' HTTP authenticator [15:24:47]W: [Step 10/10] I0707 15:24:47.032754 25339 
resources.cpp:572] Parsing resources as JSON failed: cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000] [15:24:47]W: [Step 10/10] Trying semicolon-delimited string format instead [15:24:47]W: [Step 10/10] I0707 15:24:47.032838 25339 resources.cpp:572] Parsing resources as JSON failed: cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000] [15:24:47]W: [Step 10/10] Trying semicolon-delimited string format instead [15:24:47]W: [Step 10/10] I0707 15:24:47.032968 25339 slave.cpp:594] Agent resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] [15:24:47]W: [Step 10/10] I0707 15:24:47.032994 25339 slave.cpp:602] Agent attributes: [ ] [15:24:47]W: [Step 10/10] I0707 15:24:47.032999 25339 slave.cpp:607] Agent hostname: ip-172-30-2-7.ec2.internal.mesosphere.io [15:24:47]W: [Step 10/10] I0707 15:24:47.033291 25339 process.cpp:3322] Handling HTTP event for process 'slave(335)' with path: '/slave(335)/flags' [15:24:47]W: [Step 10/10] I0707 15:24:47.033329 25343 state.cpp:57] Recovering state from '/mnt/teamcity/temp/buildTmp/SlaveAuthorizerTest_0_ViewFlags_OsJb5C/meta' [15:24:47]W: [Step 10/10] I0707 15:24:47.033576 25342 status_update_manager.cpp:200] Recovering status update manager [15:24:47] : [Step 10/10] ../../src/tests/slave_authorization_tests.cpp:316: Failure [15:24:47]W: [Step 10/10] I0707 15:24:47.033604 25340 http.cpp:269] HTTP GET for /slave(335)/flags from 172.30.2.7:33866 [15:24:47] : [Step 10/10] Value of: (response).get().status [15:24:47] : [Step 10/10] Actual: ""503 Service Unavailable"" [15:24:47]W: [Step 10/10] I0707 15:24:47.033687 25340 containerizer.cpp:522] Recovering containerizer [15:24:47] : [Step 10/10] Expected: OK().status [15:24:47] : [Step 10/10] Which is: ""200 OK"" [15:24:47]W: [Step 10/10] I0707 15:24:47.034953 25340 process.cpp:3322] Handling HTTP event for process 'slave(335)' with path: '/slave(335)/state' [15:24:47] : [Step 10/10] Agent has not finished recovery [15:24:47] : [Step 10/10] 
../../src/tests/slave_authorization_tests.cpp:320: Failure [15:24:47]W: [Step 10/10] I0707 15:24:47.035152 25343 http.cpp:269] HTTP GET for /slave(335)/state from 172.30.2.7:33868 [15:24:47] : [Step 10/10] parse: syntax error at line 1 near: Agent has not finished recovery [15:24:47]W: [Step 10/10] I0707 15:24:47.035768 25341 slave.cpp:841] Agent terminating [15:24:47]W: [Step 10/10] I0707 15:24:47.036150 25337 provisioner.cpp:253] Provisioner recovery complete [15:24:47] : [Step 10/10] [ FAILED ] SlaveAuthorizerTest/0.ViewFlags, where TypeParam = mesos::internal::LocalAuthorizer (14 ms) {noformat}",2 MESOS-5806,"CNI isolator should prepare network related /etc/* files for containers using host mode but specify container images.","Currently, the CNI isolator will just ignore those containers that want to join the host network (i.e., not specifying NetworkInfo). However, if the container specifies a container image, we need to make sure that it has access to host /etc/* files. We should perform the bind mount for the container. This is also what docker does when a container is running in host mode.",5 MESOS-5812,"MasterAPITest.Subscribe is flaky","This test seems to be flaky, although on Mac OS X and CentOS 7 the error is a bit different. 
On Mac OS X: {noformat}[ RUN ] ContentType/MasterAPITest.Subscribe/0 I0708 11:42:48.474665 1927435008 cluster.cpp:155] Creating default 'local' authorizer I0708 11:42:48.480677 1927435008 leveldb.cpp:174] Opened db in 5727us I0708 11:42:48.481494 1927435008 leveldb.cpp:181] Compacted db in 722us I0708 11:42:48.481541 1927435008 leveldb.cpp:196] Created db iterator in 19us I0708 11:42:48.481572 1927435008 leveldb.cpp:202] Seeked to beginning of db in 9us I0708 11:42:48.481587 1927435008 leveldb.cpp:271] Iterated through 0 keys in the db in 7us I0708 11:42:48.481617 1927435008 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0708 11:42:48.482030 350982144 recover.cpp:451] Starting replica recovery I0708 11:42:48.482203 350982144 recover.cpp:477] Replica is in EMPTY status I0708 11:42:48.484107 348299264 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (3780)@127.0.0.1:50325 I0708 11:42:48.484318 350982144 recover.cpp:197] Received a recover response from a replica in EMPTY status I0708 11:42:48.484750 348835840 master.cpp:382] Master e055d60c-05ff-487e-82da-d0a43e52605c (localhost) started on 127.0.0.1:50325 I0708 11:42:48.484850 349908992 recover.cpp:568] Updating replica status to STARTING I0708 11:42:48.484788 348835840 master.cpp:384] Flags at startup: --acls="""" --agent_ping_timeout=""15secs"" --agent_reregister_timeout=""10mins"" --allocation_interval=""1secs"" --allocator=""HierarchicalDRF"" --authenticate_agents=""true"" --authenticate_frameworks=""true"" --authenticate_http=""true"" --authenticate_http_frameworks=""true"" --authenticators=""crammd5"" --authorizers=""local"" --credentials=""/private/tmp/Sn2Kf4/credentials"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" --http_framework_authenticators=""basic"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" 
--max_agent_ping_timeouts=""5"" --max_completed_frameworks=""50"" --max_completed_tasks_per_framework=""1000"" --quiet=""false"" --recovery_agent_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""100secs"" --registry_strict=""true"" --root_submissions=""true"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/usr/local/share/mesos/webui"" --work_dir=""/private/tmp/Sn2Kf4/master"" --zk_session_timeout=""10secs"" W0708 11:42:48.485263 348835840 master.cpp:387] ************************************************** Master bound to loopback interface! Cannot communicate with remote schedulers or agents. You might want to set '--ip' flag to a routable IP address. ************************************************** I0708 11:42:48.485291 348835840 master.cpp:434] Master only allowing authenticated frameworks to register I0708 11:42:48.485314 348835840 master.cpp:448] Master only allowing authenticated agents to register I0708 11:42:48.485335 348835840 master.cpp:461] Master only allowing authenticated HTTP frameworks to register I0708 11:42:48.485347 348835840 credentials.hpp:37] Loading credentials for authentication from '/private/tmp/Sn2Kf4/credentials' I0708 11:42:48.485373 349372416 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 397us I0708 11:42:48.485414 349372416 replica.cpp:320] Persisted replica status to STARTING I0708 11:42:48.485608 350982144 recover.cpp:477] Replica is in STARTING status I0708 11:42:48.485749 348835840 master.cpp:506] Using default 'crammd5' authenticator I0708 11:42:48.485852 348835840 master.cpp:578] Using default 'basic' HTTP authenticator I0708 11:42:48.486018 348835840 master.cpp:658] Using default 'basic' HTTP framework authenticator I0708 11:42:48.486140 348835840 master.cpp:705] Authorization enabled I0708 11:42:48.486486 350982144 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (3783)@127.0.0.1:50325 I0708 
11:42:48.486758 352055296 recover.cpp:197] Received a recover response from a replica in STARTING status I0708 11:42:48.487176 350982144 recover.cpp:568] Updating replica status to VOTING I0708 11:42:48.487576 352055296 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 300us I0708 11:42:48.487658 352055296 replica.cpp:320] Persisted replica status to VOTING I0708 11:42:48.487736 350982144 recover.cpp:582] Successfully joined the Paxos group I0708 11:42:48.487951 350982144 recover.cpp:466] Recover process terminated I0708 11:42:48.489441 348835840 master.cpp:1973] The newly elected leader is master@127.0.0.1:50325 with id e055d60c-05ff-487e-82da-d0a43e52605c I0708 11:42:48.489518 348835840 master.cpp:1986] Elected as the leading master! I0708 11:42:48.489545 348835840 master.cpp:1673] Recovering from registrar I0708 11:42:48.489637 350982144 registrar.cpp:332] Recovering registrar I0708 11:42:48.490120 351518720 log.cpp:553] Attempting to start the writer I0708 11:42:48.491161 350445568 replica.cpp:493] Replica received implicit promise request from (3784)@127.0.0.1:50325 with proposal 1 I0708 11:42:48.491461 350445568 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 252us I0708 11:42:48.491528 350445568 replica.cpp:342] Persisted promised to 1 I0708 11:42:48.492337 348299264 coordinator.cpp:238] Coordinator attempting to fill missing positions I0708 11:42:48.493482 349372416 replica.cpp:388] Replica received explicit promise request from (3785)@127.0.0.1:50325 for position 0 with proposal 2 I0708 11:42:48.493854 349372416 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 283us I0708 11:42:48.493904 349372416 replica.cpp:712] Persisted action at 0 I0708 11:42:48.495302 348299264 replica.cpp:537] Replica received write request for position 0 from (3786)@127.0.0.1:50325 I0708 11:42:48.495455 348299264 leveldb.cpp:436] Reading position from leveldb took 45us I0708 11:42:48.495761 348299264 leveldb.cpp:341] Persisting action (14 
bytes) to leveldb took 261us I0708 11:42:48.495803 348299264 replica.cpp:712] Persisted action at 0 I0708 11:42:48.496484 350445568 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 I0708 11:42:48.496795 350445568 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 255us I0708 11:42:48.496857 350445568 replica.cpp:712] Persisted action at 0 I0708 11:42:48.496896 350445568 replica.cpp:697] Replica learned NOP action at position 0 I0708 11:42:48.497445 350982144 log.cpp:569] Writer started with ending position 0 I0708 11:42:48.498523 350982144 leveldb.cpp:436] Reading position from leveldb took 80us I0708 11:42:48.499307 349908992 registrar.cpp:365] Successfully fetched the registry (0B) in 9.63712ms I0708 11:42:48.499464 349908992 registrar.cpp:464] Applied 1 operations in 36us; attempting to update the 'registry' I0708 11:42:48.499953 351518720 log.cpp:577] Attempting to append 159 bytes to the log I0708 11:42:48.500088 350982144 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 I0708 11:42:48.500880 348299264 replica.cpp:537] Replica received write request for position 1 from (3787)@127.0.0.1:50325 I0708 11:42:48.501186 348299264 leveldb.cpp:341] Persisting action (178 bytes) to leveldb took 259us I0708 11:42:48.501231 348299264 replica.cpp:712] Persisted action at 1 I0708 11:42:48.501786 351518720 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 I0708 11:42:48.502118 351518720 leveldb.cpp:341] Persisting action (180 bytes) to leveldb took 311us I0708 11:42:48.502260 351518720 replica.cpp:712] Persisted action at 1 I0708 11:42:48.502305 351518720 replica.cpp:697] Replica learned APPEND action at position 1 I0708 11:42:48.503475 349908992 registrar.cpp:509] Successfully updated the 'registry' in 3.944192ms I0708 11:42:48.503909 349908992 registrar.cpp:395] Successfully recovered registrar I0708 11:42:48.504003 350982144 log.cpp:596] Attempting to truncate the log 
to 1 I0708 11:42:48.504250 349372416 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 I0708 11:42:48.504546 350445568 master.cpp:1781] Recovered 0 agents from the Registry (121B) ; allowing 10mins for agents to re-register I0708 11:42:48.506022 352055296 replica.cpp:537] Replica received write request for position 2 from (3788)@127.0.0.1:50325 I0708 11:42:48.506479 352055296 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 320us I0708 11:42:48.506513 352055296 replica.cpp:712] Persisted action at 2 I0708 11:42:48.506978 351518720 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 I0708 11:42:48.507155 351518720 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 169us I0708 11:42:48.507237 351518720 leveldb.cpp:399] Deleting ~1 keys from leveldb took 37us I0708 11:42:48.507264 351518720 replica.cpp:712] Persisted action at 2 I0708 11:42:48.507285 351518720 replica.cpp:697] Replica learned TRUNCATE action at position 2 I0708 11:42:48.521363 1927435008 cluster.cpp:432] Creating default 'local' authorizer I0708 11:42:48.522498 350982144 slave.cpp:205] Agent started on 119)@127.0.0.1:50325 I0708 11:42:48.522538 350982144 slave.cpp:206] Flags at startup: --acls="""" --appc_simple_discovery_uri_prefix=""http://"" --appc_store_dir=""/tmp/mesos/store/appc"" --authenticate_http=""true"" --authenticatee=""crammd5"" --authentication_backoff_factor=""1secs"" --authorizer=""local"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --credential=""/var/folders/ny/tcvyblqj43s2gdh2_895v9nw0000gp/T/ContentType_MasterAPITest_Subscribe_0_VaPndX/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_kill_orphans=""true"" --docker_registry=""https://registry-1.docker.io"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" 
--docker_volume_checkpoint_dir=""/var/run/mesos/isolators/docker/volume"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/var/folders/ny/tcvyblqj43s2gdh2_895v9nw0000gp/T/ContentType_MasterAPITest_Subscribe_0_VaPndX/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" --http_command_executor=""false"" --http_credentials=""/var/folders/ny/tcvyblqj43s2gdh2_895v9nw0000gp/T/ContentType_MasterAPITest_Subscribe_0_VaPndX/http_credentials"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" --launcher_dir=""/Users/zhitao/Uber/sync/zhitao-mesos1.dev.uber.com/home/uber/mesos/build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resources=""cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --version=""false"" --work_dir=""/var/folders/ny/tcvyblqj43s2gdh2_895v9nw0000gp/T/ContentType_MasterAPITest_Subscribe_0_VaPndX"" W0708 11:42:48.522903 350982144 slave.cpp:209] ************************************************** Agent bound to loopback interface! Cannot communicate with remote master(s). You might want to set '--ip' flag to a routable IP address. 
************************************************** I0708 11:42:48.522922 350982144 credentials.hpp:86] Loading credential for authentication from '/var/folders/ny/tcvyblqj43s2gdh2_895v9nw0000gp/T/ContentType_MasterAPITest_Subscribe_0_VaPndX/credential' W0708 11:42:48.522965 1927435008 scheduler.cpp:157] ************************************************** Scheduler driver bound to loopback interface! Cannot communicate with remote master(s). You might want to set 'LIBPROCESS_IP' environment variable to use a routable IP address. ************************************************** I0708 11:42:48.522992 1927435008 scheduler.cpp:172] Version: 1.0.0 I0708 11:42:48.523066 350982144 slave.cpp:343] Agent using credential for: test-principal I0708 11:42:48.523092 350982144 credentials.hpp:37] Loading credentials for authentication from '/var/folders/ny/tcvyblqj43s2gdh2_895v9nw0000gp/T/ContentType_MasterAPITest_Subscribe_0_VaPndX/http_credentials' I0708 11:42:48.523334 350982144 slave.cpp:395] Using default 'basic' HTTP authenticator I0708 11:42:48.523973 352055296 scheduler.cpp:461] New master detected at master@127.0.0.1:50325 I0708 11:42:48.524050 350982144 slave.cpp:594] Agent resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0708 11:42:48.524196 350982144 slave.cpp:602] Agent attributes: [ ] I0708 11:42:48.524224 350982144 slave.cpp:607] Agent hostname: localhost I0708 11:42:48.525522 350445568 state.cpp:57] Recovering state from '/var/folders/ny/tcvyblqj43s2gdh2_895v9nw0000gp/T/ContentType_MasterAPITest_Subscribe_0_VaPndX/meta' I0708 11:42:48.525853 350445568 status_update_manager.cpp:200] Recovering status update manager I0708 11:42:48.526165 350445568 slave.cpp:4856] Finished recovery I0708 11:42:48.527223 349372416 status_update_manager.cpp:174] Pausing sending status updates I0708 11:42:48.527231 352055296 slave.cpp:969] New master detected at master@127.0.0.1:50325 I0708 11:42:48.527276 352055296 slave.cpp:1028] Authenticating with master 
master@127.0.0.1:50325 I0708 11:42:48.527328 352055296 slave.cpp:1039] Using default CRAM-MD5 authenticatee I0708 11:42:48.527561 352055296 slave.cpp:1001] Detecting new master I0708 11:42:48.527582 348299264 authenticatee.cpp:121] Creating new client SASL connection I0708 11:42:48.528666 349908992 master.cpp:6006] Authenticating slave(119)@127.0.0.1:50325 I0708 11:42:48.528880 352055296 authenticator.cpp:98] Creating new server SASL connection I0708 11:42:48.529089 350445568 http.cpp:381] HTTP POST for /master/api/v1/scheduler from 127.0.0.1:50918 I0708 11:42:48.529233 350445568 master.cpp:2272] Received subscription request for HTTP framework 'default' I0708 11:42:48.529261 350445568 master.cpp:2012] Authorizing framework principal 'test-principal' to receive offers for role '*' I0708 11:42:48.529323 352055296 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5 I0708 11:42:48.529357 352055296 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5' I0708 11:42:48.529417 352055296 authenticator.cpp:204] Received SASL authentication start I0708 11:42:48.529503 352055296 authenticator.cpp:326] Authentication requires more steps I0708 11:42:48.529561 352055296 master.cpp:2370] Subscribing framework 'default' with checkpointing disabled and capabilities [ ] I0708 11:42:48.529721 349908992 authenticatee.cpp:259] Received SASL authentication step I0708 11:42:48.530005 348835840 authenticator.cpp:232] Received SASL authentication step I0708 11:42:48.530241 348835840 authenticator.cpp:318] Authentication success I0708 11:42:48.530254 350445568 hierarchical.cpp:271] Added framework e055d60c-05ff-487e-82da-d0a43e52605c-0000 I0708 11:42:48.530900 349908992 authenticatee.cpp:299] Authentication success I0708 11:42:48.531186 350982144 master.cpp:6036] Successfully authenticated principal 'test-principal' at slave(119)@127.0.0.1:50325 I0708 11:42:48.531657 348299264 slave.cpp:1123] Successfully authenticated with master 
master@127.0.0.1:50325 I0708 11:42:48.531935 349372416 master.cpp:4676] Registering agent at slave(119)@127.0.0.1:50325 (localhost) with id e055d60c-05ff-487e-82da-d0a43e52605c-S0 I0708 11:42:48.532304 349908992 registrar.cpp:464] Applied 1 operations in 55us; attempting to update the 'registry' I0708 11:42:48.532908 348835840 log.cpp:577] Attempting to append 326 bytes to the log I0708 11:42:48.533015 352055296 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 3 I0708 11:42:48.533641 349372416 replica.cpp:537] Replica received write request for position 3 from (3798)@127.0.0.1:50325 I0708 11:42:48.533867 349372416 leveldb.cpp:341] Persisting action (345 bytes) to leveldb took 186us I0708 11:42:48.533917 349372416 replica.cpp:712] Persisted action at 3 I0708 11:42:48.537066 349908992 replica.cpp:691] Replica received learned notice for position 3 from @0.0.0.0:0 I0708 11:42:48.538169 349908992 leveldb.cpp:341] Persisting action (347 bytes) to leveldb took 914us I0708 11:42:48.538226 349908992 replica.cpp:712] Persisted action at 3 I0708 11:42:48.538255 349908992 replica.cpp:697] Replica learned APPEND action at position 3 I0708 11:42:48.539247 352055296 registrar.cpp:509] Successfully updated the 'registry' in 6.895104ms I0708 11:42:48.539302 348299264 log.cpp:596] Attempting to truncate the log to 3 I0708 11:42:48.539393 348299264 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 4 I0708 11:42:48.539798 348835840 master.cpp:4745] Registered agent e055d60c-05ff-487e-82da-d0a43e52605c-S0 at slave(119)@127.0.0.1:50325 (localhost) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0708 11:42:48.539881 348299264 hierarchical.cpp:478] Added agent e055d60c-05ff-487e-82da-d0a43e52605c-S0 (localhost) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (allocated: ) I0708 11:42:48.539901 349908992 slave.cpp:1169] Registered with master master@127.0.0.1:50325; given agent ID 
e055d60c-05ff-487e-82da-d0a43e52605c-S0 I0708 11:42:48.540287 350445568 status_update_manager.cpp:181] Resuming sending status updates I0708 11:42:48.540501 351518720 replica.cpp:537] Replica received write request for position 4 from (3799)@127.0.0.1:50325 I0708 11:42:48.540583 352055296 master.cpp:5835] Sending 1 offers to framework e055d60c-05ff-487e-82da-d0a43e52605c-0000 (default) I0708 11:42:48.540798 351518720 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 247us I0708 11:42:48.540868 351518720 replica.cpp:712] Persisted action at 4 I0708 11:42:48.540895 349908992 slave.cpp:1229] Forwarding total oversubscribed resources I0708 11:42:48.541035 352055296 master.cpp:5128] Received update of agent e055d60c-05ff-487e-82da-d0a43e52605c-S0 at slave(119)@127.0.0.1:50325 (localhost) with total oversubscribed resources I0708 11:42:48.541291 349908992 hierarchical.cpp:542] Agent e055d60c-05ff-487e-82da-d0a43e52605c-S0 (localhost) updated with oversubscribed resources (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]) I0708 11:42:48.541630 350982144 replica.cpp:691] Replica received learned notice for position 4 from @0.0.0.0:0 I0708 11:42:48.541911 350982144 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 189us I0708 11:42:48.541965 350982144 leveldb.cpp:399] Deleting ~2 keys from leveldb took 28us I0708 11:42:48.541987 350982144 replica.cpp:712] Persisted action at 4 I0708 11:42:48.542006 350982144 replica.cpp:697] Replica learned TRUNCATE action at position 4 I0708 11:42:48.544836 352055296 http.cpp:381] HTTP POST for /master/api/v1 from 127.0.0.1:50920 I0708 11:42:48.544884 352055296 http.cpp:484] Processing call SUBSCRIBE I0708 11:42:48.545382 352055296 master.cpp:7599] Added subscriber: a85e7341-ac15-4f18-9021-1a2efa326442 to the list of active subscribers I0708 11:42:48.550048 348835840 http.cpp:381] HTTP POST for /master/api/v1/scheduler from 
127.0.0.1:50919 I0708 11:42:48.550339 348835840 master.cpp:3468] Processing ACCEPT call for offers: [ e055d60c-05ff-487e-82da-d0a43e52605c-O0 ] on agent e055d60c-05ff-487e-82da-d0a43e52605c-S0 at slave(119)@127.0.0.1:50325 (localhost) for framework e055d60c-05ff-487e-82da-d0a43e52605c-0000 (default) I0708 11:42:48.550390 348835840 mas...",3 MESOS-5822,"Add a build script for the Windows CI","The ASF CI for Mesos runs a script that lives inside the Mesos codebase: https://github.com/apache/mesos/blob/1cbfdc3c1e4b8498a67f8531ab264003c8c19fb1/support/docker_build.sh ASF Infrastructure have set up a machine that we can use for building Mesos on Windows. Considering the environment, we will need a separate script to build here.",3 MESOS-5824,"Include disk source information in stringification","Some frameworks (like kafka_mesos) ignore the Source field when trying to reserve an offered mount or path persistent volume; the resulting error message is bewildering: {code:none} Task uses more resources cpus(*):4; mem(*):4096; ports(*):[31000-31000]; disk(kafka, kafka)[kafka_0:data]:960679 than available cpus(*):32; mem(*):256819; ports(*):[31000-32000]; disk(kafka, kafka)[kafka_0:data]:960679; disk(*):240169; {code} The stringification of disk resources should include source information. ",3 MESOS-5825,"Support mounting image volume in mesos containerizer.","Mesos containerizer should be able to support mounting image volume type. Specifically, both image rootfs and default manifest should be reachable inside container's mount namespace.",5 MESOS-5828,"Modularize Network in replicated_log","Currently replicated_log relies on Zookeeper for coordinator election. This is done through network abstraction _ZookeeperNetwork_. We need to modularize this part in order to enable replicated_log when using Master contender/detector modules.",8 MESOS-5841,"Clean up `FlagsBase::add`","In the definition for {{FlagsBase}}, we currently have 20 overloads for the {{FlagsBase::add}} function. 
This makes both the {{FlagsBase}} class definition and the {{flags.cpp}} files in Mesos difficult to read. We should clean up {{FlagsBase::add}} so that it does not require so many overloads.",3 MESOS-5844,"PersistentVolumeEndpointsTest.OfferCreateThenEndpointRemove test is flaky","Observed on ASF CI: https://builds.apache.org/job/Mesos/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-6)/2497/changes {code} [ RUN ] PersistentVolumeEndpointsTest.OfferCreateThenEndpointRemove I0713 18:43:55.968503 28220 cluster.cpp:155] Creating default 'local' authorizer I0713 18:43:56.082345 28220 leveldb.cpp:174] Opened db in 113.403661ms I0713 18:43:56.131445 28220 leveldb.cpp:181] Compacted db in 49.034774ms I0713 18:43:56.131533 28220 leveldb.cpp:196] Created db iterator in 28012ns I0713 18:43:56.131552 28220 leveldb.cpp:202] Seeked to beginning of db in 3046ns I0713 18:43:56.131564 28220 leveldb.cpp:271] Iterated through 0 keys in the db in 255ns I0713 18:43:56.131614 28220 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0713 18:43:56.134064 28246 recover.cpp:451] Starting replica recovery I0713 18:43:56.134627 28246 recover.cpp:477] Replica is in EMPTY status I0713 18:43:56.136396 28252 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (9553)@172.17.0.8:35418 I0713 18:43:56.136759 28252 recover.cpp:197] Received a recover response from a replica in EMPTY status I0713 18:43:56.137676 28246 recover.cpp:568] Updating replica status to STARTING I0713 18:43:56.148720 28242 master.cpp:382] Master 2258d072-b0c9-4c40-874c-6cf933ee345a (500c3e866abe) started on 172.17.0.8:35418 I0713 18:43:56.148759 28242 master.cpp:384] Flags at startup: --acls="""" --agent_ping_timeout=""15secs"" --agent_reregister_timeout=""10mins"" --allocation_interval=""50ms"" --allocator=""HierarchicalDRF"" 
--authenticate_agents=""true"" --authenticate_frameworks=""true"" --authenticate_http=""true"" --authenticate_http_frameworks=""true"" --authenticators=""crammd5"" --authorizers=""local"" --credentials=""/tmp/LrwRl4/credentials"" --framework_sorter=""drf"" --help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" --http_framework_authenticators=""basic"" --initialize_driver_logging=""true"" --log_auto_initialize=""true"" --logbufsecs=""0"" --logging_level=""INFO"" --max_agent_ping_timeouts=""5"" --max_completed_frameworks=""50"" --max_completed_tasks_per_framework=""1000"" --quiet=""false"" --recovery_agent_removal_limit=""100%"" --registry=""replicated_log"" --registry_fetch_timeout=""1mins"" --registry_store_timeout=""100secs"" --registry_strict=""true"" --roles=""role1"" --root_submissions=""true"" --user_sorter=""drf"" --version=""false"" --webui_dir=""/mesos/mesos-1.1.0/_inst/share/mesos/webui"" --work_dir=""/tmp/LrwRl4/master"" --zk_session_timeout=""10secs"" I0713 18:43:56.149247 28242 master.cpp:434] Master only allowing authenticated frameworks to register I0713 18:43:56.149265 28242 master.cpp:448] Master only allowing authenticated agents to register I0713 18:43:56.149273 28242 master.cpp:461] Master only allowing authenticated HTTP frameworks to register I0713 18:43:56.149283 28242 credentials.hpp:37] Loading credentials for authentication from '/tmp/LrwRl4/credentials' I0713 18:43:56.149780 28242 master.cpp:506] Using default 'crammd5' authenticator I0713 18:43:56.149940 28242 master.cpp:578] Using default 'basic' HTTP authenticator I0713 18:43:56.150091 28242 master.cpp:658] Using default 'basic' HTTP framework authenticator I0713 18:43:56.150209 28242 master.cpp:705] Authorization enabled W0713 18:43:56.150233 28242 master.cpp:768] The '--roles' flag is deprecated. This flag will be removed in the future. 
See the Mesos 0.27 upgrade notes for more information I0713 18:43:56.150760 28240 hierarchical.cpp:151] Initialized hierarchical allocator process I0713 18:43:56.151018 28249 whitelist_watcher.cpp:77] No whitelist given I0713 18:43:56.155668 28242 master.cpp:1973] The newly elected leader is master@172.17.0.8:35418 with id 2258d072-b0c9-4c40-874c-6cf933ee345a I0713 18:43:56.155781 28242 master.cpp:1986] Elected as the leading master! I0713 18:43:56.155848 28242 master.cpp:1673] Recovering from registrar I0713 18:43:56.156065 28254 registrar.cpp:332] Recovering registrar I0713 18:43:56.201568 28245 hierarchical.cpp:1537] No allocations performed I0713 18:43:56.201666 28245 hierarchical.cpp:1172] Performed allocation for 0 agents in 167962ns I0713 18:43:56.218626 28246 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 80.746657ms I0713 18:43:56.218705 28246 replica.cpp:320] Persisted replica status to STARTING I0713 18:43:56.219219 28246 recover.cpp:477] Replica is in STARTING status I0713 18:43:56.221391 28246 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (9556)@172.17.0.8:35418 I0713 18:43:56.221869 28253 recover.cpp:197] Received a recover response from a replica in STARTING status I0713 18:43:56.222760 28249 recover.cpp:568] Updating replica status to VOTING I0713 18:43:56.252303 28254 hierarchical.cpp:1537] No allocations performed I0713 18:43:56.252404 28254 hierarchical.cpp:1172] Performed allocation for 0 agents in 167038ns I0713 18:43:56.270256 28249 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 47.316392ms I0713 18:43:56.270387 28249 replica.cpp:320] Persisted replica status to VOTING I0713 18:43:56.270700 28250 recover.cpp:582] Successfully joined the Paxos group I0713 18:43:56.271121 28250 recover.cpp:466] Recover process terminated I0713 18:43:56.271503 28248 log.cpp:553] Attempting to start the writer I0713 18:43:56.273140 28240 replica.cpp:493] Replica received implicit promise 
request from (9557)@172.17.0.8:35418 with proposal 1 I0713 18:43:56.303086 28254 hierarchical.cpp:1537] No allocations performed I0713 18:43:56.303175 28254 hierarchical.cpp:1172] Performed allocation for 0 agents in 155905ns I0713 18:43:56.312978 28240 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 39.718643ms I0713 18:43:56.313405 28240 replica.cpp:342] Persisted promised to 1 I0713 18:43:56.314775 28245 coordinator.cpp:238] Coordinator attempting to fill missing positions I0713 18:43:56.316547 28250 replica.cpp:388] Replica received explicit promise request from (9558)@172.17.0.8:35418 for position 0 with proposal 2 I0713 18:43:56.354794 28239 hierarchical.cpp:1537] No allocations performed I0713 18:43:56.354898 28239 hierarchical.cpp:1172] Performed allocation for 0 agents in 178033ns I0713 18:43:56.363484 28250 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 46.846904ms I0713 18:43:56.363585 28250 replica.cpp:712] Persisted action at 0 I0713 18:43:56.365622 28250 replica.cpp:537] Replica received write request for position 0 from (9559)@172.17.0.8:35418 I0713 18:43:56.365727 28250 leveldb.cpp:436] Reading position from leveldb took 45172ns I0713 18:43:56.406314 28252 hierarchical.cpp:1537] No allocations performed I0713 18:43:56.406421 28252 hierarchical.cpp:1172] Performed allocation for 0 agents in 177001ns I0713 18:43:56.421867 28250 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 56.06514ms I0713 18:43:56.421968 28250 replica.cpp:712] Persisted action at 0 I0713 18:43:56.423286 28254 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 I0713 18:43:56.458665 28250 hierarchical.cpp:1537] No allocations performed I0713 18:43:56.458799 28250 hierarchical.cpp:1172] Performed allocation for 0 agents in 250863ns I0713 18:43:56.470486 28254 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 47.13918ms I0713 18:43:56.470552 28254 replica.cpp:712] Persisted action at 0 I0713 
18:43:56.470584 28254 replica.cpp:697] Replica learned NOP action at position 0 I0713 18:43:56.471782 28247 log.cpp:569] Writer started with ending position 0 I0713 18:43:56.475908 28253 leveldb.cpp:436] Reading position from leveldb took 79764ns I0713 18:43:56.479058 28247 registrar.cpp:365] Successfully fetched the registry (0B) in 322.939904ms I0713 18:43:56.479388 28247 registrar.cpp:464] Applied 1 operations in 50643ns; attempting to update the 'registry' I0713 18:43:56.483093 28247 log.cpp:577] Attempting to append 168 bytes to the log I0713 18:43:56.483269 28249 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 I0713 18:43:56.484673 28245 replica.cpp:537] Replica received write request for position 1 from (9560)@172.17.0.8:35418 I0713 18:43:56.509866 28239 hierarchical.cpp:1537] No allocations performed I0713 18:43:56.509959 28239 hierarchical.cpp:1172] Performed allocation for 0 agents in 157789ns I0713 18:43:56.512147 28245 leveldb.cpp:341] Persisting action (187 bytes) to leveldb took 27.358967ms I0713 18:43:56.512193 28245 replica.cpp:712] Persisted action at 1 I0713 18:43:56.513278 28250 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 I0713 18:43:56.537894 28250 leveldb.cpp:341] Persisting action (189 bytes) to leveldb took 24.568093ms I0713 18:43:56.537973 28250 replica.cpp:712] Persisted action at 1 I0713 18:43:56.538008 28250 replica.cpp:697] Replica learned APPEND action at position 1 I0713 18:43:56.539737 28252 registrar.cpp:509] Successfully updated the 'registry' in 60.26496ms I0713 18:43:56.539949 28252 registrar.cpp:395] Successfully recovered registrar I0713 18:43:56.540544 28252 master.cpp:1781] Recovered 0 agents from the Registry (129B) ; allowing 10mins for agents to re-register I0713 18:43:56.540832 28250 hierarchical.cpp:178] Skipping recovery of hierarchical allocator: nothing to recover I0713 18:43:56.541285 28251 log.cpp:596] Attempting to truncate the log to 1 I0713 
18:43:56.541637 28248 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 I0713 18:43:56.542763 28240 replica.cpp:537] Replica received write request for position 2 from (9561)@172.17.0.8:35418 I0713 18:43:56.571691 28240 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 28.798341ms I0713 18:43:56.571889 28240 replica.cpp:712] Persisted action at 2 I0713 18:43:56.573218 28240 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 I0713 18:43:56.620200 28240 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 46.927607ms I0713 18:43:56.620338 28240 leveldb.cpp:399] Deleting ~1 keys from leveldb took 59898ns I0713 18:43:56.620512 28240 replica.cpp:712] Persisted action at 2 I0713 18:43:56.620630 28240 replica.cpp:697] Replica learned TRUNCATE action at position 2 I0713 18:43:56.624091 28249 hierarchical.cpp:1537] No allocations performed I0713 18:43:56.624169 28249 hierarchical.cpp:1172] Performed allocation for 0 agents in 140818ns I0713 18:43:56.628180 28220 containerizer.cpp:196] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni W0713 18:43:56.629341 28220 backend.cpp:75] Failed to create 'aufs' backend: AufsBackend requires root privileges, but is running as user mesos W0713 18:43:56.629616 28220 backend.cpp:75] Failed to create 'bind' backend: BindBackend requires root privileges I0713 18:43:56.631988 28220 cluster.cpp:432] Creating default 'local' authorizer I0713 18:43:56.635001 28243 slave.cpp:205] Agent started on 251)@172.17.0.8:35418 I0713 18:43:56.635308 28220 resources.cpp:572] Parsing resources as JSON failed: disk:512 Trying semicolon-delimited string format instead I0713 18:43:56.635026 28243 slave.cpp:206] Flags at startup: --acls="""" --appc_simple_discovery_uri_prefix=""http://"" --appc_store_dir=""/tmp/mesos/store/appc"" --authenticate_http=""true"" --authenticatee=""crammd5"" --authentication_backoff_factor=""1secs"" --authorizer=""local"" 
--cgroups_cpu_enable_pids_and_tids_count=""false"" --cgroups_enable_cfs=""false"" --cgroups_hierarchy=""/sys/fs/cgroup"" --cgroups_limit_swap=""false"" --cgroups_root=""mesos"" --container_disk_watch_interval=""15secs"" --containerizers=""mesos"" --credential=""/tmp/PersistentVolumeEndpointsTest_OfferCreateThenEndpointRemove_gqStXQ/credential"" --default_role=""*"" --disk_watch_interval=""1mins"" --docker=""docker"" --docker_kill_orphans=""true"" --docker_registry=""https://registry-1.docker.io"" --docker_remove_delay=""6hrs"" --docker_socket=""/var/run/docker.sock"" --docker_stop_timeout=""0ns"" --docker_store_dir=""/tmp/mesos/store/docker"" --docker_volume_checkpoint_dir=""/var/run/mesos/isolators/docker/volume"" --enforce_container_disk_quota=""false"" --executor_registration_timeout=""1mins"" --executor_shutdown_grace_period=""5secs"" --fetcher_cache_dir=""/tmp/PersistentVolumeEndpointsTest_OfferCreateThenEndpointRemove_gqStXQ/fetch"" --fetcher_cache_size=""2GB"" --frameworks_home="""" --gc_delay=""1weeks"" --gc_disk_headroom=""0.1"" --hadoop_home="""" --help=""false"" --hostname_lookup=""true"" --http_authenticators=""basic"" --http_command_executor=""false"" --http_credentials=""/tmp/PersistentVolumeEndpointsTest_OfferCreateThenEndpointRemove_gqStXQ/http_credentials"" --image_provisioner_backend=""copy"" --initialize_driver_logging=""true"" --isolation=""posix/cpu,posix/mem"" --launcher_dir=""/mesos/mesos-1.1.0/_build/src"" --logbufsecs=""0"" --logging_level=""INFO"" --oversubscribed_resources_interval=""15secs"" --perf_duration=""10secs"" --perf_interval=""1mins"" --qos_correction_interval_min=""0ns"" --quiet=""false"" --recover=""reconnect"" --recovery_timeout=""15mins"" --registration_backoff_factor=""10ms"" --resources=""disk(*):1024"" --revocable_cpu_low_priority=""true"" --sandbox_directory=""/mnt/mesos/sandbox"" --strict=""true"" --switch_user=""true"" --systemd_enable_support=""true"" --systemd_runtime_directory=""/run/systemd/system"" 
--version=""false"" --work_dir=""/tmp/PersistentVolumeEndpointsTest_OfferCreateThenEndpointRemove_gqStXQ"" I0713 18:43:56.635709 28243 credentials.hpp:86] Loading credential for authentication from '/tmp/PersistentVolumeEndpointsTest_OfferCreateThenEndpointRemove_gqStXQ/credential' I0713 18:43:56.635892 28243 slave.cpp:343] Agent using credential for: test-principal I0713 18:43:56.635924 28243 credentials.hpp:37] Loading credentials for authentication from '/tmp/PersistentVolumeEndpointsTest_OfferCreateThenEndpointRemove_gqStXQ/http_credentials' I0713 18:43:56.636272 28243 slave.cpp:395] Using default 'basic' HTTP authenticator I0713 18:43:56.636615 28243 resources.cpp:572] Parsing resources as JSON failed: disk(*):1024 Trying semicolon-delimited string format instead I0713 18:43:56.636878 28243 resources.cpp:572] Parsing resources as JSON failed: disk(*):1024 Trying semicolon-delimited string format instead I0713 18:43:56.637318 28243 slave.cpp:594] Agent resources: disk(*):1024; cpus(*):16; mem(*):47270; ports(*):[31000-32000] I0713 18:43:56.637859 28243 slave.cpp:602] Agent attributes: [ ] I0713 18:43:56.638073 28220 sched.cpp:226] Version: 1.1.0 I0713 18:43:56.638074 28243 slave.cpp:607] Agent hostname: 500c3e866abe I0713 18:43:56.640148 28253 sched.cpp:330] New master detected at master@172.17.0.8:35418 I0713 18:43:56.640650 28253 sched.cpp:396] Authenticating with master master@172.17.0.8:35418 I0713 18:43:56.640738 28253 sched.cpp:403] Using default CRAM-MD5 authenticatee I0713 18:43:56.640801 28239 state.cpp:57] Recovering state from '/tmp/PersistentVolumeEndpointsTest_OfferCreateThenEndpointRemove_gqStXQ/meta' I0713 18:43:56.640976 28243 authenticatee.cpp:121] Creating new client SASL connection I0713 18:43:56.641319 28253 status_update_manager.cpp:200] Recovering status update manager I0713 18:43:56.641477 28243 master.cpp:6006] Authenticating scheduler-398078e0-6dae-4c02-8197-af69d9eb230a@172.17.0.8:35418 I0713 18:43:56.641636 28239 
authenticator.cpp:414] Starting authentication session for crammd5_authenticatee(554)@172.17.0.8:35418 I0713 18:43:56.641542 28240 containerizer.cpp:522] Recovering containerizer I0713 18:43:56.642201 28239 authenticator.cpp:98] Creating new server SASL connection I0713 18:43:56.642602 28252 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5 I0713 18:43:56.642634 28252 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5' I0713 18:43:56.642714 28239 authenticator.cpp:204] Received SASL authentication start I0713 18:43:56.642792 28239 authenticator.cpp:326] Authentication requires more steps I0713 18:43:56.642882 28239 authenticatee.cpp:259] Received SASL authentication step I0713 18:43:56.642978 28239 authenticator.cpp:232] Received SASL authentication step I0713 18:43:56.643009 28239 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: '500c3e866abe' server FQDN: '500c3e866abe' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0713 18:43:56.643026 28239 auxprop.cpp:181] Looking up auxiliary property '*userPassword' I0713 18:43:56.643064 28239 auxprop.cpp:181] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0713 18:43:56.643091 28239 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: '500c3e866abe' server FQDN: '500c3e866abe' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0713 18:43:56.643107 28239 auxprop.cpp:131] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0713 18:43:56.643117 28239 auxprop.cpp:131] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0713 18:43:56.643136 28239 authenticator.cpp:318] Authentication success I0713 18:43:56.643290 28239 authenticatee.cpp:299] Authentication success I0713 18:43:56.643379 28239 master.cpp:6036] Successfully authenticated principal 
'test-principal' at scheduler-398078e0-6dae-4c02-8197-af69d9eb230a@172.17.0.8:35418 I0713 18:43:56.643501 28244 authenticator.cpp:432] Authentication session cleanup for crammd5_authenticatee(554)@172.17.0.8:35418 I0713 18:43:56.643987 28244 sched.cpp:502] Successfully authenticated with master master@172.17.0.8:35418 I0713 18:43:56.644011 28244 sched.cpp:820] Sending SUBSCRIBE call to master@172.17.0.8:35418 I0713 18:43:56.644103 28244 sched.cpp:853] Will retry registration in 809.142674ms if necessary I0713 18:43:56.644287 28244 master.cpp:2550] Received SUBSCRIBE call for framework 'default' at scheduler-398078e0-6dae-4c02-8197-af69d9eb230a@172.17.0.8:35418 I0713 18:43:56.644346 28244 master.cpp:2012] Authorizing framework principal 'test-principal' to receive offers for role 'role1' I0713 18:43:56.644675 28249 provisioner.cpp:253] Provisioner recovery complete I0713 18:43:56.645089 28245 master.cpp:2626] Subscribing framework default with checkpointing disabled and capabilities [ ] I0713 18:43:56.645783 28249 hierarchical.cpp:271] Added framework 2258d072-b0c9-4c40-874c-6cf933ee345a-0000 I0713 18:43:56.645916 28249 hierarchical.cpp:1537] No allocations performed I0713 18:43:56.646000 28249 hierarchical.cpp:1632] No inverse offers to send out! 
I0713 18:43:56.646083 28248 sched.cpp:743] Framework registered with 2258d072-b0c9-4c40-874c-6cf933ee345a-0000 I0713 18:43:56.646116 28249 hierarchical.cpp:1172] Performed allocation for 0 agents in 249850ns I0713 18:43:56.646163 28248 sched.cpp:757] Scheduler::registered took 21831ns I0713 18:43:56.646317 28246 slave.cpp:4856] Finished recovery I0713 18:43:56.663516 28246 slave.cpp:5028] Querying resource estimator for oversubscribable resources I0713 18:43:56.664029 28254 status_update_manager.cpp:174] Pausing sending status updates I0713 18:43:56.664043 28246 slave.cpp:969] New master detected at master@172.17.0.8:35418 I0713 18:43:56.664567 28246 slave.cpp:1028] Authenticating with master master@172.17.0.8:35418 I0713 18:43:56.665148 28246 slave.cpp:1039] Using default CRAM-MD5 authenticatee I0713 18:43:56.665555 28246 slave.cpp:1001] Detecting new master I0713 18:43:56.665590 28244 authenticatee.cpp:121] Creating new client SASL connection I0713 18:43:56.665889 28246 slave.cpp:5042] Received oversubscribable resources from the resource estimator I0713 18:43:56.666071 28253 master.cpp:6006] Authenticating slave(251)@172.17.0.8:35418 I0713 18:43:56.666316 28244 authenticator.cpp:414] Starting authentication session for crammd5_authenticat...",1 MESOS-5845,"The fetcher can access any local file as root","The Mesos fetcher currently runs as root and does a blind cp+chown of any file:// URI into the task's sandbox, to be owned by the task user. Even if frameworks are restricted from running tasks as root, it seems they can still access root-protected files in this way. We should secure the fetcher so that it has the filesystem permissions of the user its associated task is being run as. One option would be to run the fetcher as the same user that the task will run as.",3 MESOS-5848,"Docker health checks are malformed.","When wrapping the health check command into {{docker exec}}, docker executor erroneously forms the health check command itself. 
Here is an excerpt from an executor log: {noformat} Launching health check process: docker exec mesos-2070f452-2120-45ad-a8d2-a339d234da41-S0.c27d1b78-d4aa-424b-91fa-1e91576db9b5 sh -c "" true "" /opt/mesosphere/packages/mesos--59d45b30116143cb8d9995ca26f9dec5e93dc710/libexec/mesos/mesos-health-check --executor=(1)@10.0.1.41:40651 --health_check_json={""command"":{""shell"":true,""value"":""docker exec mesos-2070f452-2120-45ad-a8d2-a339d234da41-S0.c27d1b78-d4aa-424b-91fa-1e91576db9b5 sh -c \"" true \""""},""consecutive_failures"":1,""delay_seconds"":0.0,""grace_period_seconds"":10.0,""interval_seconds"":5.0,""timeout_seconds"":10.0} --task_id=testhc.db69c60b-4a75-11e6-b9b0-c254ada9b06d {noformat}",1 MESOS-5850,"Add a test that runs the 'mesos-local' binary","The balloon framework test runs the Mesos master and agent binaries, but we don't seem to have any tests which run the {{mesos-local}} binary at the moment. Such a test should be added, or one of the existing example framework tests could be modified to accomplish this.",2 MESOS-5852,"CMake build needs to generate protobufs before building libmesos","The existing CMake lists place protobufs at the same level as other Mesos sources: https://github.com/apache/mesos/blob/c4cecf9c279c5206faaf996fef0b1810b490b329/src/CMakeLists.txt#L415 This is incorrect, as protobuf changes need to be regenerated before we can build against them. Note: in the autotools build, this is done by compiling protobufs into {{libmesos}}, which then builds {{libmesos_no_3rdparty}}: https://github.com/apache/mesos/blob/c4cecf9c279c5206faaf996fef0b1810b490b329/src/Makefile.am#L1304-L1305",2 MESOS-5855,"Create a 'Disk (not) full' example framework","We need example frameworks for verifying the correct behavior of posix/disk isolator when the disk quota enforcement is in place. 
One framework would verify that disk quota enforcement is working and that a container gets terminated when it goes beyond its disk quota, and another would verify that a container does not get killed if it stays within its disk quota bounds. ",3 MESOS-5856,"Logrotate ContainerLogger module does not rotate logs when run as root with --switch_user","The logrotate ContainerLogger module runs as the agent's user. In most cases, this is {{root}}. When {{logrotate}} is run as root, there is an additional check the configuration files must pass (because a root {{logrotate}} needs to be secured against non-root modifications to the configuration): https://github.com/logrotate/logrotate/blob/fe80cb51a2571ca35b1a7c8ba0695db5a68feaba/config.c#L807-L815 Log rotation will fail under the following scenario: 1) The agent is run with {{--switch_user}} (default: true) 2) A task is launched with a non-root user specified 3) The logrotate module spawns a few companion processes (as root) and this creates the {{stdout}}, {{stderr}}, {{stdout.logrotate.conf}}, and {{stderr.logrotate.conf}} files (as root). This step races with the next step. 4) The Mesos containerizer and Fetcher will {{chown}} the task's sandbox to the non-root user, including the files just created. 5) When {{logrotate}} is run, it will skip any non-root configuration files. This means the files are not rotated. ---- Fix: The logrotate module's companion processes should call {{setuid}} and {{setgid}}.",1 MESOS-5863,"Enabling SSL causes fetcher fail to fetch from HTTPS sites.","This is because curl (which the fetcher relies on) also relies on some of the environment variables used by libprocess SSL support, for instance `SSL_CERT_FILE`. 
If the operator sets `SSL_CERT_FILE` env var for Mesos agent, the fetcher will inherit this env var and cause curl to fail: {noformat} [centos@ip-10-10-0-205 ~]$ SSL_CERT_FILE=/run/dcos/pki/tls/certs/mesos-slave.crt curl https://registry-1.docker.io:443/v2/library/alpine/manifests/latest curl: (60) SSL certificate problem: unable to get local issuer certificate More details here: https://curl.haxx.se/docs/sslcerts.html curl performs SSL certificate verification by default, using a ""bundle"" of Certificate Authority (CA) public keys (CA certs). If the default bundle file isn't adequate, you can specify an alternate file using the --cacert option. If this HTTPS server uses a certificate signed by a CA represented in the bundle, the certificate verification probably failed due to a problem with the certificate (it might be expired, or the name might not match the domain name in the URL). If you'd like to turn off curl's verification of the certificate, use the -k (or --insecure) option. [centos@ip-10-10-0-205 ~]$ curl https://registry-1.docker.io:443/v2/library/alpine/manifests/latest {""errors"":[{""code"":""UNAUTHORIZED"",""message"":""authentication required"",""detail"":[{""Type"":""repository"",""Name"":""library/alpine"",""Action"":""pull""}]}]} {noformat} To solve this problem, we deprecated the existing `SSL_` env variables and used `LIBPROCESS_SSL_` instead. To be backward compatible, we still accept `SSL_` env variables for the time being.",3 MESOS-5864,"Document MESOS_SANDBOX executor env variable.","And we should document the difference with MESOS_DIRECTORY.",2 MESOS-5874,"Only send ShutdownFrameworkMessage to agents associated with framework.","slave.cpp:2079] Asked to shut down framework ${framework} by master@${master} slave.cpp:2094] Cannot shut down unknown framework ${framework} For high framework/churn clusters this saturates agent logs with these messages. 
When a framework terminates, a ShutdownFrameworkMessage is sent to every registered slave in a for loop. This patch proposes sending this message only to agents with executors associated with the framework. Also proposed is moving the log line to VLOG(1). ",1 MESOS-5878,"Strict/RegistrarTest.UpdateQuota/0 is flaky","Observed on ASF CI (https://builds.apache.org/job/Mesos/BUILDTOOL=autotools,COMPILER=clang,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-6)/2539/consoleFull). Log file is attached. Note that this might have been uncovered due to the recent removal of {{os::sleep}} from {{Clock::settle}}.",3 MESOS-5879,"cgroups/net_cls isolator causing agent recovery issues","We run with 'cgroups/net_cls' in our isolator list, and when we restart any agent process in a cluster running an experimental custom isolator as well, the agents are unable to recover from checkpoint, because net_cls reports that unknown orphan containers have duplicate net_cls handles. While this is a problem that needs to be solved (probably by fixing our custom isolator), it's also a problem that the net_cls isolator fails recovery just for duplicate handles in cgroups that it is literally about to unconditionally destroy during recovery. Can this be fixed?",1 MESOS-5886,"FUTURE_DISPATCH may react on irrelevant dispatch.","[{{FUTURE_DISPATCH}}|https://github.com/apache/mesos/blob/e8ebbe5fe4189ef7ab046da2276a6abee41deeb2/3rdparty/libprocess/include/process/gmock.hpp#L50] uses [{{DispatchMatcher}}|https://github.com/apache/mesos/blob/e8ebbe5fe4189ef7ab046da2276a6abee41deeb2/3rdparty/libprocess/include/process/gmock.hpp#L350] to figure out whether a processed {{DispatchEvent}} is the one the user is waiting for. However, comparing {{std::type_info}} of function pointers is not enough: different class methods with the same signature will be matched. 
Here is the test that proves this: {noformat} class DispatchProcess : public Process { public: MOCK_METHOD0(func0, void()); MOCK_METHOD1(func1, bool(bool)); MOCK_METHOD1(func1_same_but_different, bool(bool)); MOCK_METHOD1(func2, Future(bool)); MOCK_METHOD1(func3, int(int)); MOCK_METHOD2(func4, Future(bool, int)); }; {noformat} {noformat} TEST(ProcessTest, DispatchMatch) { DispatchProcess process; PID pid = spawn(&process); Future future = FUTURE_DISPATCH( pid, &DispatchProcess::func1_same_but_different); EXPECT_CALL(process, func1(_)) .WillOnce(ReturnArg<0>()); dispatch(pid, &DispatchProcess::func1, true); AWAIT_READY(future); terminate(pid); wait(pid); } {noformat} The test passes: {noformat} [ RUN ] ProcessTest.DispatchMatch [ OK ] ProcessTest.DispatchMatch (1 ms) {noformat} This change was introduced in https://reviews.apache.org/r/28052/.",5 MESOS-5887,"Enhance DispatchEvent to include demangled method name.","Currently, [{{DispatchEvent}}|https://github.com/apache/mesos/blob/e8ebbe5fe4189ef7ab046da2276a6abee41deeb2/3rdparty/libprocess/include/process/event.hpp#L148] does not include any user-friendly information about the actual method being dispatched. This can be helpful in order to simplify triaging and debugging, e.g., using {{\_\_processes\_\_}} endpoint. Now we print the [event type only|https://github.com/apache/mesos/blob/e8ebbe5fe4189ef7ab046da2276a6abee41deeb2/3rdparty/libprocess/src/process.cpp#L3198-L3203].",5 MESOS-5891,"/help endpoint does not set Content-Type to HTML","This change added a default {{Content-Type}} to all responses: https://github.com/apache/mesos/commit/b2c5d91addbae609af3791f128c53fb3a26c7d53 Unfortunately, this changed the {{/help}} endpoint from no {{Content-Type}} to {{text/plain}}. For a browser to render this page correctly, we need an HTML content type.",1 MESOS-5901,"Make the command executor unversioned","Currently, the command executor in {{src/launcher/executor.cpp}} is in the {{v1}} namespace. 
As referenced in the versioning design doc, we had agreed to keep the Mesos internal code in the unversioned namespace and use {{evolve/devolve}} helpers for requests/responses. Following this pattern, we should bring the command executor into the {{mesos::internal}} namespace.",2 MESOS-5907,"ExamplesTest.DiskFullFramework fails on Arch","This test fails consistently on recent Arch Linux, running in a VM.",1 MESOS-5909,"Stout ""OsTest.User"" test can fail on some systems","Libc call {{getgrouplist}} doesn't return the {{gid}} list in a sorted manner (in my case, it's returning ""471 100"") ... whereas {{id -G}} returns a sorted list (""100 471"" in my case) causing the validation inside the loop to fail. We should sort both lists before comparing the values.",2 MESOS-5923,"Ubuntu 14.04 LTS GPU Isolator ""/run"" directory is noexec","In Ubuntu 14.04 LTS the mount for the /run directory is noexec. This affects the {{/var/run/mesos/isolators/gpu/nvidia_352.63/bin}} directory, which the Mesos GPU isolator depends on. 
{{bill@billz:/var/run$ mount | grep noexec proc on /proc type proc (rw,noexec,nosuid,nodev) sysfs on /sys type sysfs (rw,noexec,nosuid,nodev) devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620) tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)}} /var/run is a symlink to /run: {{bill@billz:/var$ ll total 52 drwxr-xr-x 13 root root 4096 May 5 20:00 ./ drwxr-xr-x 27 root root 4096 Jul 14 17:29 ../ lrwxrwxrwx 1 root root 9 May 5 19:50 lock -> /run/lock/ drwxrwxr-x 19 root syslog 4096 Jul 28 08:00 log/ drwxr-xr-x 2 root root 4096 Aug 4 2015 opt/ lrwxrwxrwx 1 root root 4 May 5 19:50 run -> /run/}} The current workaround is to remount /run without noexec: {{sudo mount -o remount,exec /run}}",3 MESOS-5924,"Fetcher may print logging error when run as unprivileged user","Now that the fetcher performs its fetching as the framework's/task's user when one is specified, it prints an error message when its user does not have permissions to create the default glog logging file: {code} I0728 17:29:39.337363 25464 logging.cpp:194] INFO level logging started! 
I0728 17:29:39.337628 25464 fetcher.cpp:498] Fetcher Info: {""cache_directory"":""\/tmp\/mesos\/fetch\/slaves\/57c21e0d-487d-4668-8da6-005f13401598-S0\/centos"",""items"":[{""action"":""BYPASS_CACHE"",""uri"":{""cache"":false,""executable"":false,""extract"":true,""value"":""file:\/\/\/nonrootdir\/nonroottest""}}],""sandbox_directory"":""\/var\/lib\/mesos\/slave\/slaves\/57c21e0d-487d-4668-8da6-005f13401598-S0\/frameworks\/57c21e0d-487d-4668-8da6-005f13401598-0001\/executors\/non-root-success.dc9e820d-54e8-11e6-b082-70b3d5120001\/runs\/3cae229f-8c6d-439e-8116-78bf06ac3731"",""user"":""centos""} I0728 17:29:39.339738 25464 fetcher.cpp:409] Fetching URI 'file:///nonrootdir/nonroottest' I0728 17:29:39.339758 25464 fetcher.cpp:250] Fetching directly into the sandbox directory I0728 17:29:39.339777 25464 fetcher.cpp:187] Fetching URI 'file:///nonrootdir/nonroottest' I0728 17:29:39.339792 25464 fetcher.cpp:167] Copying resource with command:cp '/nonrootdir/nonroottest' '/var/lib/mesos/slave/slaves/57c21e0d-487d-4668-8da6-005f13401598-S0/frameworks/57c21e0d-487d-4668-8da6-005f13401598-0001/executors/non-root-success.dc9e820d-54e8-11e6-b082-70b3d5120001/runs/3cae229f-8c6d-439e-8116-78bf06ac3731/nonroottest' Could not create logging file: Permission denied COULD NOT CREATE A LOGGINGFILE 20160728-172939.25464!W0728 17:29:39.342435 25464 fetcher.cpp:289] Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: file:///nonrootdir/nonroottest I0728 17:29:39.342511 25464 fetcher.cpp:547] Fetched 'file:///nonrootdir/nonroottest' to '/var/lib/mesos/slave/slaves/57c21e0d-487d-4668-8da6-005f13401598-S0/frameworks/57c21e0d-487d-4668-8da6-005f13401598-0001/executors/non-root-success.dc9e820d-54e8-11e6-b082-70b3d5120001/runs/3cae229f-8c6d-439e-8116-78bf06ac3731/nonroottest' + /opt/mesosphere/packages/mesos--6c64154890d6c22595d7d047193773cda8de6a7c/libexec/mesos/mesos-containerizer mount --help=false --operation=make-rslave --path=/ 
I0728 17:29:39.603394 25433 exec.cpp:161] Version: 1.0.0 I0728 17:29:39.699053 25490 exec.cpp:236] Executor registered on agent 57c21e0d-487d-4668-8da6-005f13401598-S0 {code} It seems that the fetcher binary is unable to create the default logging file due to a permissions issue. However, when I set the relevant {{GLOG_logtostderr=true}} flag, which should prevent the creation of this default file, it had no effect. Note that the fetcher's logging output was piped to stdout/stderr as expected, and the task ran and completed successfully, so these errors do not seem to affect the execution of the task.",2 MESOS-5928,"Agent's '--version' flag doesn't work","With the removal of the agent's default {{work_dir}}, the {{--version}} flag no longer works. Instead, the agent complains about the lack of a {{work_dir}} and prints the usage instructions.",1 MESOS-5934,"Enable the upgrade test script to run multiple masters/agents","The script designed to test upgrades between different Mesos versions, {{support/test-upgrade.py}}, should be improved to test upgrades with multiple masters and agents.",3 MESOS-5935,"Add upgrade testing to the ASF CI","We should add execution of the {{support/test-upgrade.py}} script to the ASF CI runs. This will require having a build of a previous Mesos version to run against latest master; perhaps we could cache builds of the last stable release somewhere, which could be fetched and executed against CI builds.",5 MESOS-5943,"Incremental http parsing of URLs leads to decoder error","When requests arrive to the decoder in pieces (e.g. {{mes}} followed by a separate chunk of {{os.apache.org}}) the http parser is not able to handle this case if the split is within the URL component. This causes the decoder to error out, and can lead to connection invalidation. 
The scheduler driver is susceptible to this.",3 MESOS-5944,"Remove `O_SYNC` from StatusUpdateManager logs","Currently the {{StatusUpdateManager}} uses {{O_SYNC}} to flush status updates to disk. We don't need to use {{O_SYNC}} because we only read this file if the host did not crash. {{os::write}} success implies the kernel will have flushed our data to the page cache. This is sufficient for the recovery scenarios we use this data for.",1 MESOS-5945,"NvidiaVolume::create() should check for root before creating volume","Without root, we cannot create the nvidia volume in {{/var/run/mesos}} or mount a {{tmpfs}} in cases where we need to override the {{noexec}} on the current file system.",2 MESOS-5954,"Docker executor does not use HealthChecker library.","https://github.com/apache/mesos/commit/1556d9a3a02de4e8a90b5b64d268754f95b12d77 refactored health checks into a library. Command executor uses the library instead of the ""mesos-health-check"" binary, docker executor should do the same for consistency.",3 MESOS-5955,"The ""mesos-health-check"" binary is not used anymore.","MESOS-5727 and MESOS-5954 refactored the health check code into the {{HealthChecker}} library, hence the ""mesos-health-check"" binary became unused. While the command and docker executors could just use the library to avoid the subprocess complexity, we may want to consider keeping a binary version that ships with the installation, because the intention of the binary was to allow other executors to re-use our implementation. On the other side, this binary is ill suited to this since it uses libprocess message passing, so if we do not have code that requires the binary it seems ok to remove it for now. Custom executors may use the {{HealthChecker}} library directly, it is not much more complex than using the binary.",3 MESOS-5959,"All non-root tests fail on GPU machine","A recent addition to ensure that {{NvidiaVolume::create()}} ran as root broke all non-root tests on GPU machines. 
The reason is that we unconditionally create this volume so long as we detect {{nvml.isAvailable()}} which will fail now that we are only allowed to create this volume if we have root permissions. We should fix this by adding the proper conditions to determine when / if we should create this volume based on some combination of {{\-\-containerizer}} and {{\-\-isolation}} flags.",2