Speaker diarization is the task of determining "who spoke when" in a multi-speaker environment and is an essential component of many speech recognition tasks that process large volumes of data (e.g., police body cam recordings, large corpora of meetings). While state-of-the-art diarization methods work remarkably well on the cases considered thus far (e.g., CallHome or two-person call center conversations), this success does not transfer to more challenging corpora such as "speech in the wild" (YouTube videos, recordings from wearables, etc.). DIHARD is the first in a series of challenges aiming to break this last barrier.