Japanese Expressions Dataset from Human Rights Infringement on Internet
Description
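This dataset collects defamatory and similar expressions. It was used in the presentation "権利侵害と不快さの間:日本語人権侵害表現データセット" (Between Rights Infringement and Offensiveness: A Japanese Dataset of Human Rights Infringing Expressions) at the Association for Natural Language Processing 2023 (言語処理学会2023).
The specifications are subject to change.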
Release Date
Version 0.1 2023/03/09
Version 0.2 2023/06/01
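1. Scope of the Survey
Civil cases
- Cases requesting disclosure of sender's information
- Cases requesting compensation for damages, etc.
This dataset was created from court cases that disputed human rights infringement caused by posts on the Internet.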
Annotation Scheme
1: Text in Dispute
Text in Dispute is transcribed from the list of articles submitted or the facts and reasons section of the court cases.
This text is what the plaintiff alleges is infringing his or her rights.
The text is anonymized with tags. For example, we use the tag [Plaintiff] for descriptions related to plaintiffs, the tag [Defendant] for descriptions related to defendants, and the tag [Third Party] for any mention of third parties.
The tag [Other] is designated for texts involving anonymized place names and any other content not covered by the preceding tags.
If distinguishing between tags proves challenging, we advise the use of [Other].
Should a URL be present in the text, it is substituted with the [URL] tag.
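As a rough illustration of how the anonymization tags described above could be located in a text, the following Python sketch counts tag occurrences with a regular expression. The tag spellings are taken from the English description and are an assumption about the released files, which may spell or encode the tags differently; `count_tags` and the sample sentence are hypothetical and not part of the dataset.

```python
import re
from collections import Counter

# Assumption: tag names follow the English description of the scheme.
# Inspect the released files and adjust this tuple if they differ.
ANONYMIZATION_TAGS = (
    "Plaintiff",
    "Defendant",
    "Defandant",  # alternate spelling, in case the files use it
    "Third Party",
    "Other",
    "URL",
)

# Matches any known tag wrapped in square brackets, e.g. "[Third Party]".
TAG_PATTERN = re.compile(
    r"\[(" + "|".join(re.escape(tag) for tag in ANONYMIZATION_TAGS) + r")\]"
)


def count_tags(text: str) -> Counter:
    """Count how often each anonymization tag occurs in one text."""
    return Counter(TAG_PATTERN.findall(text))


# Hypothetical sample sentence, not taken from the dataset.
sample = "[Plaintiff] is mentioned at [URL], and [Plaintiff] appears again."
print(count_tags(sample))
```

Counting tags per text can help check how much of a text was replaced by anonymization before it is used for training or evaluation.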
2: Context Utilized in Adjudication
In some cases, the history of the conversation and previous postings are taken into account in determining whether or not an infringement has occurred.
Such text is transcribed from the fact-finding and judgment portions of the court case.
This contextual information can support a proper understanding of the issue and accurate learning and classification, and it can also be used for other tasks.
Anonymization is applied in the same way as for [1: Text in Dispute].
3: Allegedly Infringed Right and Judgement on the Infringement Allegation
This category provides labels indicating the type of personal rights infringement that the plaintiff alleges against the Text in Dispute and whether the trial court recognized the infringement of that right.
If the plaintiff alleges multiple types of rights infringement against one text, up to two labels are given.
3-1a: The types of Allegedly Infringed Right 1
Based on the plaintiff's complaint of infringement, a label is assigned from the following types of personal rights: the right of reputation, the sense of honor, the right of peaceful private life, and other personal rights.
3-1b: Judgement on the Infringement Allegation 1
This category indicates whether the trial court granted the plaintiff's complaint against the text under consideration.
The annotator reads the outcome for each case from the court's judgment in the case.
If we find that the court has admitted the plaintiff's claim, we assign the label "1", meaning that the text was found to infringe personal rights.
If we find that the court has dismissed the plaintiff's claim, we assign the label "0", meaning that the text was found not to be infringing.
We assign the label "UNJUDGE" to cases where we cannot find the court's Judgement.
This occurs when the plaintiff alleges multiple infringements and only one of the claims is adjudicated, or when the complaint is dismissed for reasons other than a determination of infringement by the textual expression.
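The three judgement labels can be mapped to a three-valued result when loading the data. The following Python sketch is one possible way to do this, assuming the labels appear in the files exactly as written in this description ("1", "0", "UNJUDGE"); the function name `parse_judgement` is hypothetical.

```python
from typing import Optional


def parse_judgement(label: object) -> Optional[bool]:
    """Convert a judgement label into True, False, or None.

    True  -> the court admitted the claim (label "1")
    False -> the court dismissed the claim (label "0")
    None  -> no judgement could be found (label "UNJUDGE")
    """
    value = str(label).strip()
    if value == "1":
        return True
    if value == "0":
        return False
    if value == "UNJUDGE":
        return None
    # Fail loudly on anything else so that unexpected labels are noticed.
    raise ValueError(f"unexpected judgement label: {label!r}")


print(parse_judgement("1"), parse_judgement("0"), parse_judgement("UNJUDGE"))
```

Keeping "UNJUDGE" as a separate value, rather than folding it into "0", avoids treating unadjudicated claims as findings of non-infringement.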
3-2a & b: The types of Allegedly Infringed Right and Judgement on the Infringement Allegation 2
If two infringements are complained of, the annotator assigns a label for the second right.
4: Index Information of Court Cases
This section contains information about the cited court case.
4-1: Case Number
This is the case number of the court case and the court in which the case was argued.
4-2: Case Name
This is the case name of the court case and the court in which the case was argued.
The applicable case is either a case requesting disclosure of sender's information or a case requesting compensation for damages.
4-3: Bibliography
This is the bibliography of the cited case.
4-4: Article Number
This is the number assigned to Text in Dispute in the court case.
4-5: Online Platform
If the court case explicitly indicates the online platform where the Text in Dispute was posted, the platform is stated.
Because the form of the text and its context differ from platform to platform, the characteristics of each service are an important factor to take into account.
Files
Total size: 699.5 kB
MD5 checksum | Size |
---|---|
md5:eacfe6409548c9505dcebe577126febe | 419.4 kB |
md5:7276716845a1af6c473384cfbeffcc69 | 280.1 kB |
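After downloading, a file can be checked against the MD5 checksums listed above. The following Python sketch uses only the standard library. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and the local file path must be supplied by the user because the file names are not given here.

```python
import hashlib


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Example call (replace the path with the location of a downloaded file):
# matches_checksum("path/to/downloaded_file", "md5:eacfe6409548c9505dcebe577126febe")
```

A mismatch usually indicates an incomplete download or a different version of the file.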