Published May 21, 2024
| Version v1
Dataset
Open
PhishDecloaker Datasets
Description
This record contains the datasets accompanying the paper "PhishDecloaker: Detecting CAPTCHA-cloaked Phishing Websites via Hybrid Vision-based Interactive Models", published at USENIX Security '24.
Phishing Kit Dataset
- Section: 2
- Description: For the empirical study.
- Contents: 100 defanged PHP phishing kits representing the following brands:
1. Microsoft
2. Banco de Oro
3. Microsoft OneDrive
4. Deutsche Kreditbank
5. Adobe Acrobat
6. N26
7. Absa Group
8. DHL
9. Microsoft
10. Correos
11. Kempinski Summerland Hotel & Resort Beirut
12. Vantage West Credit Union
13. NetFlix
14. Agencia Tributaria
15. Square
16. Chronopost
17. PayPal
18. American Express
19. Allegro
20. LinkedIn
21. virtru
22. Citibank
23. AOL
24. Credit Agricole
25. Mercado Pago
26. Université de Pau et des Pays de l'Adour (UPPA)
27. Fifth Third Bank
28. Columbia Bank
29. Alibaba Mail
30. Microsoft OneDrive
31. Intesa Sanpaolo
32. Santander
33. America First Credit Union
34. Barclays
35. Interac
36. USPS
37. Wells Fargo
38. Yahoo
39. XFINITY
40. Berliner Sparkasse
41. OneDrive
42. Standard Bank
43. Wells Fargo
44. aruba.it
45. Bancolombia
46. Caisse d’Epargne
47. DubaiPay
48. Chase Bank
49. M&T Bank
50. Postmaster
51. Volksbanken Raiffeisenbanken
52. Facebook
53. Huntington Bank
54. Commonwealth Bank of Australia
55. Orange
56. shopify
57. Google Drive
58. WalletConnect
59. Meritrust Credit Union
60. Credit Agricole
61. Desjardins
62. Postbank
63. Dropbox
64. DocuSign
65. dpdgroup
66. L'Assurance Maladie
67. Adobe Acrobat
68. Global Sources
69. Microsoft Excel
70. SFR
71. FedEx
72. Citibank
73. Royal Credit Union
74. GoDaddy
75. ADP
76. International Card Services
77. Israeli Post
78. UNI Financial Cooperation
79. TD Bank
80. ATB Mobile
81. HSBC
82. Bank of Montreal
83. RBC Royal Bank
84. IONOS
85. AlaskaUSA Federal Credit Union
86. French Government
87. UOL SAC
88. Banco Itaú Paraguay
89. Amazon
90. Apple
91. AT&T
92. Australian Government
93. Bank of America
94. BNP Paribas
95. eBay
96. ING Group
97. Instagram
98. MetaMask
99. SingTel
100. Société Générale
Landscape Dataset
- Section: 4.3
- Description: For training the rotation CAPTCHA solver model.
- Contents: 7,268 natural and man-made landscape images (320×180).
- Format: JPEG images.
CAPTCHA Detection Dataset
- Section: 5.2.1
- Description: For training the CAPTCHA detection model.
- Contents: 19,680 webpage screenshots (1920×1080), 10,680 with annotated CAPTCHA bounding boxes, 9,000 without.
- Format: PNG images with annotations in PASCAL VOC and COCO format. All bounding boxes are labeled with the single class "CAPTCHA" (no categorization by CAPTCHA type).
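As an illustration of the two annotation formats named above, the following minimal sketch reads bounding boxes from a PASCAL VOC XML annotation and converts them to the COCO `[x, y, width, height]` layout. The sample annotation, the file name `example.png`, and the coordinates are hypothetical and are not taken from the dataset; the archive's actual file layout may differ, and the conversion ignores the one-pixel offset that some VOC tools apply.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the standard PASCAL VOC layout (not a dataset file).
SAMPLE_VOC = """<annotation>
  <filename>example.png</filename>
  <size><width>1920</width><height>1080</height><depth>3</depth></size>
  <object>
    <name>CAPTCHA</name>
    <bndbox><xmin>800</xmin><ymin>400</ymin><xmax>1120</xmax><ymax>680</ymax></bndbox>
  </object>
</annotation>"""


def read_voc_boxes(xml_text):
    """Return a (label, xmin, ymin, xmax, ymax) tuple for each annotated object."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        corners = tuple(
            int(float(bndbox.findtext(key)))
            for key in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, *corners))
    return boxes


def voc_to_coco_bbox(xmin, ymin, xmax, ymax):
    """Convert VOC corner coordinates to COCO [x, y, width, height]."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]


if __name__ == "__main__":
    for label, *corners in read_voc_boxes(SAMPLE_VOC):
        print(label, voc_to_coco_bbox(*corners))
```

Because every box in this dataset carries the same "CAPTCHA" label, a loader along these lines could drop the label and keep only the coordinates.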
CAPTCHA Recognition Dataset
- Section: 5.2.2
- Description: For training the CAPTCHA recognition model.
- Contents: 6,612 CAPTCHA images distributed across 38 classes.
- Format: PNG images with their corresponding class labels in CSV.
CAPTCHA classes:
1. baidu_slide_rotate
2. dingxiang_audio
3. dingxiang_click_area
4. dingxiang_click_difference
5. dingxiang_click_font
6. dingxiang_click_icon
7. dingxiang_click_vr
8. dingxiang_click_word
9. dingxiang_drag
10. dingxiang_slide_puzzle
11. dingxiang_slide_puzzle2
12. dingxiang_slide_rotate
13. geetest_checkbox
14. geetest_click_icon
15. geetest_click_phrase
16. geetest_click_word
17. geetest_game_playing
18. geetest_game_playing2
19. geetest_select
20. geetest_slide_puzzle
21. hcaptcha
22. hcaptcha_checkbox
23. netease_click_icon
24. netease_click_phrase
25. netease_click_vr
26. netease_click_word
27. netease_drag
28. netease_slide
29. press_and_hold
30. recaptchav2
31. recaptchav2_checkbox
32. tencent_slide
33. text_1
34. text_2
35. text_3
36. text_4
37. text_5
38. text_6
CAPTCHA Open-set Dataset
- Section: 5.2.2
- Description: For testing the CAPTCHA detection and recognition pipeline.
- Contents: 1,100 webpage screenshots (1920×1080), all of which have annotated CAPTCHA classes spanning 11 different categories.
- Format: PNG CAPTCHA and screenshot images with their corresponding class labels in CSV.
CAPTCHA classes:
1. arkose_select_2
2. capycaptcha_drag
3. dicecaptcha_qa
4. funcaptcha_select
5. funcaptcha_select_2
6. funcaptcha_select_3
7. funcaptcha_select_4
8. funcaptcha_select_5
9. funcaptcha_select_6
10. keycaptcha_drag
11. mtcaptcha_text
Ablation Dataset
- Section: 5.4
- Description: For training the CAPTCHA recognition model.
- Contents: 722 webpage screenshots (1920×1080), 622 with CAPTCHAs spanning 38 classes, 100 without.
- Format: PNG images with their corresponding bounding box and class labels in CSV. Class IDs 0-37 map directly to the class names in the CAPTCHA Recognition Dataset. Class ID 38 marks samples without CAPTCHAs.
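The class ID convention described above can be sketched as a small lookup. This sketch assumes that class ID `k` corresponds to the `(k+1)`-th entry of the CAPTCHA Recognition Dataset class list in the order given in this record; that ordering is an assumption and should be checked against the CSV files. The names `RECOGNITION_CLASSES`, `NO_CAPTCHA_ID`, and `class_name` are hypothetical helpers, not part of the dataset.

```python
# Class names in the order listed for the CAPTCHA Recognition Dataset.
# ASSUMPTION: class ID k corresponds to RECOGNITION_CLASSES[k] (zero-based).
RECOGNITION_CLASSES = [
    "baidu_slide_rotate", "dingxiang_audio", "dingxiang_click_area",
    "dingxiang_click_difference", "dingxiang_click_font",
    "dingxiang_click_icon", "dingxiang_click_vr", "dingxiang_click_word",
    "dingxiang_drag", "dingxiang_slide_puzzle", "dingxiang_slide_puzzle2",
    "dingxiang_slide_rotate", "geetest_checkbox", "geetest_click_icon",
    "geetest_click_phrase", "geetest_click_word", "geetest_game_playing",
    "geetest_game_playing2", "geetest_select", "geetest_slide_puzzle",
    "hcaptcha", "hcaptcha_checkbox", "netease_click_icon",
    "netease_click_phrase", "netease_click_vr", "netease_click_word",
    "netease_drag", "netease_slide", "press_and_hold", "recaptchav2",
    "recaptchav2_checkbox", "tencent_slide", "text_1", "text_2", "text_3",
    "text_4", "text_5", "text_6",
]

# Class ID reserved for screenshots that contain no CAPTCHA.
NO_CAPTCHA_ID = 38


def class_name(class_id):
    """Map an ablation-dataset class ID to a class name, or None for 'no CAPTCHA'."""
    if class_id == NO_CAPTCHA_ID:
        return None
    if 0 <= class_id < len(RECOGNITION_CLASSES):
        return RECOGNITION_CLASSES[class_id]
    raise ValueError(f"unknown class ID: {class_id}")


if __name__ == "__main__":
    for cid in (0, 37, 38):
        print(cid, class_name(cid))
```

Returning `None` for ID 38 keeps negative samples distinct from the 38 named classes, so downstream code has to handle them explicitly.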
Files
ablation-dataset.zip
Total size: 5.7 GB
| Checksum | Size |
|---|---|
| md5:dea75d2b452736511e044e034df49967 | 342.7 MB |
| md5:b40a4b9c2774a03ee55f0a0d6926a611 | 4.1 GB |
| md5:9d73960a6f113b49edb5780d1f19fc3d | 595.1 MB |
| md5:c4c6042f01a1da2c41a77420d1976bdc | 322.7 MB |
| md5:2f17ea845a079e9a2a3164930601506e | 79.3 MB |
| md5:8395ddea3bf84469fedf785941660e3a | 172.6 MB |
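The MD5 checksums listed above can be used to confirm that a downloaded archive is intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of` and `matches` are hypothetical, and because this listing does not show which checksum belongs to which file, the path and expected value must be supplied by the user.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected):
    """Compare a file's MD5 with an expected value such as 'md5:<hex>'."""
    expected_hex = expected.removeprefix("md5:").strip().lower()
    return md5_of(path) == expected_hex


if __name__ == "__main__":
    import sys

    # Usage: python verify.py <downloaded-file> <md5:checksum-from-the-record>
    file_path, checksum = sys.argv[1], sys.argv[2]
    print("OK" if matches(file_path, checksum) else "MISMATCH")
```

Reading in chunks keeps memory use flat even for the multi-gigabyte archive. Note that `str.removeprefix` requires Python 3.9 or later.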
Additional details
Software
- Programming language: Python, JavaScript, PHP
- Development status: Active