The Turkish Sieve Methodology: Deterministic Computation of Twin and Cousin Prime Pairs Using an N/6 Bit Data Structure
Overview:
This study introduces the Turkish Sieve Methodology, a novel approach to prime number computation designed to overcome the memory footprint and modular-arithmetic costs inherent in traditional sieve algorithms. While the current literature recognizes an N/3 bit density for representing prime candidates, this method achieves an N/6 bit data structure specifically optimized for identifying twin (p, p+2) and cousin (p, p+4) prime pairs.
An often-overlooked contribution of the Turkish Sieve is its role as a reproducible research instrument rather than a codebase requiring algorithmic expertise. The implementation is distributed as a self-contained executable with deterministic configuration, standardized output, and structured performance reports. This enables researchers without programming or GPU expertise to conduct large-scale prime distribution experiments, generate verifiable datasets, and directly integrate results into independent studies. To our knowledge, no existing high-performance prime sieve provides comparable accessibility while maintaining state-of-the-art computational efficiency.
Key Innovations:
- Memory Efficiency: By reducing the candidate-pair sequences to an N/6 bit structure, the methodology doubles the memory efficiency of existing bit-sieve models, enabling the processing of ranges that were previously computationally prohibitive.
- Computational Optimization: The entire elimination process is reduced to an integer-addition-based operation (n ← n + p). By replacing expensive modular arithmetic (MOD/DIV) with deterministic additive progressions and bitwise operations, the algorithm is well suited to high-performance CPU and GPU (CUDA) cores.
- Hardware Awareness: The methodology leverages the parallel processing capabilities of modern GPU architectures for rapid execution and high-throughput candidate screening.
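The two ideas above can be combined in a short sketch. This is a minimal CPU illustration under our own assumptions, not the TSE implementation: it stores one flag per index k (a real engine would bit-pack these, one bit per pair), where k encodes the candidate twin pair (6k−1, 6k+1), and the inner elimination loop uses only addition, in the spirit of n ← n + p:

```python
def twin_pairs(N):
    """Sketch of an N/6-style pair sieve: index k represents the candidate
    twin pair (6k - 1, 6k + 1), so the whole range [1, N] needs only N/6
    flags (one byte each here; a bit-packed sieve would use N/6 bits)."""
    kmax = N // 6
    alive = bytearray([1]) * (kmax + 1)     # index 0 unused
    d = 5
    while d * d <= N:
        # d is coprime to 6, so 6 is invertible mod d; solve 6k + r ≡ 0 (mod d).
        # Composite d values are redundant (their prime factors already struck
        # the same indices) but harmless in this illustration.
        inv6 = pow(6, -1, d)
        for r in (-1, 1):
            k = (-r * inv6) % d
            if k == 0:
                k = d
            if 6 * k + r == d:              # never strike out d itself
                k += d
            # Elimination by pure addition: k <- k + d, no MOD/DIV in the loop.
            while k <= kmax:
                alive[k] = 0
                k += d
        d += 2 if d % 6 == 5 else 4         # mod-6 wheel: 5, 7, 11, 13, 17, ...
    return [(6 * k - 1, 6 * k + 1) for k in range(1, kmax + 1) if alive[k]]
```

Every twin pair except (3, 5) has the form (6k−1, 6k+1), so `twin_pairs(100)` yields the seven pairs from (5, 7) through (71, 73); cousin pairs can be handled analogously via the form (6k+1, 6k+5).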
Implementation:
The formal methodology described in this paper is implemented in the Turkish Sieve Engine (TSE v1.0.0). The high-performance executable (optimized for CUDA and OpenMP) is available at:
👉 https://github.com/bilgisofttr/turkishsieve
Conclusion:
The Turkish Sieve offers a significant advancement in computational number theory, providing a scalable and deterministic tool for researchers exploring prime distributions, twin prime conjectures, and post-quantum cryptographic foundations.
This manuscript is a preliminary preprint version. The current version is under revision for submission to a peer-reviewed journal.
Keywords:
Number Theory, Twin Primes, Cousin Primes, Turkish Sieve, GPU Computing, CUDA, Bit Sieve, High-Performance Computing (HPC), Prime Gap.
🔍 Historical Accuracy and Error Correction: Revisiting Nicely’s Twin Prime Tables
The Turkish Sieve Engine (TSE) is a high-performance computational tool for prime number analysis. Beyond performance, TSE serves as a verification engine for historical number-theoretic datasets. In this study, we revisited the twin prime tables compiled by Dr. Thomas Nicely, which are historically notable for their role in uncovering the 1994 Intel Pentium FDIV bug.
TSE was cross-validated against the widely adopted primesieve library (Kim Walisch), confirming 100% agreement across all tested ranges. When compared to Nicely’s cumulative twin prime counts (hosted at Lynchburg College), several ranges exhibited systematic discrepancies, beginning as a consistent “+1” offset and growing to “+4” at the largest ranges tested:
| Range (0 to x) | Nicely’s Count | Verified Count (TSE & Primesieve) | Discrepancy |
|---|---|---|---|
| 30 | 5 | 4 | +1 |
| 600 | 27 | 26 | +1 |
| 30,000,000 | 152,892 | 152,891 | +1 |
| 100,000,000 | 440,313 | 440,312 | +1 |
| 465,000,000,000,000 | 573,363,952,380 | 573,363,952,379 | +1 |
| 525,000,000,000,000 | 642,563,148,734 | 642,563,148,732 | +2 |
| 723,000,000,000,000 | 867,878,285,690 | 867,878,285,687 | +3 |
| 765,000,000,000,000 | 915,170,302,652 | 915,170,302,648 | +4 |
These discrepancies appear to originate from legacy segment-boundary handling and precision limitations in early 1990s C implementations.
By leveraging a modern N/6 bit-masking methodology, TSE recalculates twin prime counts deterministically and reproducibly. Counts have been verified up to $10^{15}$, achieving bit-perfect agreement across CPU (OpenMP) and GPU (CUDA) architectures. The corrected counts address minor but systematic errors in Nicely’s original tables, providing the community with a rigorously verified reference dataset for computational number theory, numerical validation, and historical dataset auditing.
Because the tabulated counts are cumulative, a single error propagates: once the “+1” offset appears near ~4.65e14 (465 trillion), every later total inherits it, and each additional boundary error stacks on top, which is why the discrepancy grows from +1 to +4 across the subsequent ranges.
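This propagation behaviour is easy to reproduce. The sketch below uses made-up per-segment counts (not Nicely’s actual data) to show how a single segment-boundary double-count shifts every subsequent cumulative total:

```python
from itertools import accumulate

# Hypothetical per-segment twin-pair counts (illustrative only):
true_segments = [5, 3, 7, 4, 6]

# A segment-boundary bug that double-counts one pair in segment 2:
buggy_segments = [5, 3 + 1, 7, 4, 6]

true_cumulative = list(accumulate(true_segments))
buggy_cumulative = list(accumulate(buggy_segments))

# Every cumulative total from the faulty segment onward inherits the +1;
# a second independent error would stack on top (+2, +3, ...).
offsets = [b - t for b, t in zip(buggy_cumulative, true_cumulative)]
print(offsets)   # [0, 1, 1, 1, 1]
```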
Project Evolution
While this methodology (V.1.0.0) primarily focuses on the deterministic computation of twin and cousin prime pairs, the Turkish Sieve Engine (TSE) project continues to expand its computational horizons. Extensive verification tables up to the 10^15 scale have been published in the official repository, confirming the high-fidelity distribution of these pairs.
Furthermore, development of Version 2.0.0 is underway, which will extend the N/6 bit architectural efficiency to General Prime detection and enumeration. This upcoming update aims to provide an all-in-one prime discovery suite, further optimized for multi-GPU environments and standalone high-performance computing (HPC) tasks.
Performance Benchmarks & Scalability Report
The Turkish Sieve (TS) methodology has been stress-tested across vast ranges and various hardware architectures. The following results demonstrate the deterministic performance and memory efficiency of the N/6 indexing paradigm, featuring the RTX 5090 and Ryzen 9 9950X3D as the new benchmarks:
Table 1: Twin Primes - Full Benchmark Results
| Range | Twin Count | 3070 CUDA | 5090 CUDA | 9950X3D OMP | Speedup (5090 vs 3070) |
|---|---|---|---|---|---|
| $10^8$ | 440,312 | 0.027 s | 0.070 s | 0.015 s | Latency Floor |
| $10^9$ | 3,424,506 | 0.060 s | 0.071 s | 0.015 s | Latency Floor |
| $10^{10}$ | 27,412,679 | 0.266 s | 0.080 s | 0.167 s | 3.3× |
| $10^{11}$ | 224,376,048 | 0.565 s | 0.177 s | 2.053 s | 3.2× |
| $10^{12}$ | 1,870,585,220 | 5.510 s | 0.927 s | 21.500 s | 5.9× |
| $10^{13}$ | 15,834,664,872 | 96.962 s | 14.896 s | 198.400 s | 6.5× |
| $10^{14}$ | 135,780,321,665 | 2,264.706 s | 359.341 s | 2,150.000 s | 6.3× |
Performance Note: At $10^{12}$, the RTX 5090 sieved the full range in 0.927 seconds, identifying 1,870,585,220 twin pairs at a throughput of roughly 1,078.7 billion candidates per second (≈1.08 T-items/s).
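The throughput figure follows directly from the table entries: $10^{12}$ candidates screened in 0.927 seconds:

```python
# Throughput for the RTX 5090 run at 10^12 (values from Table 1):
range_size = 10**12      # candidates screened
elapsed = 0.927          # seconds

throughput = range_size / elapsed
print(f"{throughput / 1e9:,.1f} G-items/s")   # 1,078.7 G-items/s
```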
Table 2: Cousin Primes - Full Benchmark Results
| Range | Cousin Count | 3070 CUDA | 5090 CUDA | 9950X3D OMP | Speedup (5090 vs 3070) |
|---|---|---|---|---|---|
| $10^8$ | 440,258 | 0.018 s | 0.070 s | 0.015 s | Latency Floor |
| $10^9$ | 3,424,680 | 0.060 s | 0.071 s | 0.016 s | Latency Floor |
| $10^{10}$ | 27,409,999 | 0.292 s | 0.082 s | 0.172 s | 3.5× |
| $10^{11}$ | 224,373,161 | 0.593 s | 0.156 s | 2.110 s | 3.8× |
| $10^{12}$ | 1,870,585,459 | 5.590 s | 0.880 s | 22.200 s | 6.3× |
| $10^{13}$ | 15,834,656,003 | 97.277 s | 14.822 s | 201.500 s | 6.5× |
| $10^{14}$ | 135,779,962,760 | 2,267.851 s | 360.050 s | 2,210.000 s | 6.3× |
Performance Note: At $10^{12}$, the RTX 5090 achieved a record throughput of roughly 1,136.3 billion candidates per second (≈1.14 T-items/s).
KEY FINDINGS & MATHEMATICAL INSIGHTS
1. 10^14 Uniqueness - Remarkable Distribution Equivalence
The difference between twin and cousin prime counts at $10^{14}$ remains extraordinarily small even under higher-precision testing:
- Twin primes: 135,780,321,665
- Cousin primes: 135,779,962,760
- Difference: 358,905 pairs (only ~0.0003%)

This confirms the near-perfect equivalence in distribution across the first 100 trillion integers.
2. Hardy-Littlewood Conjecture Verification
The TSE results provide substantial empirical support for the Hardy-Littlewood prime k-tuple conjecture.
- Ratio convergence: 1.0000 (within 0.0003% variance)

Our results at the $10^{14}$ scale demonstrate that twin and cousin primes share nearly identical asymptotic densities, consistent with the Hardy-Littlewood predictions.
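The ratio can be checked directly from the Table 1 and Table 2 counts at $10^{14}$:

```python
twins = 135_780_321_665    # Table 1, 10^14
cousins = 135_779_962_760  # Table 2, 10^14

ratio = twins / cousins
rel_diff_pct = (twins - cousins) / twins * 100

print(round(ratio, 4))          # 1.0
print(f"{rel_diff_pct:.4f}%")   # 0.0003%
```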
3. The Tera-Scale Era: GPU Peak Performance
The "sweet spot" for the N/6 bit sieve has been redefined by the RTX 5090:
- Peak Throughput: 1.136 trillion candidates/second (at $10^{12}$)
- Generational Leap: 6.5× faster than the RTX 3070 at large scales ($10^{13}$)

To our knowledge, this is the first documented case of a sieve algorithm crossing the 1 T-items/s threshold on consumer-grade hardware.
4. 9950X3D and Memory Cache Efficiency
The AMD Ryzen 9 9950X3D (32 threads) showed that CPU sieving can still be highly competitive:
- CPU Throughput: 66.6 G-items/s (at $10^9$)

The massive L3 cache of the X3D architecture allows the 192.5 KB aligned segments to stay entirely within the processor, eliminating RAM latency bottlenecks.
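As a back-of-the-envelope check of how much range one cache-resident segment covers: the 192.5 KB figure comes from the text, while the 1 KB = 1024 bytes convention and the one-bit-per-six-integers mapping are our assumptions here, not confirmed details of the TSE layout:

```python
SEGMENT_BYTES = int(192.5 * 1024)   # 192.5 KB aligned segment (assuming 1 KB = 1024 B)
segment_bits = SEGMENT_BYTES * 8

# Under N/6-bit indexing, one bit covers six consecutive integers,
# so a single L3-resident segment spans:
numbers_per_segment = segment_bits * 6
print(f"{numbers_per_segment:,}")   # 9,461,760 integers per segment
```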
5. Hardware Domination - RTX 5090 vs RTX 3070
Architecture and VRAM capacity are the primary performance drivers at extreme scales:
- RTX 5090 @ $10^{14}$: 359.341 seconds
- RTX 3070 @ $10^{14}$: 2,264.706 seconds
- Improvement: the 5090 reduces the processing time for 100 trillion numbers from about 37 minutes to about 6 minutes.
Files:
TwinAndCousinLast.pdf (495.1 kB) · md5:86f640baa9583b0ee331c5a02024ab35