The Turkish Sieve Methodology: Deterministic Computation of Twin and Cousin Prime Pairs Using an N/6 Bit Data Structure
Overview:
This study introduces the Turkish Sieve Methodology, a novel approach for prime number computation designed to overcome the memory intensity and modular arithmetic constraints inherent in traditional sieve algorithms. While current literature recognizes an N/3 bit density for prime candidate representation, this method achieves an N/6 bit data structure specifically optimized for identifying twin (p, p+2) and cousin (p, p+4) prime pairs.
An often-overlooked contribution of the Turkish Sieve is its role as a reproducible research instrument rather than a codebase requiring algorithmic expertise. The implementation is distributed as a self-contained executable with deterministic configuration, standardized output, and structured performance reports. This enables researchers without programming or GPU expertise to conduct large-scale prime distribution experiments, generate verifiable datasets, and directly integrate results into independent studies. To our knowledge, no existing high-performance prime sieve provides comparable accessibility while maintaining state-of-the-art computational efficiency.
Key Innovations:
- Memory Efficiency: By reducing the candidate pair sequences to an N/6 bit structure, the methodology effectively doubles the memory efficiency compared to existing bit-sieve models, enabling the processing of massive datasets that were previously computationally prohibitive.
- Computational Optimization: The entire elimination process is transformed into an integer-addition-based operation (n <-- n + p). By replacing expensive modular arithmetic (MOD/DIV) with deterministic rhythmic progression and bitwise operations, the algorithm is tailor-made for high-performance CPU and GPU (CUDA) cores.
- Hardware Awareness: The methodology leverages the parallel processing capabilities of modern GPU architectures, allowing for rapid execution and high-throughput candidate screening.
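The N/6 indexing and the addition-only elimination described above can be illustrated with a minimal sketch. It is not the TSE implementation. It assumes that index k stands for the candidate twin pair (6k-1, 6k+1), stores one byte per pair instead of one bit for readability, and uses the hypothetical names `small_primes` and `twin_pairs_upto`.

```python
from math import isqrt


def small_primes(limit):
    """Plain sieve of Eratosthenes, used only to obtain the sieving primes."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for i in range(2, isqrt(limit) + 1):
        if flags[i]:
            flags[i * i::i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, f in enumerate(flags) if f]


def twin_pairs_upto(n):
    """Return twin pairs (6k-1, 6k+1) with 6k+1 <= n, assuming n >= 7.

    The pair (3, 5) does not have this form and is not returned.
    """
    K = (n - 1) // 6                      # one slot per 6 integers
    alive = bytearray([1]) * (K + 1)      # index 0 is unused
    alive[0] = 0
    for p in small_primes(isqrt(6 * K + 1)):
        if p < 5:
            continue                      # 2 and 3 never divide 6k-1 or 6k+1
        m = (p + 1) // 6                  # p = 6m-1 or p = 6m+1
        # p divides 6k-1 or 6k+1 exactly when k is congruent to +m or -m mod p.
        # k = m itself is the pair containing p, so stepping starts one period later.
        for start in (p - m, p + m):
            k = start
            while k <= K:
                alive[k] = 0
                k += p                    # addition-only stepping, no MOD/DIV
    return [(6 * k - 1, 6 * k + 1) for k in range(1, K + 1) if alive[k]]


if __name__ == "__main__":
    print(twin_pairs_upto(100)[:4])
    print(len(twin_pairs_upto(1000)) + 1)   # +1 accounts for the pair (3, 5)
```

A production sieve would pack eight pairs per byte and process the index range in segments; the sketch only shows why one slot per six integers suffices and why the inner loop needs nothing beyond addition.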
Implementation:
The formal methodology described in this paper is implemented in the Turkish Sieve Engine (TSE V.1.0.). The high-performance executables (optimized for CUDA and OMP) are available at:
👉 https://github.com/bilgisofttr/turkishsieve
Conclusion:
The Turkish Sieve offers a significant advancement in computational number theory, providing a scalable and deterministic tool for researchers exploring prime distributions, twin prime conjectures, and post-quantum cryptographic foundations.
This manuscript is a preliminary preprint version. The current version is under revision for submission to a peer-reviewed journal.
Keywords:
Number Theory, Twin Primes, Cousin Primes, Turkish Sieve, GPU Computing, CUDA, Bit Sieve, High-Performance Computing (HPC), Prime Gap.
Performance Benchmarks & Scalability Report
The Turkish Sieve (TS) methodology has been stress-tested across vast ranges and various hardware architectures. The following results demonstrate the deterministic performance and memory efficiency of the N/6 indexing paradigm, featuring the RTX 5090 and Ryzen 9 9950X3D as the new benchmarks:
Table 1: Twin Primes - Full Benchmark Results
| Range (upper bound) | Twin Count | RTX 3070 CUDA | RTX 5090 CUDA | 9950X3D OMP | Speedup (5090 vs 3070) |
|---|---|---|---|---|---|
| $10^8$ | 440,312 | 0.027 s | 0.070 s | 0.015 s | Latency Floor |
| $10^9$ | 3,424,506 | 0.060 s | 0.071 s | 0.015 s | Latency Floor |
| $10^{10}$ | 27,412,679 | 0.266 s | 0.080 s | 0.167 s | 3.3× |
| $10^{11}$ | 224,376,048 | 0.565 s | 0.177 s | 2.053 s | 3.2× |
| $10^{12}$ | 1,870,585,220 | 5.510 s | 0.927 s | 21.500 s | 5.9× |
| $10^{13}$ | 15,834,664,872 | 96.962 s | 14.896 s | 198.400 s | 6.5× |
| $10^{14}$ | 135,780,321,665 | 2,264.706 s | 359.341 s | 2,150.000 s | 6.3× |
Performance Note: At 10^12, the RTX 5090 GPU found 1,870,585,220 twin prime pairs in 0.927 seconds. Measured as range size divided by elapsed time, this is a throughput of about 1,078.7 billion items per second (roughly 1.08 T-items/s).
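The throughput and speedup figures can be reproduced from the table with simple arithmetic. The sketch below assumes that "items" means integers in the scanned range (range size divided by elapsed time), which is the reading that matches the reported number; the helper names are hypothetical.

```python
def throughput(range_size, seconds):
    """Integers of range covered per second."""
    return range_size / seconds


def speedup(slow_seconds, fast_seconds):
    """How many times faster the second timing is than the first."""
    return slow_seconds / fast_seconds


# Values copied from the 10^12 row of Table 1.
N = 10**12
t_3070, t_5090 = 5.510, 0.927

print(f"{throughput(N, t_5090) / 1e9:.1f} billion items/s")  # 1078.7 billion items/s
print(f"{speedup(t_3070, t_5090):.1f}x")                      # 5.9x
```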
Table 2: Cousin Primes - Full Benchmark Results
| Range (upper bound) | Cousin Count | RTX 3070 CUDA | RTX 5090 CUDA | 9950X3D OMP | Speedup (5090 vs 3070) |
|---|---|---|---|---|---|
| $10^8$ | 440,258 | 0.018 s | 0.070 s | 0.015 s | Latency Floor |
| $10^9$ | 3,424,680 | 0.060 s | 0.071 s | 0.016 s | Latency Floor |
| $10^{10}$ | 27,409,999 | 0.292 s | 0.082 s | 0.172 s | 3.5× |
| $10^{11}$ | 224,373,161 | 0.593 s | 0.156 s | 2.110 s | 3.8× |
| $10^{12}$ | 1,870,585,459 | 5.590 s | 0.880 s | 22.200 s | 6.3× |
| $10^{13}$ | 15,834,656,003 | 97.277 s | 14.822 s | 201.500 s | 6.5× |
| $10^{14}$ | 135,779,962,760 | 2,267.851 s | 360.050 s | 2,210.000 s | 6.3× |
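Performance Note: At 10^12, the RTX 5090 GPU completed the cousin prime search in 0.880 seconds. Measured as range size divided by elapsed time, this is the highest throughput in these benchmarks: about 1,136 billion items per second (roughly 1.14 T-items/s).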
KEY FINDINGS & MATHEMATICAL INSIGHTS
1. 10^14 Uniqueness - Remarkable Distribution Equivalence
The difference between the twin and cousin prime counts at 10^14 is extremely small relative to the counts themselves:
- Twin primes: 135,780,321,665
- Cousin primes: 135,779,962,760
- Difference: 358,905 pairs (about 0.0003% of either count)
This is consistent with near-identical twin and cousin prime distributions across the first 100 trillion integers.
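The difference and its relative size follow directly from the two counts in Tables 1 and 2. A short check, with variable names chosen here for illustration:

```python
# Counts at 10^14, copied from Tables 1 and 2.
twin_count = 135_780_321_665
cousin_count = 135_779_962_760

difference = twin_count - cousin_count
relative = difference / twin_count

print(difference)          # 358905
print(f"{relative:.4%}")   # 0.0003%
```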
2. Hardy-Littlewood Conjecture Verification
The TSE v2.0.0 results provide large-scale empirical evidence consistent with the Hardy-Littlewood prime k-tuple conjecture, which predicts the same asymptotic density for twin and cousin primes.
- Twin-to-cousin count ratio at $10^{14}$: 1.0000 (the counts agree to within 0.0003%)
Our results at $10^{14}$ scale show that twin and cousin primes have nearly identical counts, in line with this prediction. They are numerical support for the conjecture, not a proof of it.
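For reference, the prediction being tested can be written out explicitly. The notation $\pi_2(x)$ and $\pi_4(x)$ for the number of twin and cousin prime pairs up to $x$ is introduced here for illustration and does not appear in the original description. The Hardy-Littlewood conjecture assigns both pair types the same constant:

$$
\pi_2(x) \sim 2 C_2 \int_2^x \frac{dt}{(\ln t)^2}, \qquad
\pi_4(x) \sim 2 C_2 \int_2^x \frac{dt}{(\ln t)^2}, \qquad
C_2 = \prod_{p \ge 3} \left( 1 - \frac{1}{(p-1)^2} \right) \approx 0.6601618
$$

so the conjecture implies $\pi_2(x) / \pi_4(x) \to 1$ as $x \to \infty$. With the counts reported above,

$$
\frac{\pi_2(10^{14})}{\pi_4(10^{14})} = \frac{135{,}780{,}321{,}665}{135{,}779{,}962{,}760} \approx 1.0000026 .
$$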
3. The Tera-Scale Era: GPU Peak Performance
The "sweet spot" for the N/6 bit sieve has been redefined by the RTX 5090:
- Peak Throughput: 1.136 Trillion candidates/second (at $10^{12}$)
- Generational Leap: 6.5× faster than RTX 3070 at large scales ($10^{13}$).
To our knowledge, this is the first documented case of a sieve algorithm crossing the 1 T-items/s threshold on consumer-grade hardware.
4. 9950X3D and Memory Cache Efficiency
The AMD Ryzen 9 9950X3D (32 threads) showed that CPU sieving can still be highly competitive:
- CPU Throughput: 66.6 G-items/s (at $10^9$)
The large L3 cache of the X3D architecture allows the 192.5 KB aligned segments to stay entirely in on-chip cache, avoiding RAM latency bottlenecks.
5. Hardware Domination - RTX 5090 vs RTX 3070
Architecture and VRAM capacity are the primary drivers at extreme scales:
- RTX 5090 @ 10^14: 359.341 seconds
- RTX 3070 @ 10^14: 2,264.706 seconds
- Improvement: The 5090 reduces the processing time from about 38 minutes to about 6 minutes for 100 trillion numbers.
Files
TwinAndCousinLast.pdf (495.1 kB)
md5:86f640baa9583b0ee331c5a02024ab35