The Turkish Sieve Methodology: Deterministic Computation of Twin and Cousin Prime Pairs Using an N/6 Bit Data Structure
Description
Overview:
This study introduces the Turkish Sieve Methodology, a novel approach for prime number computation designed to overcome the memory intensity and modular arithmetic constraints inherent in traditional sieve algorithms. While many sieve implementations are based on N/3-like candidate representations, this method achieves an N/6 bit data structure specifically optimized for identifying twin (p, p+2) and cousin (p, p+4) prime pairs.
An often-overlooked contribution of the Turkish Sieve is its role as a reproducible research instrument rather than a codebase requiring algorithmic expertise. The implementation is distributed as a self-contained executable with deterministic configuration, standardized output, and structured performance reports. This enables researchers without programming or GPU expertise to conduct large-scale prime distribution experiments, generate verifiable datasets, and directly integrate results into independent studies.
To our knowledge, existing prime sieve implementations typically focus either on computational performance or on library-based usage. The proposed system integrates deterministic execution, structured reporting, and accessibility features within a single reproducible framework.
Key Innovations:
- Memory Efficiency: By reducing the candidate pair sequences to an N/6 bit structure, the methodology halves the memory footprint relative to N/3-bit sieve representations, enabling the processing of ranges that would otherwise exceed available memory.
- Computational Optimization: The entire elimination process is reduced to integer addition (n ← n + p). By replacing expensive modular arithmetic (MOD/DIV) with a deterministic fixed-stride progression and bitwise operations, the algorithm maps well onto high-performance CPU and GPU (CUDA) cores.
- Hardware Awareness: The methodology leverages the parallel processing capabilities of modern GPU architectures, allowing for rapid execution and high-throughput candidate screening.
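The two ideas above (one flag per candidate pair, elimination by repeated addition) can be illustrated with a minimal Python sketch. It is not the TSE implementation. It assumes that pair index k stands for the twin pair (6k-1, 6k+1) or the cousin pair (6k+1, 6k+5), which is one natural way to reach N/6 flags. The names `small_primes` and `count_pairs` are hypothetical. For readability the sketch stores one byte per flag rather than one bit, and it uses a single modular inverse per prime during setup (`pow(6, -1, p)`, Python 3.8+); only the inner loop is restricted to `k += p`.

```python
from math import isqrt

def small_primes(n):
    """Plain sieve of Eratosthenes; supplies the sieving primes up to n."""
    if n < 2:
        return []
    flags = bytearray([1]) * (n + 1)
    flags[0] = flags[1] = 0
    for i in range(2, isqrt(n) + 1):
        if flags[i]:
            for j in range(i * i, n + 1, i):
                flags[j] = 0
    return [i for i in range(2, n + 1) if flags[i]]

def count_pairs(limit, offsets):
    """Count prime pairs (6k+lo, 6k+hi) with both members <= limit.

    offsets = (-1, 1) gives twin pairs, offsets = (1, 5) gives cousin pairs.
    Assumption of this sketch: one flag per index k, i.e. about limit/6 flags.
    """
    lo, hi = offsets
    kmax = max(0, (limit - hi) // 6)      # largest k with 6k+hi <= limit
    alive = bytearray([1]) * (kmax + 1)   # one byte per pair (a bit in a packed version)
    alive[0] = 0                          # k = 0 does not describe a valid pair
    for p in small_primes(isqrt(limit)):
        if p < 5:
            continue                      # candidates are already coprime to 2 and 3
        inv6 = pow(6, -1, p)              # setup only: one modular inverse per prime
        for d in offsets:
            k = (-d * inv6) % p           # smallest k >= 0 with p dividing 6k+d
            if 6 * k + d == p:
                k += p                    # keep the prime p itself
            while k <= kmax:              # inner loop: additions only (k <- k + p)
                alive[k] = 0
                k += p
    count = sum(alive)
    if limit >= 3 + (hi - lo):
        count += 1                        # the pair starting at 3: (3, 5) or (3, 7)
    return count
```

For example, `count_pairs(1000, (-1, 1))` returns 35, the number of twin prime pairs below 1000. A production version would pack the flags into machine words and split the index range into segments.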
Implementation:
The formal methodology described in this paper is implemented in the Turkish Sieve Engine (TSE V.2.0.0). The high-performance executables (optimized for CUDA and OpenMP) are available at:
👉 https://github.com/bilgisofttr/turkishsieve
Turkish Sieve Engine (TSE) v2.0.2 is now available.
This major update extends the capabilities of TSE beyond twin prime and cousin prime counting. In addition to deterministic enumeration of twin and cousin primes, versions from 2.0.0 onward can also generate standard prime statistics and prime-counting data.
- GPU-based automatic segmented execution with user-defined range start and offset
- Fully automated per-segment logging/export system
- Optimized CUDA and OpenMP execution
- Improved report formatting and pipeline consistency
The system is designed for large-scale numerical experimentation and performance research.
Conclusion:
The Turkish Sieve offers a significant advancement in computational number theory, providing a scalable and deterministic tool for researchers exploring prime distributions, twin prime conjectures, and post-quantum cryptographic foundations.
This manuscript is a preliminary preprint version. The current version is under revision for submission to a peer-reviewed journal.
Keywords:
Number Theory, Twin Primes, Cousin Primes, Turkish Sieve, GPU Computing, CUDA, Bit Sieve, High-Performance Computing (HPC), Prime Gap.
Project Evolution
While this methodology (V.1.0.0) primarily focuses on the deterministic computation of twin and cousin prime pairs, the Turkish Sieve Engine (TSE) project continues to expand its computational horizons. Extensive verification tables up to the 10^15 scale have been published in the official repository, confirming the high-fidelity distribution of these pairs.
Furthermore, development of Version 2.0.0 is underway, which will extend the N/6 bit architectural efficiency to General Prime detection and enumeration. This upcoming update aims to provide an all-in-one prime discovery suite, further optimized for multi-GPU environments and standalone high-performance computing (HPC) tasks.
Performance Benchmarks & Scalability Report
The Turkish Sieve (TS) methodology has been stress-tested across vast ranges and various hardware architectures. The following results demonstrate the deterministic performance and memory efficiency of the N/6 indexing paradigm, featuring the RTX 5090 and Ryzen 9 9950X3D as the new benchmarks:
Table 1: Twin Primes - Full Benchmark Results
| Range | Twin Count | RTX 3070 (CUDA) | RTX 5090 (CUDA) | Ryzen 9 9950X3D (OpenMP) | Speedup (5090 vs 3070) |
|---|---|---|---|---|---|
| $10^8$ | 440,312 | 0.027 s | 0.070 s | 0.015 s | Latency Floor |
| $10^9$ | 3,424,506 | 0.060 s | 0.071 s | 0.015 s | Latency Floor |
| $10^{10}$ | 27,412,679 | 0.266 s | 0.080 s | 0.167 s | 3.3× |
| $10^{11}$ | 224,376,048 | 0.565 s | 0.177 s | 2.053 s | 3.2× |
| $10^{12}$ | 1,870,585,220 | 5.510 s | 0.927 s | 21.500 s | 5.9× |
| $10^{13}$ | 15,834,664,872 | 96.962 s | 14.896 s | 198.400 s | 6.5× |
| $10^{14}$ | 135,780,321,665 | 2,264.706 s | 359.341 s | 2,150.000 s | 6.3× |
Performance Note: At 10^12, the RTX 5090 GPU completed the twin-prime run (1,870,585,220 pairs found) in 0.927 seconds. Dividing the range by the elapsed time gives a throughput of about 1,078.7 billion candidates per second (roughly 1.08 T-items/s).
Table 2: Cousin Primes - Full Benchmark Results
| Range | Cousin Count | RTX 3070 (CUDA) | RTX 5090 (CUDA) | Ryzen 9 9950X3D (OpenMP) | Speedup (5090 vs 3070) |
|---|---|---|---|---|---|
| $10^8$ | 440,258 | 0.018 s | 0.070 s | 0.015 s | Latency Floor |
| $10^9$ | 3,424,680 | 0.060 s | 0.071 s | 0.016 s | Latency Floor |
| $10^{10}$ | 27,409,999 | 0.292 s | 0.082 s | 0.172 s | 3.5× |
| $10^{11}$ | 224,373,161 | 0.593 s | 0.156 s | 2.110 s | 3.8× |
| $10^{12}$ | 1,870,585,459 | 5.590 s | 0.880 s | 22.200 s | 6.3× |
| $10^{13}$ | 15,834,656,003 | 97.277 s | 14.822 s | 201.500 s | 6.5× |
| $10^{14}$ | 135,779,962,760 | 2,267.851 s | 360.050 s | 2,210.000 s | 6.3× |
Performance Note: At 10^12, the RTX 5090 GPU completed the cousin-prime run in 0.880 seconds, a throughput of about 1,136.4 billion candidates per second (roughly 1.14 T-items/s), the highest figure in these benchmarks.
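The throughput and speedup figures can be rechecked from the table entries with a few lines of Python. The sketch assumes that "items" means integers in the swept range, so throughput is the range's upper bound divided by elapsed time; that reading reproduces the reported figures. The helper names are hypothetical.

```python
def throughput(range_end, seconds):
    """Integers of range covered per second (assumed meaning of 'items/s')."""
    return range_end / seconds

def speedup(baseline_seconds, new_seconds):
    """How many times faster the new run is than the baseline run."""
    return baseline_seconds / new_seconds

# RTX 5090 timings at 10^12, taken from Tables 1 and 2.
twin_5090 = throughput(10**12, 0.927)
cousin_5090 = throughput(10**12, 0.880)

print(f"twin:   {twin_5090 / 1e9:.1f} billion items/s")
print(f"cousin: {cousin_5090 / 1e9:.1f} billion items/s")
print(f"5090 vs 3070, twin primes at 10^12: {speedup(5.510, 0.927):.1f}x")
```

The same two helpers apply to any other row, for instance the Ryzen 9 9950X3D timings.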
KEY FINDINGS & MATHEMATICAL INSIGHTS
1. 10^14 Uniqueness - Remarkable Distribution Equivalence
The difference between the twin and cousin prime counts at 10^14 remains extraordinarily small even at this scale:
- Twin primes: 135,780,321,665
- Cousin primes: 135,779,962,760
- Difference: 358,905 pairs (only 0.0003%)
The two counts thus agree to within about 2.6 parts per million across the first 100 trillion integers.
2. Hardy-Littlewood Conjecture Verification
The TSE v2.0.0 results provide large-scale empirical support for the Hardy-Littlewood prime k-tuple conjecture, which predicts the same asymptotic density for twin and cousin prime pairs.
- Ratio of twin to cousin counts at $10^{14}$: 1.0000 (the counts differ by about 0.0003%)
At the $10^{14}$ scale the two counts are nearly identical, in line with that prediction.
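For reference, the prediction being tested can be written out explicitly. The notation $\pi_2(N)$ and $\pi_4(N)$ for the twin and cousin pair counts up to $N$ is introduced here for illustration and is not taken from the manuscript. The conjectured constant is the same for gaps 2 and 4, which is why the ratio of the two counts is expected to tend to 1.

```latex
\[
\pi_2(N) \;\sim\; \pi_4(N) \;\sim\; 2\,C_2 \int_{2}^{N} \frac{dt}{(\ln t)^{2}},
\qquad
C_2 = \prod_{p \ge 3} \left( 1 - \frac{1}{(p-1)^{2}} \right) \approx 0.66016
\]
\[
\frac{\pi_2(10^{14})}{\pi_4(10^{14})}
= \frac{135{,}780{,}321{,}665}{135{,}779{,}962{,}760}
\approx 1.0000026
\]
```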
3. The Tera-Scale Era: GPU Peak Performance
The "sweet spot" for the N/6 bit sieve has been redefined by the RTX 5090:
- Peak Throughput: 1.136 Trillion candidates/second (at $10^{12}$)
- Generational Leap: 6.5× faster than RTX 3070 at large scales ($10^{13}$).
To our knowledge, this is the first documented case of a sieve algorithm crossing the 1 T-items/s threshold on consumer-grade hardware.
4. 9950X3D and Memory Cache Efficiency
The AMD Ryzen 9 9950X3D (32 threads) showed that CPU sieving can still be highly competitive:
- CPU Throughput: 66.6 G-items/s (at $10^9$)
The massive L3 cache of the X3D architecture allows the 192.5 KB aligned segments to stay entirely within the processor, eliminating RAM latency bottlenecks.
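The benefit of cache-sized segments can be sketched in Python: the pair-index range is processed in fixed-length blocks, so only one small flag buffer is live at a time. This is an illustration under stated assumptions, not the TSE code. It handles twin pairs only, index k is assumed to stand for the pair (6k-1, 6k+1), the default `seg_len` is an arbitrary placeholder rather than the 192.5 KB figure above, and both function names are hypothetical.

```python
from math import isqrt

def sieving_primes(n):
    """Primes p with 5 <= p <= n, by trial division (adequate for a sketch)."""
    return [c for c in range(5, n + 1, 2)
            if all(c % q for q in range(3, isqrt(c) + 1, 2))]

def count_twins_segmented(limit, seg_len=1 << 16):
    """Count twin pairs (p, p+2) with p+2 <= limit, one segment at a time."""
    kmax = max(0, (limit - 1) // 6)          # largest k with 6k+1 <= limit
    residues = []
    for p in sieving_primes(isqrt(limit)):
        inv6 = pow(6, -1, p)                 # setup only, once per prime
        # p divides 6k-1 when k = inv6 (mod p), and 6k+1 when k = -inv6 (mod p)
        residues.append((p, inv6, -1))
        residues.append((p, (-inv6) % p, 1))
    total = 1 if limit >= 5 else 0           # the pair (3, 5)
    for k_lo in range(1, kmax + 1, seg_len):
        k_hi = min(k_lo + seg_len, kmax + 1)
        seg = bytearray([1]) * (k_hi - k_lo)  # small buffer reused per segment
        for p, r, d in residues:
            k = k_lo + (r - k_lo) % p        # first k >= k_lo in the residue class
            if 6 * k + d == p:
                k += p                       # keep the prime p itself
            while k < k_hi:                  # inner loop: additions only
                seg[k - k_lo] = 0
                k += p
        total += sum(seg)
    return total
```

Because each segment depends only on the list of sieving primes, segments are independent of one another, which is what makes this layout a natural fit for OpenMP threads or GPU blocks.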
5. Hardware Domination - RTX 5090 vs RTX 3070
Architecture and VRAM capacity are the primary drivers at extreme scales:
- RTX 5090 @ 10^14: 359.341 seconds
- RTX 3070 @ 10^14: 2,264.706 seconds
- Improvement: The 5090 reduces the processing time from about 38 minutes to about 6 minutes for 100 trillion numbers.
Files
TwinAndCousinLast.pdf (495.1 kB)
md5:86f640baa9583b0ee331c5a02024ab35