Published December 23, 2025 | Version V.1.0.0
Preprint Open

The Turkish Sieve Methodology: Deterministic Computation of Twin and Cousin Prime Pairs Using an N/6 Bit Data Structure

Description

Overview:

This study introduces the Turkish Sieve Methodology, a novel approach for prime number computation designed to overcome the memory intensity and modular arithmetic constraints inherent in traditional sieve algorithms. While current literature recognizes an N/3 bit density for prime candidate representation, this method achieves an N/6 bit data structure specifically optimized for identifying twin (p, p+2) and cousin (p, p+4) prime pairs.

A further contribution of the Turkish Sieve is its role as a reproducible research instrument rather than a codebase requiring algorithmic expertise. The implementation is distributed as a self-contained executable with deterministic configuration, standardized output, and structured performance reports. This enables researchers without programming or GPU expertise to conduct large-scale prime distribution experiments, generate verifiable datasets, and integrate the results directly into independent studies. To our knowledge, no existing high-performance prime sieve provides comparable accessibility while maintaining state-of-the-art computational efficiency.

Key Innovations:

  • Memory Efficiency: By reducing the candidate pair sequences to an N/6 bit structure, the methodology halves the memory footprint of N/3 bit sieve models, which makes ranges practical that previously did not fit in memory.

  • Computational Optimization: The entire elimination process is reduced to an integer-addition operation (n ← n + p). Because expensive modular arithmetic (MOD/DIV) is replaced by a fixed-stride progression and bitwise operations, the algorithm is well suited to high-performance CPU and GPU (CUDA) cores.

  • Hardware Awareness: The methodology leverages the parallel processing capabilities of modern GPU architectures, allowing for rapid execution and high-throughput candidate screening.
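
The first two points can be illustrated with a minimal Python sketch. It is not the TSE implementation; it relies only on the standard observation that every twin pair other than (3, 5) has the form (6k − 1, 6k + 1), so one bit per index k covers all candidates up to N in about N/6 bits. For a sieving prime p = 6m ± 1, the indices to eliminate lie in the two residue classes k ≡ m and k ≡ p − m (mod p), so the inner loop needs only the addition k ← k + p. The function names `small_primes` and `count_twin_pairs` are hypothetical.

```python
from math import isqrt


def small_primes(limit):
    """Plain sieve of Eratosthenes; returns the primes p with 5 <= p <= limit."""
    flags = bytearray([1]) * (limit + 1)
    for i in range(2, isqrt(limit) + 1):
        if flags[i]:
            flags[i * i :: i] = bytes(len(range(i * i, limit + 1, i)))
    return [p for p in range(5, limit + 1) if flags[p]]


def count_twin_pairs(n):
    """Count twin prime pairs (p, p + 2) with p + 2 <= n."""
    if n < 5:
        return 0
    k_max = (n - 1) // 6                 # largest k with 6k + 1 <= n
    dead = bytearray(k_max // 8 + 1)     # one bit per k, roughly n/6 bits
    for p in small_primes(isqrt(n)):
        m = (p + 1) // 6                 # p = 6m - 1 or p = 6m + 1
        # Start at m + p (index m holds p itself) and at p - m (holds 5p).
        for k in (m + p, p - m):
            while k <= k_max:
                dead[k >> 3] |= 1 << (k & 7)   # mark pair k as eliminated
                k += p                         # addition-only stride
    eliminated = sum(bin(byte).count("1") for byte in dead)
    return 1 + k_max - eliminated        # +1 accounts for the pair (3, 5)
```

Division appears only once per sieving prime, when m is set up; the marking loop itself uses additions and bit operations. Cousin pairs other than (3, 7) have the form (6k + 1, 6k + 5), so the same layout applies with the residue class of the second member shifted by one index.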


Implementation: 
The formal methodology described in this paper is implemented in the Turkish Sieve Engine (TSE V.1.0). The high-performance executables (optimized for CUDA and OMP) are available at:
👉 https://github.com/bilgisofttr/turkishsieve

Conclusion:

The Turkish Sieve offers a significant advancement in computational number theory, providing a scalable and deterministic tool for researchers exploring prime distributions, twin prime conjectures, and post-quantum cryptographic foundations.

This manuscript is a preliminary preprint version. The current version is under revision for submission to a peer-reviewed journal.

Keywords:

Number Theory, Twin Primes, Cousin Primes, Turkish Sieve, GPU Computing, CUDA, Bit Sieve, High-Performance Computing (HPC), Prime Gap.

Performance Benchmarks & Scalability Report

The Turkish Sieve (TS) methodology has been stress-tested over ranges from 10^8 to 10^14 on several hardware architectures. The following results report the run times and memory efficiency of the N/6 indexing scheme, with the RTX 5090 and Ryzen 9 9950X3D as the new reference hardware:

Table 1: Twin Primes - Full Benchmark Results

| Range | Twin Count | RTX 3070 (CUDA) | RTX 5090 (CUDA) | Ryzen 9 9950X3D (OMP) | Speedup (5090 vs 3070) |
|---|---|---|---|---|---|
| $10^8$ | 440,312 | 0.027 s | 0.070 s | 0.015 s | Latency Floor |
| $10^9$ | 3,424,506 | 0.060 s | 0.071 s | 0.015 s | Latency Floor |
| $10^{10}$ | 27,412,679 | 0.266 s | 0.080 s | 0.167 s | 3.3× |
| $10^{11}$ | 224,376,048 | 0.565 s | 0.177 s | 2.053 s | 3.2× |
| $10^{12}$ | 1,870,585,220 | 5.510 s | 0.927 s | 21.500 s | 5.9× |
| $10^{13}$ | 15,834,664,872 | 96.962 s | 14.896 s | 198.400 s | 6.5× |
| $10^{14}$ | 135,780,321,665 | 2,264.706 s | 359.341 s | 2,150.000 s | 6.3× |

Performance Note: At 10^12, the RTX 5090 GPU swept the full range (finding 1,870,585,220 twin pairs) in 0.927 seconds, a throughput of about 1,078.7 billion items per second (≈1.08 T-items/s), where an item is one integer of the range.

Table 2: Cousin Primes - Full Benchmark Results

| Range | Cousin Count | RTX 3070 (CUDA) | RTX 5090 (CUDA) | Ryzen 9 9950X3D (OMP) | Speedup (5090 vs 3070) |
|---|---|---|---|---|---|
| $10^8$ | 440,258 | 0.018 s | 0.070 s | 0.015 s | Latency Floor |
| $10^9$ | 3,424,680 | 0.060 s | 0.071 s | 0.016 s | Latency Floor |
| $10^{10}$ | 27,409,999 | 0.292 s | 0.082 s | 0.172 s | 3.5× |
| $10^{11}$ | 224,373,161 | 0.593 s | 0.156 s | 2.110 s | 3.8× |
| $10^{12}$ | 1,870,585,459 | 5.590 s | 0.880 s | 22.200 s | 6.3× |
| $10^{13}$ | 15,834,656,003 | 97.277 s | 14.822 s | 201.500 s | 6.5× |
| $10^{14}$ | 135,779,962,760 | 2,267.851 s | 360.050 s | 2,210.000 s | 6.3× |

Performance Note: At 10^12, the RTX 5090 GPU swept the full range in 0.880 seconds, a peak throughput of about 1,136.4 billion items per second (≈1.14 T-items/s).
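
The throughput and speedup figures follow from the table entries by simple division. A short sketch of that arithmetic, assuming that one "item" means one integer of the swept range (the helper names `throughput` and `speedup` are illustrative, not part of TSE):

```python
def throughput(range_limit, seconds):
    """Integers swept per second for a run over [1, range_limit]."""
    return range_limit / seconds


def speedup(baseline_seconds, seconds):
    """How many times faster a run is than the baseline run."""
    return baseline_seconds / seconds


# RTX 5090 timings at 10^12, taken from Tables 1 and 2.
twin_rate = throughput(10**12, 0.927)
cousin_rate = throughput(10**12, 0.880)

# RTX 3070 vs RTX 5090 at 10^14 for twin primes (Table 1).
gain_1e14 = speedup(2264.706, 359.341)

print(f"twin:   {twin_rate / 1e9:.1f} G-items/s")
print(f"cousin: {cousin_rate / 1e9:.1f} G-items/s")
print(f"5090 vs 3070 at 10^14: {gain_1e14:.1f}x")
```

The same two helpers reproduce the Speedup column from the per-device timings in either table.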

KEY FINDINGS & MATHEMATICAL INSIGHTS

1. 10^14 Uniqueness - Remarkable Distribution Equivalence

At 10^14, the twin and cousin prime pair counts differ by a very small margin:

  • Twin primes: 135,780,321,665

  • Cousin primes: 135,779,962,760

  • Difference: 358,905 pairs (about 0.0003% of the twin count)

    This is consistent with twin and cousin pairs having the same asymptotic density across the first 100 trillion integers.

2. Hardy-Littlewood Conjecture Verification

The TSE v2.0.0 results are consistent with the Hardy-Littlewood prime k-tuple conjecture; they are empirical support, not a proof.

  • Twin-to-cousin count ratio at $10^{14}$: 1.0000 (the two counts differ by about 0.0003%)

    The conjecture predicts the same leading-order density for pairs with gap 2 and gap 4, since both carry the same constant $2C_2$. The counts at $10^{14}$ agree with that prediction.
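
As a rough cross-check, the Hardy-Littlewood estimate $2C_2 \int_2^x dt/(\ln t)^2$ can be evaluated numerically and compared with the counts reported above. The sketch below is an illustration and not part of TSE: it uses the known value of the twin prime constant $C_2 \approx 0.6601618158$ and Simpson's rule after the substitution u = ln t. The function name and the step count are assumptions chosen for this example.

```python
from math import exp, log

TWIN_PRIME_CONSTANT = 0.6601618158468696  # C_2


def hardy_littlewood_estimate(x, steps=20000):
    """Approximate 2*C_2 * integral from 2 to x of dt / (ln t)^2.

    With u = ln t the integrand becomes exp(u) / u^2, which is smooth
    on [ln 2, ln x]; composite Simpson's rule needs an even step count.
    """
    a, b = log(2.0), log(x)
    h = (b - a) / steps
    total = exp(a) / (a * a) + exp(b) / (b * b)
    for i in range(1, steps):
        u = a + i * h
        total += (4 if i % 2 else 2) * exp(u) / (u * u)
    return 2 * TWIN_PRIME_CONSTANT * total * h / 3


# Counts at 10^14 as reported in this document.
twin_count = 135_780_321_665
cousin_count = 135_779_962_760

estimate = hardy_littlewood_estimate(1e14)
ratio = twin_count / cousin_count
print(f"estimate          : {estimate:,.0f}")
print(f"twin / estimate   : {twin_count / estimate:.7f}")
print(f"cousin / estimate : {cousin_count / estimate:.7f}")
print(f"twin / cousin     : {ratio:.7f}")
```

Both measured counts should land very close to the single estimate, which is the sense in which the two families share an asymptotic density.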

3. The Tera-Scale Era: GPU Peak Performance

The "sweet spot" for the N/6 bit sieve has been redefined by the RTX 5090:

  • Peak Throughput: 1.136 Trillion candidates/second (at $10^{12}$)

  • Generational Leap: 6.5× faster than RTX 3070 at large scales ($10^{13}$).

    To our knowledge, this is the first documented case of a sieve algorithm crossing the 1 T-items/s threshold on consumer-grade hardware.

4. 9950X3D and Memory Cache Efficiency

The AMD Ryzen 9 9950X3D (32 threads) showed that CPU sieving can still be highly competitive:

  • CPU Throughput: about 66.7 G-items/s (at $10^9$, i.e. $10^9$ integers in 0.015 s)

    The large L3 cache of the X3D architecture allows the 192.5 KB aligned segments to remain resident in on-chip cache, which avoids RAM latency bottlenecks.
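
To give a sense of scale, the sketch below works out how many integers one such segment covers under the N/6 layout. It assumes that 1 KB means 1024 bytes and that every bit of the segment represents one candidate pair, that is, six consecutive integers; neither assumption is stated in the source, and the names are illustrative.

```python
SEGMENT_KIB = 192.5      # segment size quoted above, read as KiB (assumption)
BITS_PER_BYTE = 8
INTEGERS_PER_BIT = 6     # one bit per candidate pair in the N/6 layout


def segment_coverage(segment_kib):
    """Return (bytes, pair slots, integers covered) for one segment."""
    segment_bytes = int(segment_kib * 1024)
    pair_slots = segment_bytes * BITS_PER_BYTE
    return segment_bytes, pair_slots, pair_slots * INTEGERS_PER_BIT


segment_bytes, pair_slots, covered = segment_coverage(SEGMENT_KIB)

# Ceiling division: segments needed to sweep the range up to 10^9.
segments_for_1e9 = -(-10**9 // covered)

print(f"{segment_bytes:,} bytes per segment")
print(f"{pair_slots:,} candidate pairs per segment")
print(f"{covered:,} integers covered per segment")
print(f"{segments_for_1e9} segments for 10^9")
```

Under these assumptions a single cache-resident segment spans several million integers, so a run to $10^9$ needs on the order of a hundred segments in total.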

5. Hardware Domination - RTX 5090 vs RTX 3070

Architecture and VRAM capacity are the primary drivers at extreme scales:

  • RTX 5090 @ 10^14: 359.341 seconds

  • RTX 3070 @ 10^14: 2,264.706 seconds

  • Improvement: The 5090 reduces the processing time for 100 trillion numbers from about 38 minutes to about 6 minutes, a 6.3× speedup.

Files

TwinAndCousinLast.pdf (495.1 kB)
md5:86f640baa9583b0ee331c5a02024ab35