Published December 23, 2025 | Version V.1.0.0
Preprint Open

The Turkish Sieve Methodology: Deterministic Computation of Twin and Cousin Prime Pairs Using an N/6 Bit Data Structure

Description

Overview:

This study introduces the Turkish Sieve Methodology, a novel approach to prime number computation designed to overcome the memory footprint and modular-arithmetic overhead inherent in traditional sieve algorithms. Whereas standard wheel-based sieves store prime candidates at an N/3 bit density (one bit for each number coprime to 6), this method achieves an N/6 bit data structure specifically optimized for identifying twin (p, p+2) and cousin (p, p+4) prime pairs.
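As a minimal illustration of the N/6 idea (a sketch of the indexing principle, not the TSE implementation): every twin pair above (3, 5) has the form (6k − 1, 6k + 1), so a single flag per index k covers both members of a candidate pair, and N/6 flags cover the whole interval [1, N]:

```python
def twin_pairs_n6(N):
    """N/6-style pair indexing: flag k stands for the candidate pair
    (6k - 1, 6k + 1). Every twin pair except (3, 5) has this form,
    so N // 6 flags cover the whole interval [1, N]."""
    def is_prime(n):
        if n < 2:
            return False
        d = 2
        while d * d <= n:
            if n % d == 0:
                return False
            d += 1
        return True
    pairs = []
    for k in range(1, N // 6 + 1):
        lo, hi = 6 * k - 1, 6 * k + 1   # the pair encoded by index k
        if hi <= N and is_prime(lo) and is_prime(hi):
            pairs.append((lo, hi))
    return pairs
```

Cousin pairs admit the same one-flag-per-index treatment, since every cousin pair above (3, 7) has the form (6k + 1, 6k + 5).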

An often-overlooked contribution of the Turkish Sieve is its role as a reproducible research instrument rather than a codebase requiring algorithmic expertise. The implementation is distributed as a self-contained executable with deterministic configuration, standardized output, and structured performance reports. This enables researchers without programming or GPU expertise to conduct large-scale prime distribution experiments, generate verifiable datasets, and directly integrate results into independent studies. To our knowledge, no existing high-performance prime sieve provides comparable accessibility while maintaining state-of-the-art computational efficiency.

Key Innovations:

  • Memory Efficiency: By reducing the candidate pair sequences to an N/6 bit structure, the methodology halves the memory footprint of existing N/3 bit-sieve models, enabling the processing of ranges that were previously memory-prohibitive.

  • Computational Optimization: The entire elimination process is reduced to integer addition (n ← n + p). By replacing expensive modular arithmetic (MOD/DIV) in the inner loop with a deterministic additive progression and bitwise operations, the algorithm maps naturally onto high-throughput CPU and GPU (CUDA) cores.

  • Hardware Awareness: The methodology leverages the parallel processing capabilities of modern GPU architectures, allowing for rapid execution and high-throughput candidate screening.
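The addition-only elimination can be sketched as follows. This is a generic illustration of the n ← n + p principle on a classic sieve, not the authors' N/6 kernel: after the first multiple of each prime is located, the hot loop contains no MOD or DIV at all.

```python
def mark_composites_additively(limit):
    """Classic sieve written to highlight the n <- n + p principle:
    the inner elimination loop is pure integer addition plus an
    array store; no modulo or division appears in the hot path."""
    composite = bytearray(limit + 1)   # 0 = still a candidate
    p = 2
    while p * p <= limit:
        if not composite[p]:
            n = p * p                  # first multiple worth eliminating
            while n <= limit:          # hot loop: addition only
                composite[n] = 1
                n += p                 # deterministic rhythmic progression
        p += 1
    return [i for i in range(2, limit + 1) if not composite[i]]
```

Avoiding per-candidate division matters on GPUs in particular, where integer MOD/DIV costs many more cycles than addition.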


Implementation: 
The formal methodology described in this paper is implemented in the Turkish Sieve Engine (TSE V.1.0.0). The high-performance executables (optimized for CUDA and OpenMP) are available at:
👉 https://github.com/bilgisofttr/turkishsieve

Conclusion:

The Turkish Sieve offers a significant advancement in computational number theory, providing a scalable and deterministic tool for researchers exploring prime distributions, twin prime conjectures, and post-quantum cryptographic foundations.

This manuscript is a preliminary preprint version. The current version is under revision for submission to a peer-reviewed journal.

Keywords:

Number Theory, Twin Primes, Cousin Primes, Turkish Sieve, GPU Computing, CUDA, Bit Sieve, High-Performance Computing (HPC), Prime Gap.

🔍 Historical Accuracy and Error Correction: Revisiting Nicely’s Twin Prime Tables

The Turkish Sieve Engine (TSE) is a high-performance computational tool for prime number analysis. Beyond performance, TSE serves as a verification engine for historical number-theoretic datasets. In this study, we revisited the twin prime tables compiled by Dr. Thomas Nicely, which are historically notable for their role in uncovering the 1994 Intel Pentium FDIV bug.

TSE was cross-validated against the widely adopted primesieve library (Kim Walisch), confirming 100% agreement across all tested ranges. When compared with Nicely’s cumulative twin prime counts (hosted at Lynchburg College), several ranges exhibited systematic discrepancies, beginning with a consistent “+1” offset:

Range (0 to x) | Nicely’s Count | Verified Count (TSE & primesieve) | Discrepancy
30 | 5 | 4 | +1
600 | 27 | 26 | +1
30,000,000 | 152,892 | 152,891 | +1
100,000,000 | 440,313 | 440,312 | +1
465,000,000,000,000 | 573,363,952,380 | 573,363,952,379 | +1
525,000,000,000,000 | 642,563,148,734 | 642,563,148,732 | +2
723,000,000,000,000 | 867,878,285,690 | 867,878,285,687 | +3
765,000,000,000,000 | 915,170,302,652 | 915,170,302,648 | +4

These discrepancies appear to originate from legacy segment-boundary handling and precision limitations in early 1990s C implementations.
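To make the segment-boundary point concrete, here is one plausible mechanism, offered as a hypothetical illustration rather than a claim about Nicely's actual code: whether a twin pair straddling the bound x, such as (29, 31) at x = 30 or (599, 601) at x = 600, is counted depends purely on the boundary convention, and that choice alone reproduces the first two rows of the table above.

```python
def count_twin_pairs(x, include_straddling):
    """Count twin pairs (p, p+2) up to x under two boundary conventions.
    include_straddling=True counts a pair whenever p <= x (so p+2 may
    lie just past x); False requires both members to satisfy p+2 <= x.
    The two conventions differ by exactly one pair whenever a twin pair
    straddles x."""
    def is_prime(n):
        if n < 2:
            return False
        d = 2
        while d * d <= n:
            if n % d == 0:
                return False
            d += 1
        return True
    count = 0
    for p in range(3, x + 1):
        if is_prime(p) and is_prime(p + 2):
            if include_straddling or p + 2 <= x:
                count += 1
    return count
```

At x = 30 the two conventions yield 5 and 4, and at x = 600 they yield 27 and 26, matching the tabulated pairs of values exactly.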

By leveraging a modern N/6 bit-masking methodology, TSE recalculates twin prime counts deterministically and reproducibly. Counts have been verified up to $10^{15}$, achieving bit-perfect agreement across CPU (OpenMP) and GPU (CUDA) architectures. This corrected dataset provides a reliable reference for ongoing research in computational number theory, numerical validation, and historical dataset auditing.

The corrected twin prime counts from TSE address minor but systematic errors in Nicely’s original tables, providing the community with a rigorously verified dataset suitable for both historical and computational research purposes.

From ~4.65e14 (465 trillion) onward, every subsequent cumulative count carries the offset forward, with further boundary errors compounding it (+2 by 5.25e14, +3 by 7.23e14, +4 by 7.65e14). In other words, because the tables are cumulative, once an error occurs every later count inherits the accumulated shift.

Project Evolution

While this methodology (V.1.0.0) primarily focuses on the deterministic computation of twin and cousin prime pairs, the Turkish Sieve Engine (TSE) project continues to expand its computational horizons. Extensive verification tables up to the 10^15 scale have been published in the official repository, confirming the high-fidelity distribution of these pairs.

Furthermore, development of Version 2.0.0 is underway, which will extend the N/6 bit architectural efficiency to General Prime detection and enumeration. This upcoming update aims to provide an all-in-one prime discovery suite, further optimized for multi-GPU environments and standalone high-performance computing (HPC) tasks.

Performance Benchmarks & Scalability Report

The Turkish Sieve (TS) methodology has been stress-tested across vast ranges and several hardware architectures. The following results demonstrate the deterministic performance and memory efficiency of the N/6 indexing paradigm, with the RTX 5090 and Ryzen 9 9950X3D serving as the latest reference platforms:

Table 1: Twin Primes - Full Benchmark Results

Range | Twin Count | 3070 CUDA | 5090 CUDA | 9950X3D OMP | Speedup (5090 vs 3070)
$10^8$ | 440,312 | 0.027 s | 0.070 s | 0.015 s | Latency Floor
$10^9$ | 3,424,506 | 0.060 s | 0.071 s | 0.015 s | Latency Floor
$10^{10}$ | 27,412,679 | 0.266 s | 0.080 s | 0.167 s | 3.3×
$10^{11}$ | 224,376,048 | 0.565 s | 0.177 s | 2.053 s | 3.2×
$10^{12}$ | 1,870,585,220 | 5.510 s | 0.927 s | 21.500 s | 5.9×
$10^{13}$ | 15,834,664,872 | 96.962 s | 14.896 s | 198.400 s | 6.5×
$10^{14}$ | 135,780,321,665 | 2,264.706 s | 359.341 s | 2,150.000 s | 6.3×

Performance Note: At $10^{12}$, the RTX 5090 GPU sieved the entire range, identifying 1,870,585,220 twin pairs, in 0.927 seconds, for a throughput of 1,078.7 billion candidates per second (≈1.079 T-items/s).
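The quoted throughput figures follow from dividing the sieved range by the elapsed time ("candidates" here means positions in the range, not surviving pairs). A quick check against the $10^{12}$ row of Table 1:

```python
# Figures from Table 1 at 10^12; throughput = sieved range / elapsed time.
range_n = 10**12
t_5090 = 0.927                 # RTX 5090 elapsed, seconds
t_3070 = 5.510                 # RTX 3070 elapsed, seconds
thr_5090 = range_n / t_5090    # ~1.079e12 candidates per second
thr_3070 = range_n / t_3070    # ~1.81e11 candidates per second
speedup = t_3070 / t_5090      # ~5.9x, matching the table's speedup column
```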

Table 2: Cousin Primes - Full Benchmark Results

Range | Cousin Count | 3070 CUDA | 5090 CUDA | 9950X3D OMP | Speedup (5090 vs 3070)
$10^8$ | 440,258 | 0.018 s | 0.070 s | 0.015 s | Latency Floor
$10^9$ | 3,424,680 | 0.060 s | 0.071 s | 0.016 s | Latency Floor
$10^{10}$ | 27,409,999 | 0.292 s | 0.082 s | 0.172 s | 3.5×
$10^{11}$ | 224,373,161 | 0.593 s | 0.156 s | 2.110 s | 3.8×
$10^{12}$ | 1,870,585,459 | 5.590 s | 0.880 s | 22.200 s | 6.3×
$10^{13}$ | 15,834,656,003 | 97.277 s | 14.822 s | 201.500 s | 6.5×
$10^{14}$ | 135,779,962,760 | 2,267.851 s | 360.050 s | 2,210.000 s | 6.3×

Performance Note: At $10^{12}$, the RTX 5090 GPU achieved a record throughput of 1,136.3 billion candidates per second (≈1.136 T-items/s).

KEY FINDINGS & MATHEMATICAL INSIGHTS

1. 10^14 Uniqueness - Remarkable Distribution Equivalence

The difference between the twin and cousin prime counts at $10^{14}$ remains extraordinarily small, even under this higher-precision test:

  • Twin primes: 135,780,321,665

  • Cousin primes: 135,779,962,760

  • Difference: 358,905 pairs (only 0.0003%)

    This is consistent with a near-identical distribution of the two pair types across the first 100 trillion integers.

2. Hardy-Littlewood Conjecture Verification

The TSE results provide large-scale empirical support for the Hardy-Littlewood prime k-tuple conjecture.

  • Ratio convergence: 1.0000 (within 0.0003% variance)

    Our results at the $10^{14}$ scale demonstrate that twin and cousin primes exhibit nearly identical asymptotic densities, exactly as the conjecture predicts.
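This near-equality is exactly what the Hardy-Littlewood conjecture predicts. In its standard form, the conjectured count of prime pairs (p, p + d) up to x is

```latex
\pi_d(x) \;\sim\; 2C_2 \Bigg( \prod_{\substack{q \mid d \\ q \text{ odd prime}}} \frac{q-1}{q-2} \Bigg) \int_2^x \frac{dt}{(\ln t)^2},
\qquad
C_2 = \prod_{\substack{q \ \text{prime} \\ q \ge 3}} \frac{q(q-2)}{(q-1)^2} \approx 0.6601618.
```

Since neither d = 2 nor d = 4 has an odd prime divisor, the product is empty in both cases, so the leading constants for twin and cousin pairs coincide exactly; a gap of d = 6, by contrast, would gain an extra factor (3 − 1)/(3 − 2) = 2.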

3. The Tera-Scale Era: GPU Peak Performance

The "sweet spot" for the N/6 bit sieve has been redefined by the RTX 5090:

  • Peak Throughput: 1.136 Trillion candidates/second (at $10^{12}$)

  • Generational Leap: 6.5× faster than RTX 3070 at large scales ($10^{13}$).

    To our knowledge, this is the first documented case of a sieve algorithm crossing the 1 T-items/s threshold on consumer-grade hardware.

4. 9950X3D and Memory Cache Efficiency

The AMD Ryzen 9 9950X3D (32 threads) showed that CPU sieving can still be highly competitive:

  • CPU Throughput: 66.6 G-items/s (at $10^9$)

    The large 3D V-Cache (L3) of the X3D architecture keeps the 192.5 KB aligned segments resident on-die, eliminating DRAM-latency bottlenecks.
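A back-of-envelope check of why such segments are cache-resident, assuming (as in the paper's data structure) one bit per N/6 pair index, each index spanning six integers:

```python
# Back-of-envelope for the 192.5 KB segment size quoted above, assuming
# one bit per N/6 pair index (each index k spans six consecutive integers).
segment_bytes = 197120             # 192.5 KB = 192.5 * 1024 bytes
pair_bits = segment_bytes * 8      # bits available per segment
numbers_covered = pair_bits * 6    # integers represented per segment
# numbers_covered == 9,461,760: each segment encodes ~9.46 million integers
# in under 200 KB, comfortably resident in a multi-megabyte L3 cache.
```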

5. Hardware Domination - RTX 5090 vs RTX 3070

Architecture and VRAM capacity are the primary drivers at extreme scales:

  • RTX 5090 @ 10^14: 359.341 seconds

  • RTX 3070 @ 10^14: 2,264.706 seconds

  • Improvement: The 5090 reduces the processing time from 37 minutes to just 6 minutes for 100 trillion numbers.

Files

TwinAndCousinLast.pdf

495.1 kB · md5:86f640baa9583b0ee331c5a02024ab35