The Turkish Sieve Methodology: Deterministic Computation of Twin and Cousin Prime Pairs Using an N/6 Bit Data Structure

ÇAKANLI, Hüseyin

doi:10.5281/zenodo.18038661

Published December 23, 2025 | Version V.1.0.0

Preprint Open

ÇAKANLI, Hüseyin (Project leader)

Overview:

This study introduces the Turkish Sieve Methodology, a novel approach for prime number computation designed to overcome the memory intensity and modular arithmetic constraints inherent in traditional sieve algorithms. While current literature recognizes an N/3 bit density for prime candidate representation, this method achieves an N/6 bit data structure specifically optimized for identifying twin (p, p+2) and cousin (p, p+4) prime pairs.
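
As a minimal illustration of why one bit per index can suffice, the sketch below maps an index k to its candidate pair. It relies on a standard fact: every twin pair above (3, 5) has the form (6k - 1, 6k + 1), and every cousin pair above (3, 7) has the form (6k + 1, 6k + 5). The function names are hypothetical and are not taken from the Turkish Sieve Engine.

```python
def twin_pair(k):
    """Candidate twin pair stored at index k (k >= 1)."""
    return (6 * k - 1, 6 * k + 1)


def cousin_pair(k):
    """Candidate cousin pair stored at index k (k >= 1)."""
    return (6 * k + 1, 6 * k + 5)


def index_bits(n):
    """Bits needed for one bit per index k over the range up to n (about n/6)."""
    return n // 6


def index_bytes(n):
    """Same figure rounded up to whole bytes."""
    return (index_bits(n) + 7) // 8


if __name__ == "__main__":
    print(twin_pair(1), cousin_pair(1))       # first candidate pairs
    print(index_bytes(10**12) / 1e9, "GB")    # full-range array size, unsegmented
```

For N = 10^12 this comes to roughly 20.8 GB for a single full-range array, half of what an N/3 representation would need. A segmented implementation would hold only a slice of that at any time.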

A further contribution of the Turkish Sieve is its role as a reproducible research instrument rather than a codebase requiring algorithmic expertise. The implementation is distributed as a self-contained executable with deterministic configuration, standardized output, and structured performance reports. This enables researchers without programming or GPU expertise to conduct large-scale prime distribution experiments, generate verifiable datasets, and directly integrate results into independent studies. To our knowledge, no existing high-performance prime sieve provides comparable accessibility while maintaining state-of-the-art computational efficiency.

Key Innovations:

Memory Efficiency: By reducing the candidate pair sequences to an N/6 bit structure, the methodology halves the memory footprint of N/3 bit-sieve models, so a given amount of memory covers twice the range.
Computational Optimization: The entire elimination process is reduced to repeated integer addition (n <-- n + p). Replacing expensive modular arithmetic (MOD/DIV) in the inner loop with fixed-stride stepping and bitwise operations makes the algorithm well suited to high-performance CPU and GPU (CUDA) cores.
Hardware Awareness: The methodology leverages the parallel processing capabilities of modern GPU architectures, allowing for rapid execution and high-throughput candidate screening.
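
The addition-only elimination can be sketched as follows for twin pairs. This is an illustrative single-threaded reconstruction under stated assumptions, not the Turkish Sieve Engine's code: index k stands for the pair (6k - 1, 6k + 1), a bytearray stands in for the packed bit array for readability, and each sieving prime p = 6m ± 1 strikes the indices k ≡ ±m (mod p) by stepping k <- k + p. The helper names are hypothetical. The cousin case would differ only in the starting offsets.

```python
from math import isqrt


def small_primes(limit):
    """Plain Sieve of Eratosthenes; supplies the sieving primes only."""
    if limit < 2:
        return []
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for i in range(2, isqrt(limit) + 1):
        if flags[i]:
            flags[i * i :: i] = bytes(len(range(i * i, limit + 1, i)))
    return [i for i, f in enumerate(flags) if f]


def count_twin_pairs(n):
    """Count twin pairs (p, p + 2) with p + 2 <= n."""
    if n < 5:
        return 0
    k_max = (n - 1) // 6                   # largest k with 6k + 1 <= n
    alive = bytearray([1]) * (k_max + 1)   # alive[k] <-> pair (6k - 1, 6k + 1)
    alive[0] = 0                           # k = 0 is not a valid pair
    for p in small_primes(isqrt(6 * k_max + 1)):
        if p < 5:
            continue                       # 2 and 3 are already excluded by the 6k +/- 1 form
        m = (p + 1) // 6                   # p = 6m - 1 or 6m + 1; one division per prime
        # k = m holds p itself, so striking starts one stride later;
        # k = p - m is the first index whose pair contains the multiple 5p.
        for start in (m + p, p - m):
            k = start
            while k <= k_max:
                alive[k] = 0
                k += p                     # inner loop uses addition only
    return sum(alive) + 1                  # + 1 for the pair (3, 5)


if __name__ == "__main__":
    print(count_twin_pairs(10**6))         # 8169
```

The only division is the one that derives m from p, once per sieving prime. A production version would pack the flags into bits and process the index range in cache-sized segments.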

Implementation:
The formal methodology described in this paper is implemented in the Turkish Sieve Engine (TSE V.1.0.). The high-performance executables (optimized for CUDA and OpenMP) are available at:
👉 https://github.com/bilgisofttr/turkishsieve

Conclusion:

The Turkish Sieve offers a significant advancement in computational number theory, providing a scalable and deterministic tool for researchers exploring prime distributions, twin prime conjectures, and post-quantum cryptographic foundations.

This manuscript is a preliminary preprint version. The current version is under revision for submission to a peer-reviewed journal.

Keywords:

Number Theory, Twin Primes, Cousin Primes, Turkish Sieve, GPU Computing, CUDA, Bit Sieve, High-Performance Computing (HPC), Prime Gap.

Performance Benchmarks & Scalability Report

The Turkish Sieve (TS) methodology has been stress-tested over ranges from $10^8$ to $10^{14}$ on three hardware platforms. The following results report pair counts and run times for the N/6 indexing scheme, with the RTX 5090 and Ryzen 9 9950X3D as the new reference hardware alongside the RTX 3070:

Table 1: Twin Primes - Full Benchmark Results

Upper limit N	Twin pairs	RTX 3070 (CUDA)	RTX 5090 (CUDA)	Ryzen 9 9950X3D (OpenMP)	Speedup (5090 vs 3070)
$10^8$	440,312	0.027 s	0.070 s	0.015 s	Latency Floor
$10^9$	3,424,506	0.060 s	0.071 s	0.015 s	Latency Floor
$10^{10}$	27,412,679	0.266 s	0.080 s	0.167 s	3.3×
$10^{11}$	224,376,048	0.565 s	0.177 s	2.053 s	3.2×
$10^{12}$	1,870,585,220	5.510 s	0.927 s	21.500 s	5.9×
$10^{13}$	15,834,664,872	96.962 s	14.896 s	198.400 s	6.5×
$10^{14}$	135,780,321,665	2,264.706 s	359.341 s	2,150.000 s	6.3×

Performance Note: At 10^12, the RTX 5090 GPU completed the twin-prime run in 0.927 seconds, finding 1,870,585,220 pairs. Measured against the size of the range, that is 10^12 / 0.927 s ≈ 1,078.7 billion integers covered per second (about 1.08 T-items/s).
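
The throughput and speedup figures follow directly from the timings in Table 1. The short sketch below recomputes them, assuming throughput is defined as the range limit N divided by wall-clock time; the function names are illustrative.

```python
# Rows copied from Table 1: (range limit N, RTX 3070 seconds, RTX 5090 seconds)
TABLE_1 = [
    (10**10, 0.266, 0.080),
    (10**11, 0.565, 0.177),
    (10**12, 5.510, 0.927),
    (10**13, 96.962, 14.896),
    (10**14, 2264.706, 359.341),
]


def throughput(n, seconds):
    """Integers of range covered per second."""
    return n / seconds


def speedup(slow_seconds, fast_seconds):
    """How many times faster the second timing is than the first."""
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    for n, t3070, t5090 in TABLE_1:
        rate = throughput(n, t5090) / 1e9
        gain = speedup(t3070, t5090)
        print(f"N={n:.0e}  5090: {rate:8.1f} G-items/s  speedup: {gain:.1f}x")
```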

Table 2: Cousin Primes - Full Benchmark Results

Upper limit N	Cousin pairs	RTX 3070 (CUDA)	RTX 5090 (CUDA)	Ryzen 9 9950X3D (OpenMP)	Speedup (5090 vs 3070)
$10^8$	440,258	0.018 s	0.070 s	0.015 s	Latency Floor
$10^9$	3,424,680	0.060 s	0.071 s	0.016 s	Latency Floor
$10^{10}$	27,409,999	0.292 s	0.082 s	0.172 s	3.6×
$10^{11}$	224,373,161	0.593 s	0.156 s	2.110 s	3.8×
$10^{12}$	1,870,585,459	5.590 s	0.880 s	22.200 s	6.4×
$10^{13}$	15,834,656,003	97.277 s	14.822 s	201.500 s	6.6×
$10^{14}$	135,779,962,760	2,267.851 s	360.050 s	2,210.000 s	6.3×

Performance Note: At 10^12, the RTX 5090 GPU completed the cousin-prime run in 0.880 seconds, the highest throughput in these tests: 10^12 / 0.880 s ≈ 1,136.4 billion integers covered per second (about 1.14 T-items/s).

KEY FINDINGS & MATHEMATICAL INSIGHTS

1. Near-Equal Twin and Cousin Counts at 10^14

The twin and cousin prime pair counts up to 10^14 differ by an extraordinarily small margin:

Twin primes: 135,780,321,665
Cousin primes: 135,779,962,760
Difference: 358,905 pairs (about 0.00026% of the twin count)

The two counts therefore agree to within roughly three parts per million across 100 trillion numbers.
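
The difference and ratio can be rechecked from the two reported counts alone. The snippet below is plain arithmetic on the figures quoted above; the variable names are illustrative.

```python
twin = 135_780_321_665      # twin pairs up to 10^14, as reported
cousin = 135_779_962_760    # cousin pairs up to 10^14, as reported

difference = twin - cousin
ratio = twin / cousin
relative_pct = 100 * difference / twin

print(difference)               # 358905
print(f"{ratio:.7f}")           # 1.0000026
print(f"{relative_pct:.5f}%")   # 0.00026%
```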

2. Consistency with the Hardy-Littlewood Conjecture

The TSE v2.0.0 results provide large-scale empirical evidence consistent with the Hardy-Littlewood prime k-tuple conjecture, which predicts the same asymptotic density for twin and cousin primes.

Twin-to-cousin ratio at $10^{14}$: 1.0000 (relative difference below 0.0003%)

At the $10^{14}$ scale, twin and cousin primes show nearly identical counts, in agreement with this central prediction of analytic number theory.
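
For reference, the prediction being tested can be written out explicitly. The notation $\pi_2(N)$ and $\pi_4(N)$ for the number of twin and cousin pairs up to $N$ is introduced here for illustration and is not taken from the manuscript. Because 4 has no odd prime factor, the conjectured constant for gap 4 equals the one for gap 2:

```latex
\pi_2(N) \sim 2 C_2 \int_2^N \frac{dt}{(\ln t)^2},
\qquad
\pi_4(N) \sim 2 C_2 \int_2^N \frac{dt}{(\ln t)^2},
\qquad
C_2 = \prod_{p \ge 3} \left( 1 - \frac{1}{(p-1)^2} \right) \approx 0.6601618
```

Under the conjecture, $\pi_2(N) / \pi_4(N) \to 1$ as $N \to \infty$, which is the behaviour the measured ratio reflects.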

3. The Tera-Scale Era: GPU Peak Performance

The "sweet spot" for the N/6 bit sieve shifts upward with the RTX 5090:

Peak Throughput: 1.136 trillion candidates/second (cousin run at $10^{12}$)
Generational Leap: 6.5× faster than the RTX 3070 at large scales ($10^{13}$).

To our knowledge, this is the first documented case of a sieve algorithm crossing the 1 T-items/s threshold on consumer-grade hardware.

4. 9950X3D and Memory Cache Efficiency

The AMD Ryzen 9 9950X3D (32 threads) shows that CPU sieving remains highly competitive; it is the fastest of the three platforms at $10^8$ and $10^9$:

CPU Throughput: about 66.7 G-items/s (at $10^9$, i.e. $10^9$ / 0.015 s)

The large L3 cache of the X3D architecture allows the 192.5 KB aligned segments to stay entirely within the processor's cache, avoiding RAM latency bottlenecks.
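
To put the segment size in perspective, the arithmetic below estimates how much of the number line one segment covers and how large the combined working set is. It rests on assumptions not stated in the source: 1 KB = 1024 bytes, one bit per index k (six integers per bit), and one active segment per thread.

```python
SEGMENT_KB = 192.5   # segment size quoted in the text
THREADS = 32         # thread count quoted in the text

segment_bytes = int(SEGMENT_KB * 1024)         # assumes 1 KB = 1024 bytes
segment_bits = segment_bytes * 8
integers_per_segment = 6 * segment_bits        # assumes one bit per index k
working_set_mib = THREADS * segment_bytes / 2**20  # assumes one segment per thread

print(segment_bytes)                 # 197120
print(integers_per_segment)          # 9461760
print(f"{working_set_mib:.2f} MiB")  # 6.02 MiB
```

Under these assumptions each segment spans about 9.5 million integers, and all 32 segments together occupy about 6 MiB.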

5. Hardware Domination - RTX 5090 vs RTX 3070

Architecture and VRAM capacity are the primary drivers at extreme scales:

RTX 5090 @ 10^14: 359.341 seconds
RTX 3070 @ 10^14: 2,264.706 seconds
Improvement: The 5090 reduces the processing time from about 38 minutes to about 6 minutes for 100 trillion numbers (a 6.3× speedup).

Files

TwinAndCousinLast.pdf (495.1 kB, md5:86f640baa9583b0ee331c5a02024ab35)
