A collection of database industrial techniques and optimization approaches of database operations

Databases play an essential role in our society today. Databases are embedded in sectors like corporations, institutions, and government organizations, among others. These databases are used for our video and audio streaming platforms, social gaming, finances, cloud storage, e-commerce, healthcare, economy, etc. It is therefore imperative that we learn how to properly execute database operations and efficiently implement methodologies so that we may optimize the performance of databases.


Terminologies
Database -a structured set of data that is accessible in many ways through a computer Data mart -a more specified data warehouse whose data are obtained by selecting and summarizing data from the data warehouse [1] Data warehouse -a subject-oriented, integrated, time-variant, nonvolatile collection of data used in support of management decision-making processes [2] Feature/field -a column in the table of a database Grain -the length of time associated with each record in the table [3] Joining -process of combining data from various sources into a single table or view [4] Record -a row or entry in the table of a database Schema -a logical container of tables, views, and stored procedures Table -a set of values using a model of vertical columns and horizontal rows View -a result set of a stored query on the data

Overview
A database is a usually large collection of data organized especially for rapid search and retrieval [5].Creation, retrieval, modification, and deletion are the four basic database operations.These four functions are more commonly termed as create, read, update, and delete, or CRUD.
1. Creation is the addition of new entries or records onto a table in the database.
2. Retrieval is the searching or viewing of existing records in a table in the database.
3. Modification is the editing of existing records in a table in the database for the following purposes: a. content error -value is erroneous (typographical error, false or deceptive data) b. content update -value is replaced due to changes over a period of time 4. Deletion is the removal of existing records in a table in the database.

Transient data versus periodic data
Before the approaches and techniques are discussed, it is necessary to introduce the concepts of transient data and periodic data.Transient data are data overwritten on top of previous records or data that are permanently erased when modification and deletion are executed respectively-effectively destroying the history of the data.Periodic data are data that are never physically altered or deleted.The use of these two types of data depend on the nature and philosophy of the industry.(Salisbury, D., 2007).

Successive addition
Successive addition is the addition of records onto the table one at a time.This is efficient when the addition of new data does not occur frequently and that the records to be added are few-a record count that is preferably countable by the human hands instantly (less than or equal to 10).
Successive addition is inefficient when a) record addition is getting more frequent in the number of times it is performed , or b) the record addition occurs at an interval (hourly, daily, every n days, etc.).(Vanga, S., 2011).

Bulk addition
Bulk addition is the addition of records onto the table as one whole batch.Bulk approach to record addition is highly effective when a) data is huge but manageable (hundreds to thousands of records per execution), b) the database is locally stored, and that c) the machine that the database operation is done is sufficient in terms of machine requirements for huge data processing.This is not ideal for huge, overwhelming data (thousands to millions of records per record addition), for online databases, as it would take a significant amount of time to upload these records, and for machines that do not meet huge data processing requirements as it would also slow down performance.

Partitioned addition
Partitioned addition is the addition of records onto the table by grouping data into partitions or batches.With partitioned addition, the overwhelming nature of a thousand to million record data would be reduced and the heavy load would be addressed if the database is stored online.It would also relieve some computational intensity on substandard machines.
The number of partitions is the approach's strength and weakness.Using too many partitions would result to something like using the successive addition approach.On the other hand, using a few partitions is like performing bulk addition repetitively.

In-parallel addition
In-parallel addition is the addition of records onto the table by executing bulk addition or partitioned addition in-parallel using multithreading.(Shyamsundar, A., 2016).

First occurrence approach
First occurrence approach is the retrieval of records from the table by searching the first occurrence of the start date from the table and keep on selecting the records to include until the first occurrence of the date that is beyond the end date.

Granular lookup schema approach (GLS approach)
Granular lookup schema approach is the retrieval of records from table A by using a lookup table B provided that the database is structured such that an extra column(s) for labeled grains exist(s).The label would be counterchecked at lookup Starting transaction number (indices) Q3M7 (Q3M7 = 3 rd quarter, month 7) 1 Q3M8 (Q3M8 = 3 rd quarter, month 8) 535 … … The GLS approach only had to look up for Q3M8 for August 1, 2018 and only performed a go-to instead of a linear search.The granularity of the lookup table depends on database engineer implementation preference.By introducing codes to represent periods of time, we shorten the searching time.

Exhaustive search approach for retrieval (ESR approach)
Exhaustive search approach is the retrieval of records by searching the first occurrence of a specific entity (person, product, service, location) and keep on selecting records that match until the entire table is searched.(LibGuides, n.d.).

Indexed entity approach for retrieval (IER approach)
Similar to the GLS approach, a lookup table is created containing the identifiers of all the entities in the main

Modification
A. Maintenance of historical data

Transient modification
Transient modification is the editing of data over the previous content.This is used when history is not important according to the database engineer.

Periodic modification
Periodic modification is the addition of data instead of the actual modification of data and is labeled the updated value.The only value modified is the marker column indicating it is the current value. Example: Online assignment submission

Exhaustive search approach for modification (ESM approach)
Exhaustive search approach is the modification of records from the table by searching the first occurrence of the key term from the table and keep on selecting the records to include until the end of the table.

Indexed entity approach for modification (IEM approach)
A lookup table is created containing the identifiers of all the entities in the main table.Each record in the main table is indexed and similar names have the indices aggregated in the lookup table.Data is then modified with the aid of the lookup table.This is similar to IER approach.

Transient deletion
Transient deletion is the removal of data, erasing it permanently.This is used when history is not important according to the database engineer.

Periodic deletion
Periodic deletion is the addition of data instead of the actual deletion of data and is labeled the updated value.The only value modified is the marker column indicating it is the current value.

Example:
Online assignment submission table Student

Exhaustive search approach for deletion (ESD approach)
Exhaustive search approach is the deletion of records from the table by searching the first occurrence of the key term from the table and keep on deleting the records that match the key term until the end of the table.

Indexed entity approach for deletion (IED approach)
A lookup table is created containing the identifiers of all the entities in the main table.Each record in the main table is indexed and similar names have the indices aggregated in the lookup table.Data is then deleted with the aid of the lookup table.This is similar to IER and IEM approach.a workaround by introducing an offset.If the syncing is done daily, provide x number of days as an offset just in case it doesn't run for a day or more than that.If there are overlaps, delete the duplicates that would be generated.The duplicate dn is then removed to preserve data.

Execution techniques
Now that the operation approaches and industrial techniques have been discussed, it is important to tackle the different execution techniques for the SQL query.Some of these techniques are as follows: (1) In-editor manual execution (2) Semi-automated scripted execution (3) Semi-automated bashed execution (4) Hybridized semi-automated execution (5) Automated execution

In-editor manual execution
In-editor manual execution refers to the execution of an SQL query via by-hand input in an SQL editor.This is an acceptable technique if the query is ran extremely seldom or if the query is a one-time execution.Also, this is only ideal for short queries.

Semi-automated scripted execution
Semi-automated script execution refers to the execution of an SQL query via scripts like the built-in SQL script feature, or coding one in Python, Perl, Java, etc.The query itself is located inside the script file.Note that not all languages are packaged with SQL querying capabilities and libraries might be needed to achieve such task.

Semi-automated bashed execution
Semi-automated bashed execution refers to the execution of an SQL query via bash files (Mac OS/Linux) or batch files (Windows).The query itself is saved inside a .sqlfile and accessed by the bash or batch file through SQL server command line tools like sqlcmd.
This makes the execution of the query easier with a double-click of the batch file per execution.

Hybridized semi-automated execution
Hybridized semi-automated execution refers to the execution of an SQL query via bash files or batch files that contain commands to run a script file.This combines the logic of the second and third technique.The SQL query is located inside the script file.

Automated execution
Automated execution refers to the execution of an SQL query via task scheduling manager.This takes a script file and executes it depending on the schedule that has been set.Some task scheduling managers for Windows are: Windows Task Scheduler (XP, 7), AT command, schtasks (8,10), and Scheduled Tasks Cmdlets in Windows Powershell.
For Mac OS and Linux, the most popular is the crontab-a configuration file that contains a cron table where cron jobs are placed.

Dedication
This is for the devoted database architects and database engineers who want nothing but the best implementation of their operations on databases, for the students who want to pursue the intricacies of database management, and for the professors who aim to integrate the industrial methodologies regarding databases inside the classroom.

Figure 1 .
Figure 1.2 an IDT architecture People who set up and maintain software environments use cron to schedule jobs like commands or shell scripts to run periodically at fixed times, dates, or intervals.It typically automates system maintenance or administration-though its general-purpose nature makes it useful for things like downloading files from the Internet and downloading email at regular intervals.(Indiana University Knowledge Base, 2015).Each line of a crontab is a task.It is important to understand its format to utilize it efficiently and properly.The format is as follows:# ┌───────────── minute (0 -59) # │ ┌───────────── hour (0 -23) # │ │ ┌───────────── day of the month (1 -31) # │ │ │ ┌───────────── month (1 -12) # │ │ │ │ ┌───────────── day of the week (0 -6) (Sun-Sat # │ │ │ │ │ # │ │ │ │ │ # │ │ │ │ │ # * * * * * This is the command to be executed.* any value , value list separator -range of values / step values To try out simulations of cron expressions visit https://crontab.guruwhich is a free online tool for cron schedule expressions.About the author I am a BS Computer Science student from the University of the Philippines Manila.I have also received training and industry experience on database management, data science and analytics, and machine learning.I wrote this peer-reviewed discussion paper so that I may share what I know from the past years as someone who strongly likes to work with data.Acknowledgment I would like to thank my database professor in the University of the Philippines Manila, Sir Marbert Marasigan, for sparking the interest in me to delve into databases.You have imparted knowledge that I would carry over to my career and for the rest of my life.I am also super grateful to Sir Joseph Ramos and Sir JM Patiño for sharpening my database skills-especially on the industrial techniques I have discussed-during my internship at PLDT Global Corporation.I'd also like to thank my go-to professors anytime I want to share anything I've done in the field of computer science.Thanks for peer-reviewing this discussion paper, Sir Ruahden Dang-awan and Sir Berwin Yu.Last but definitely not the least, I am very appreciative for the motivation given to me by my friends in the University of the Philippines and the University of Sto.Tomas to engage the arts and sciences of databases.

table B .
Thiswould significantly reduce search time because records are labeled in grains instead of individual dates per records.The first occurrence approach had to go through the records from transaction number (TN) 1 to TN 534 before reaching TN 535 which is the start date.
table.Each record in the main table is indexed and similar names have the indices aggregated in the lookup table.If individual case number indices are desired per row, then a simple row explosion would be done.A row explosion is the transformation of a linear record into its individual indices.

In a way, this entails that periodic deletion is essentially the same as periodic modification.
shows the multiple attempts of Michael to submit an assignment.