
DBMS LEARNING SHEET

                                                           

Types of DATA


Structured data are facts concerning objects and events. The most common types of structured data are
numeric, character, and date values, and they are stored in tabular form. Unstructured data are
multimedia data such as documents, photographs, maps, images, sound, and video clips.
Unstructured data are most commonly found on Web servers and Web-enabled databases.

Database

A database is a collection of data organised in such a way that it provides efficient retrieval of the
desired information. The collected data can be in any format, such as printed, audio, electronic or
graphic. An address book, for example, is also a database: it provides information about a person's
contacts.




DBMS

A Database Management System (DBMS) is system software that allows users to efficiently
define, create and maintain databases. A DBMS makes it possible for end users to create, read,
update and delete data in a database. It serves as an interface between the database and end
users or application programs, provides controlled access to the data, and ensures that the data is
consistent and correct by enforcing rules defined on it.


Data Model

Data models define the logical structure of a database and determine the manner in which data can be
stored, organized and manipulated. The model describes the relationships between different parts of
the data. A data model provides the database user with a conceptual framework in which the
database requirements of the user and the structure of the database that fulfils these
requirements are specified. The most popular example of a database model is the relational model.
Relational Model: In this model, data is organised in two-dimensional tables called relations. The
main highlights of this model are:
• Relations can be normalized.
• In normalized relations, the values stored are atomic values.
• Each row in a relation is unique.
• Each column in a relation contains values from the same domain.
                                                 


Relational DBMS

It is a type of DBMS that organizes data in such a way that it appears to the user that data is stored
in a series of interrelated tables. These tables are also known as relations. Relationships among
tables are shown through shared data.

Database Schema

A database schema represents the logical view of the entire database in descriptive detail. It
is a blueprint of the database that defines the types of entities in it and the relationships among them.
It also specifies which fields will be present, what their types will be and what constraints are to be
applied to the data. A database schema can be divided broadly into two categories:
Logical Database Schema: This schema defines all the logical constraints that need to be
applied to the data stored, i.e. it describes how the data is managed. It defines tables, views, and
integrity constraints.
Physical Database Schema: This schema pertains to the actual storage of data and its form of
storage, such as files, indices, etc. It defines how the data will be stored in secondary storage.
Difference between Model and Schema: A data model can, for example, be the relational model,
in which data is organised in tables, whereas the schema for this model would be the set of
attributes and their corresponding domains.


Data Independence

A database system normally contains a lot of data in addition to users' data. As a DBMS
grows, it needs to change over time to satisfy the requirements of its users. If all of this data were
tightly interdependent, updating it would become a tedious and highly complex job. Hence a layered
architecture is used to obtain data independence. Data independence is the ability to
modify a schema definition at one level without affecting the schema definition at the next higher level.
There are two levels of data independence:


Logical Data Independence:

The ability to change the logical schema without changing the
physical schema is called Logical Data Independence. The change is absorbed by
the mapping between the physical and logical levels. For example, if changes are made
to the table format, they should not affect the data residing on the disk.
Physical Data Independence: The ability to change the physical schema without changing
the logical schema is called Physical Data Independence. All the schemas are logical, and
the actual data is stored on the disk. Changing the data storage system should not have any
impact on the logical schema.



Database Dictionary

The database dictionary, or catalog, contains all the information about the data present in the database,
such as the schema definition of each table, the structure and type of each file, the storage format of
each data item and the various constraints on the data. This stored information is known as metadata.


Database Keys

A key is a single field or a combination of fields (attributes) within a table. Its purpose is to ensure
that each record within the table can be uniquely identified. Keys help to identify the relationships
between tables and also maintain data integrity (that is, the data should be correct and well formed).
Super Key: A super key is a set of attributes (one or more) within a table that can uniquely
identify each tuple of the table. A super key is a superset of a candidate key. A single table can have
multiple super keys.


Candidate Key:

A Candidate Key is a minimal super key i.e. it contains a single attribute or
the least combination of attributes that uniquely identifies each record in the table. Every table
must have at least one candidate key.

Primary Key:

A primary key is one of the candidate keys chosen to uniquely identify the
entity set. Primary key is used to identify the table. Each table has exactly one primary key.

Foreign Key:

A foreign key is an attribute or combination of attributes in a table that refers
to the primary key of another table. Foreign keys are used to define relationships between tables;
the constraint they enforce is known as referential integrity. A table can have more than one foreign key.
Suppose there are two tables, table A with primary key X and table B. To link the two tables, table B
should also contain the attribute X; X is then called a foreign key in B. The table being
referenced (table A) is called the referenced relation and the corresponding attribute the
referenced attribute, while the table that refers to it (table B) is called the
referencing relation and the corresponding attribute the referencing attribute.
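
A minimal SQL sketch of the table A / table B situation above, using hypothetical department and employee tables (all names here are illustrative, not from any particular schema):

  -- Referenced relation ("table A"): dept_id plays the role of attribute X
  CREATE TABLE department (
      dept_id   INT PRIMARY KEY,
      dept_name VARCHAR(50)
  );

  -- Referencing relation ("table B"): dept_id here is a foreign key
  CREATE TABLE employee (
      emp_id   INT PRIMARY KEY,
      emp_name VARCHAR(50),
      dept_id  INT,
      FOREIGN KEY (dept_id) REFERENCES department (dept_id)
  );

Here dept_id in department is the primary key (and referenced attribute), while dept_id in employee is the foreign key (and referencing attribute).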

Prime Attribute:

An attribute which is part of candidate key is known as prime attribute. 

Integrity Constraints:

Integrity constraints provide a way of ensuring that changes made to the database by authorized users do not result in a loss of data consistency.

Entity Integrity Constraint:

The entity integrity constraint states that no primary key value
can be NULL. This is because the primary key value is used to identify individual tuples in a
relation; having NULL values for the primary key would mean that some tuples cannot be
identified. The entity integrity constraint is defined on individual relations.

Referential Integrity Constraint:

The referential integrity constraint is specified between
two relations and is used to maintain the consistency among tuples in the two relations.
Informally, the referential integrity constraint states that a tuple in one relation that refers to
another relation must refer to an existing tuple in that relation.
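
Continuing the hypothetical department/employee sketch from the keys section (values here are illustrative), both constraints are enforced automatically by the DBMS:

  -- Entity integrity: rejected, because a primary key value cannot be NULL
  INSERT INTO department (dept_id, dept_name) VALUES (NULL, 'Sales');

  -- Referential integrity: rejected if no department row has dept_id = 99
  INSERT INTO employee (emp_id, emp_name, dept_id) VALUES (1, 'Asha', 99);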


Joins

A join is a method of retrieving data from two or more tables so that the result appears as a single table, based on a related column between them. The join operation combines two tuples from different
tables if and only if a given join condition is satisfied.

Cross Join:

A cross join returns the cross product of the tables in the join. It returns a table
consisting of records that combine each row of the first table with each row of the second
table.

Inner Join:  

Inner join is based on the fact that only the matching entries (entries which are
present in both the tables) in both tables should be listed.

Theta Join:

A theta join combines two tuples from different tables if they satisfy the theta
condition. A theta join allows any comparison operator, such as >, <, >=, etc.

Equi Join:

When Join uses only equality (=) comparison operator, it is said to be equi
join.

Natural Join:

Natural Join does not use any comparison operator. Natural join can be
performed only if there is at least one common attribute that exists between tables. In
addition, the common attributes must have the same name and domain. It acts on those
matching attributes where the values of attributes in both the tables are same. The
resulting joined table contains only one column for each pair of equally-named
attributes.

Outer Join:

Outer Joins include all the tuples (matched and unmatched) from the
participating tables in the resulting table.

Left Outer Join: 

This join operation returns all rows from the left table together
with the matching rows from the right table. If a row of the left table has no match in
the right table, NULL values are returned for the attributes of the right table in the resulting table.

Right Outer Join: 

This join operation returns all rows from the right table together
with the matching rows from the left table. If a row of the right table has no match in
the left table, NULL values are returned for the attributes of the left table in the
resulting table.

Full Outer Join: 

This join combines the left outer join and the right outer join. It returns a
table that includes all the tuples from both participating tables. If a tuple has no
matching tuple in the other table, its respective unmatched attributes are
made NULL.
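
The sketch below shows roughly how these joins are written in SQL, reusing the hypothetical employee and department tables from the keys section; exact support (for example for FULL OUTER JOIN) varies between database systems:

  -- Cross join: every employee row paired with every department row
  SELECT * FROM employee CROSS JOIN department;

  -- Inner (equi) join: only employees whose dept_id matches a department
  SELECT e.emp_name, d.dept_name
  FROM employee e
  INNER JOIN department d ON e.dept_id = d.dept_id;

  -- Left outer join: all employees, with NULLs where no department matches
  SELECT e.emp_name, d.dept_name
  FROM employee e
  LEFT OUTER JOIN department d ON e.dept_id = d.dept_id;

  -- Natural join: joins on the identically named column dept_id
  SELECT emp_name, dept_name
  FROM employee NATURAL JOIN department;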


SQL

SQL stands for Structured Query Language. SQL is an ANSI (American National Standards
Institute) standard computer language for accessing and manipulating database systems. SQL
statements are used to retrieve and update data in a database. Commands in SQL are further
categorised according to their functions as given below:

DDL:

DDL stands for Data Definition Language. A database schema is specified by a set of
definitions expressed in this special language. SQL commands like CREATE (to
create a table schema), ALTER (to change a table schema), DROP (to delete a table schema) and
RENAME (to rename a table schema) come under this.
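
A short sketch of these DDL commands on a hypothetical student table (names and types are illustrative):

  CREATE TABLE student (
      roll_no INT PRIMARY KEY,
      name    VARCHAR(50)
  );
  ALTER TABLE student ADD COLUMN age INT;   -- change the table schema
  -- DROP TABLE student;                    -- would delete the table schema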

DML:

DML stands for Data Manipulation Language. This language enables users to access or
manipulate data as organised by the appropriate data model. SQL commands like SELECT (to
retrieve data from a table), INSERT (to insert data into a table) and UPDATE (to update data in a
table) come under this.
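
A matching DML sketch, assuming the hypothetical student table from the DDL example (with its age column):

  INSERT INTO student (roll_no, name, age) VALUES (1, 'Ravi', 20);
  SELECT roll_no, name FROM student WHERE age >= 18;
  UPDATE student SET age = 21 WHERE roll_no = 1;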

DCL:

DCL stands for Data Control Language. It is used to create roles and permissions, and to
control access to the database by securing it. SQL
commands like GRANT (to give a database access privilege to a user) and REVOKE (to
withdraw a database access privilege from a user) come under this.
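
A brief DCL sketch; exact user/role syntax differs between database systems, and exam_clerk is a hypothetical user:

  GRANT SELECT, INSERT ON student TO exam_clerk;
  REVOKE INSERT ON student FROM exam_clerk;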

TCL:

TCL stands for Transaction Control Language. It is used to manage the different
transactions occurring within a database. SQL commands like COMMIT (to save work done in the
database) and ROLLBACK (to restore the database to its state as of the last COMMIT
in the transaction) come under this.
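
A small TCL sketch on the hypothetical student table; some systems write START TRANSACTION or BEGIN TRANSACTION instead of BEGIN:

  BEGIN;
  UPDATE student SET age = 22 WHERE roll_no = 1;
  COMMIT;      -- the change becomes permanent

  BEGIN;
  UPDATE student SET age = 99 WHERE roll_no = 1;
  ROLLBACK;    -- the change is undone; age stays 22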

SDL:

SDL stands for Storage Definition Language and is used to specify the internal schema.
This language may also specify the mapping between two schemas.

Entity-Relationship Model 

The ER Model is a diagrammatic representation of real-world entities such as people, objects or concepts and the relationships among them. While formulating a real-world scenario into a database model, the ER Model creates entity sets, relationship sets, attributes and constraints.

Entity:

An entity can be a real-world object, either animate or inanimate, that is easily
identifiable. For example, in a school database, students, teachers, classes, and courses offered can
be considered entities. In an E-R diagram, an entity is represented using a rectangle.

Weak entity:

A weak entity is an entity that depends on another entity. A weak entity does not have a key attribute of
its own. A double rectangle represents a weak entity.

Attribute:

Every entity in the ER Model has some properties, called the attributes of that entity. Each
attribute is defined by its set of values; there exists a domain or range of values that can be
assigned to an attribute. For example, in a school database, a student entity has various attributes like name, age, class, etc. An attribute is represented using an ellipse.

 Types of Attribute

 Key Attribute:

A key attribute represents the main characteristic of an entity. It is used to represent the
primary key. For example, the roll number of a student is a key attribute that uniquely identifies each
student.

Composite Attribute: 

An attribute can also have attributes of its own; such attributes are known as
composite attributes. For example, an address attribute can have its own attributes like street, city, state,
etc.

Derived Attribute:

Derived attributes are attributes that do not exist in the physical database,
but whose values are derived from other attributes present in the database. For example, the age of a
person can be derived from his or her date of birth.

Multivalued Attribute:

Multivalued attributes may contain more than one value. For example, a
person can have more than one phone number, email address, etc.
Relationship: A relationship describes the logical association among entities. For example, a student
enrolls in a subject. A relationship too can have attributes.
Cardinality defines the number of entities in one entity set that can be associated with the
number of entities of another set via a relationship set. Mapping cardinalities are further divided into
four types:
• One to One: One entity from entity set A can be associated with at most one entity of entity
set B and vice versa. For example, one country has only one president and vice versa.
• One to Many: One entity from entity set A can be associated with more than one entity of
entity set B; however, an entity from entity set B can be associated with at most one entity of
entity set A. For example, a single college has many students but a single student belongs to only one college.
• Many to One: More than one entity from entity set A can be associated with at most one
entity of entity set B; however, an entity from entity set B can be associated with more than
one entity from entity set A. For example, a student enrolls in only one course but a course
can have many students.
• Many to Many: One entity from A can be associated with more than one entity from B and
vice versa. For example, a student studies multiple subjects and a subject is studied by multiple students.
Functional Dependencies

A functional dependency (FD) defines a dependency between two sets of attributes A
and B of a given relation. The functional dependency A→B specifies that the value of A
determines a unique value of B in every state of the relation. In simple words, if two tuples have the same
values for the attributes A1, A2, ..., An, then those two tuples must have the same values for the
attributes B1, B2, ..., Bn.



Armstrong Axioms:
If F is a set of functional dependencies, then the closure of F, denoted F+, is the set of all functional dependencies logically implied by F. Armstrong's axioms are a set of rules that, when applied repeatedly, generate the closure of a set of functional dependencies.

Reflexive rule:
If X is a set of attributes and Y is a subset of X, then X→Y holds.

Augmentation rule:
If A→B holds and Y is a set of attributes, then AY→BY also holds; that is, adding attributes to a dependency does not change the basic dependency.

Transitivity rule:
As with the transitive rule in algebra, if A→B holds and B→C holds, then A→C also holds. A→B means that A functionally determines B.

Trivial Functional Dependency: 
If a functional dependency (FD) X→Y holds, where Y is a subset of X, then it is called a trivial FD. Trivial FDs always hold.
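
A small worked example of applying these rules to abstract attributes (A, B, C are placeholders, not a specific schema):

  Given:        A→B and B→C
  Reflexivity:  AB→B holds, since B is a subset of AB (a trivial FD)
  Augmentation: from A→B, AC→BC also holds
  Transitivity: from A→B and B→C, A→C holds, so A→C belongs to the closure F+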

Normalization

Normalization is the process of analyzing the given relation schemas based on their functional dependencies and primary keys, and splitting the relations into well-structured relations that allow users to insert, delete and update tuples without introducing data inconsistencies. Without normalization,
many problems can occur when trying to load an integrated conceptual model into the DBMS.
These problems, which arise from relations generated directly from user views, are called
anomalies. There are three types of anomalies: update, deletion and insertion anomalies. The main
objectives of normalization are to (1) minimize redundancy and (2) minimize insertion, deletion
and update anomalies.

Insertion Anomaly: 

An insertion anomaly is the inability to add data to the database due to absence of other data.

Deletion Anomaly:

 A deletion anomaly is the unintended loss of data due to deletion of other data.

Update Anomaly:

An update anomaly is a data inconsistency that results from data redundancy and a partial update.

Types of Normalization

First Normal Form:

According to First Normal Form (1NF), no two rows may contain repeating groups of
information, i.e. each column must hold a single value, so that multiple columns are not needed to
fetch the same row. Every attribute in a relation must be atomic: any row holding more than one
value in a column must be split into multiple rows. A separate table is created for each set of
related data, and each table must be identified
with a primary key.
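
A minimal sketch of moving a repeating group into 1NF, reusing the hypothetical student table; the phone values shown are placeholders:

  -- Not in 1NF: one column stores several phone numbers per row
  -- roll_no | name | phone
  -- 1       | Ravi | 98xxxxxx01, 98xxxxxx02

  -- In 1NF: the multivalued data moves to its own table, one value per row
  CREATE TABLE student_phone (
      roll_no INT,
      phone   VARCHAR(15),
      PRIMARY KEY (roll_no, phone),
      FOREIGN KEY (roll_no) REFERENCES student (roll_no)
  );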

Second Normal Form:

For 2NF, the table must first be in 1NF. According to Second Normal Form,
there must not be any partial dependency of any column on the primary key. This means that every column
in the table that is not part of the primary key should depend fully on the primary key (a partial
dependency can only arise when the primary key is a composite key). To convert a table into second
normal form, create a separate table containing the
functionally dependent data and the part of the key on which it depends.
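
A sketch of such a decomposition, using hypothetical marks data keyed by (roll_no, subject_code), where subject_name depends only on subject_code:

  -- Partial dependency: subject_name depends on subject_code alone,
  -- not on the whole key (roll_no, subject_code)

  CREATE TABLE subject (
      subject_code VARCHAR(10) PRIMARY KEY,
      subject_name VARCHAR(50)
  );
  CREATE TABLE marks (
      roll_no      INT,
      subject_code VARCHAR(10),
      marks        INT,
      PRIMARY KEY (roll_no, subject_code),
      FOREIGN KEY (subject_code) REFERENCES subject (subject_code)
  );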

Third Normal Form: 

For 3NF, the table must first be in 2NF. A table is said to be in 3NF if there is
no transitive functional dependency, i.e. every non-prime attribute of the table must be non-transitively
dependent on the primary key. There should be no case in which a non-prime attribute is
determined by another non-prime attribute. To convert a table into third normal form, create a
separate table containing the determining attribute and the fields that are functionally dependent on it. 3NF is
used to reduce duplication of data by minimizing transitive redundancy.

Boyce-Codd Normal Form:

BCNF is a stricter version of 3NF. When a relation has more than one candidate key, anomalies may result even though the relation is in 3NF. 3NF does not deal satisfactorily with the case of a relation with overlapping candidate keys, i.e. composite candidate keys with at least one attribute in common. BCNF is based on the concept of a determinant: a determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent. A relation is in BCNF if, and only if, every determinant is a candidate key, i.e. it is a 3NF table that does not have multiple overlapping candidate keys. The difference between 3NF and BCNF is that for a functional dependency A→B, 3NF allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key, whereas BCNF insists that for this dependency to remain in a relation, A must be a candidate key.

Fourth Normal Form:

For 4NF, the table must first be in Boyce-Codd Normal Form. A 4NF table does not contain more than one
multivalued dependency, i.e. no entity can have more than one one-to-many relationship within
it if the one-to-many attributes are independent of each other. It has no non-trivial
multivalued dependencies other than on a candidate key. It applies to situations involving many-to-many relationships.

Fifth Normal Form:

A table is in Fifth Normal Form (5NF), or Project-Join Normal Form (PJNF), if
it is in 4NF and it cannot be decomposed into any number of smaller tables without loss of data.

Transactions

A transaction is a very small unit of a program and it may contain several low level tasks. A
transaction in a database system must maintain Atomicity, Consistency, Isolation, and Durability −
commonly known as ACID properties − in order to ensure accuracy, completeness, and data
integrity.

Atomicity: 

This property states that a transaction must be treated as an atomic unit, that is,
either all of its operations are executed or none. There must be no state in a database where a
transaction is left partially completed. States should be defined either before the execution of
the transaction or after the execution/abortion/failure of the transaction.

Consistency:

 The database must remain in a consistent state after any transaction. No
transaction should have any adverse effect on the data residing in the database. If the database
was in a consistent state before the execution of a transaction, it must remain consistent after
the execution of the transaction as well.

Durability:

 The database should be durable enough to hold all its latest updates even if the
system fails or restarts. If a transaction updates a chunk of data in a database and commits,
then the database will hold the modified data. If a transaction commits but the system fails
before the data could be written on to the disk, then that data will be updated once the system
springs back into action.

Isolation:

In a database system where more than one transaction is being executed
simultaneously and in parallel, the property of isolation states that all the transactions will be
carried out and executed as if each were the only transaction in the system. No transaction will
affect the existence of any other transaction.

Trigger:

A trigger is code associated with an insert, update or delete operation. The code is
executed automatically whenever the associated operation is executed on the table. Triggers can be useful
for maintaining integrity in the database.
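
A sketch of a trigger in MySQL-style syntax (other systems use slightly different syntax), assuming the hypothetical student table from the SQL section:

  -- Runs automatically before every INSERT on student
  CREATE TRIGGER normalize_age
  BEFORE INSERT ON student
  FOR EACH ROW
  SET NEW.age = IF(NEW.age < 0, 0, NEW.age);   -- keep the age column well formed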

Stored Procedure:

A stored procedure is like a function that contains a set of operations compiled
together, typically operations that are commonly used in an application to perform some
common database task.
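
A sketch of a stored procedure in MySQL-style syntax (DELIMITER is a client directive; other systems differ), again on the hypothetical student table:

  DELIMITER //
  CREATE PROCEDURE increment_age (IN p_roll_no INT)
  BEGIN
      UPDATE student SET age = age + 1 WHERE roll_no = p_roll_no;
  END //
  DELIMITER ;

  CALL increment_age(1);   -- called explicitly, unlike a trigger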

Difference between Trigger and Stored Procedure:

 Unlike Stored Procedures, Triggers cannot be called directly. They can only be associated with queries.

Checkpoint:

 A Checkpoint is like a snapshot of the DBMS state. By taking checkpoints, the DBMS
can reduce the amount of work to be done during restart in the event of subsequent crashes.

Indexes:

A database index is a data structure that improves the speed of data retrieval operations on
a database table at the cost of additional writes and extra storage space to maintain the
extra copy of data. Data can be stored in only one order on disk. To support faster access according
to different values, something like a binary search over other columns is desired; for this purpose,
indexes are created on tables. These indexes need extra space on disk, but they allow faster search
according to different frequently searched values.
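
A minimal example of creating a non-clustered index on a frequently searched column of the hypothetical student table:

  CREATE INDEX idx_student_name ON student (name);

  -- Queries filtering on name can now use the index instead of a full scan
  SELECT * FROM student WHERE name = 'Ravi';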

Clustered and Non-Clustered Indexes:

A clustered index is the index according to which data is physically stored on disk; therefore, only one clustered index can be created on a given database table. Non-clustered indexes do not define the physical ordering of data, but a logical ordering. Typically, a tree is created whose leaf nodes point to the disk records; a B-tree or B+-tree is used for this purpose.

Query Optimization:

The phase that identifies the execution plan with the least estimated cost for evaluating a query is referred to as query optimization.

Concurrency Control:  

Concurrency control is the process of managing simultaneous operations against a database so that database integrity is not compromised. There are two approaches to concurrency control.

