Basic SQL

1. Intro to SQL

1.1. SQL的特点

SQL (Structured Query Language) a general purpose, highly functional, declarative language specifically designed to work with relational databases, is the standard language for relational database management systems. It has the following characteristics:

  • 综合统一 (DDL + DML + DCL, 并且可以独立完成数据库生命周期中的全部活动)
  • 高度非过程化
  • Set-oriented 的操作方式 (而非关系数据模型采用面向记录的操作方式, 操作对象是一条记录)
  • 以同一种语法结构提供两种使用方法 (既是独立的语言, 又可以嵌入到高级语言如C++, Java中)
  • 语言简洁, 易学易用 (it’s core function only requires 9 verbs)
    • 数据定义: CREATE, DROP, ALTER
    • 数据查询: SELECT
    • 数据操作: INSERT, UPDATE, DELETE
    • 数据控制: GRANT, REVOKE

There are three major components of SQL:

  • DDL: data definition language (create, alter, drop)
  • DML: data manipulation language (insert, update, delete)
  • DCL: data control language

1.2. SQL与数据库三级模式

SQL (Structured Query Language) 是一种关系型数据库的标准查询语言, 是管理和操作关系型数据库的重要工具之一. 与此同时, 数据库通常被描述为三级模式的组合, 包括外模式, 模式和内模式.

  1. 外模式 (External Schema) : 也称用户模式, 是数据库系统中最高层次的抽象, 它定义了用户可以访问和操作的数据视图. 外模式是基于用户需求和角色设计的, 不同的用户可能有不同的外模式. 例如, 一个销售人员只需要访问客户订单信息, 而不需要了解订单的生产过程;而一个生产经理则需要访问订单的详细信息以及生产过程的相关数据.
  2. 模式 (Conceptual Schema) : 也称全局模式或逻辑模式, 是数据库系统的中间层, 它定义了整个数据库的逻辑结构和关系. 模式是由数据库管理员设计和维护的, 它表示了所有数据的集合和它们之间的关系, 但并不涉及具体的存储方式和物理结构.
  3. 内模式 (Internal Schema) : 也称存储模式或物理模式, 是数据库系统的最底层, 它定义了数据在物理媒介上的存储形式和访问路径. 内模式包括数据项的实际存储方式, 索引结构, 文件存储方式等, 它是由数据库管理系统 (DBMS) 根据模式和外模式转换而来的.

总体来说, 外模式, 模式和内模式构成了数据库系统的三级模式, 这种分层结构提供了一种灵活和高效的方式来设计和管理数据库. 通过将数据和应用程序之间的逻辑隔离开来, 可以简化不同用户之间的访问过程, 并且增强了数据库的可维护性和安全性. 同时, SQL 作为一种标准查询语言, 可以方便地操作和管理不同级别的数据.

基本表

  • 本身独立存在的表
  • 一个关系对应一个基本表
  • 一个 (或多个) 基本表对应一个存储文件
  • 一个表可以带若干索引

存储文件

  • 逻辑结构组成了关系数据库的内模式
  • 物理结构对用户是隐蔽的

视图

  • 从一个或几个基本表导出的表
  • 数据库中只存放视图的定义而不存放视图对应的数据
  • 视图是一个虚表
  • 用户可以在视图上再定义视图

2. DDL

可以根据菜鸟教程速成学习基本的SQL语句, 效果还不错

2.1. Creating Table

student entoll class

Consider such a E-R diagram

Student(snum:int, sname:String, major:String, level:String, age:int)
Class(cname:String, meets_at:String,room:String, fid:int)
Enroll(snum:int, cname:String)

The the SQL code for creating the tables is

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
CREATE TABLE Student (
snum BIGINT,
sname VARCHAR(50),
major VARCHAR(50),
level CHAR(2),
age INT,
PRIMARY KEY (snum)
);
CREATE TABLE Class (
cname VARCHAR(50),
meets_at VARCHAR(50),
room VARCHAR(50),
fid BIGINT,
PRIMARY KEY (cname)
);
CREATE TABLE Enroll (
snum BIGINT,
cname VARCHAR(50),
PRIMARY KEY (snum, cname), -- 复合主键
FOREIGN KEY (snum) REFERENCES Student(snum), -- 确保外键的完整性
FOREIGN KEY (cname) REFERENCES Class(cname) -- 确保外键的完整性
);

However if I forgot to add the two foreign keys, there are two ways to add them

  1. drop the table and recreate it

    1
    2
    DROP TABLE Enroll;
    -- continue to create the table
  2. use ALTER TABLE to add the foreign key

    1
    2
    3
    4
    ALTER TABLE Enroll 
    add FOREIGN KEY(snum) REFERENCES student(snum)
    ON UPDATE CASCADE;
    -- Add a foreign key constraint to table enroll

2.2. Inserting Data

1
2
INSERT INTO Student (snum, sname, major, level, age)
VALUES (123456789, 'Smith', 'CS', 'SR', 18);

Note that you don’t have to list all the columns, but you have to list the columns in the same order as the values
if no column is specified, then the values must be in the same order as the columns in the table

1
2
INSERT INTO table_name (column1,column2,column3,...)
VALUES (value1,value2,value3,...);

2.3. Deleting Tuple

1
2
DELETE FROM Student WHERE age < 18; -- 删除年龄小于18的学生
DELETE FROM Student; -- 删除所有学生

2.4. Updating Tuple

1
2
UPDATE Student SET major = 'International Study' 
WHERE snum = 123456789;

2.5. Select-From-Where Statements

The principal form of a query is

1
2
3
4
SELECT A1, A2, ..., An -- desired attributes
FROM R1, R2, ..., Rm -- one or more tables, i.e. relation list
WHERE P; -- selection condition
-- ORDER BY A1, A2, ..., An; -- optional

Intuitively, semantics of an SQL query is defined in terms of the following conceptual evaluation strategy:
– Compute the cross-product of relation-list.
– Apply selection (using qualifications) to discard resulting tuples if they fail qualifications.
– Apply projection to delete attributes that are not in target-list.

But this is just too SLOW!!

There are numerous ways to optimize the query, such as

  • Indexing: to speed up the search
  • Query Optimization: to find the best way to execute the query
  • Materialized Views: to store the result of the query

Set VS. Bag Semantics:

  • Set: a collection of distinct elements
  • Bag: a collection of elements that may contain duplicates (default in SQL)

By adding DISTINCT to the SELECT statement, we can eliminate duplicates

1
2
SELECT DISTINCT age
FROM Student;

2.6. Arithmetic Expressions

In SELECT clause

1
2
SELECT sname, age, age + 5
FROM Student;

In WHERE clause

1
2
3
SELECT sname, age
FROM Student
WHERE age + 5 > 20;

2.7. Like Predicate

Similar to regular expression

1
2
3
SELECT sname
FROM Student
WHERE sname LIKE 'A%'; -- find all students whose name starts with 'A'
  • ‘%’: any string of zero or more characters, similar to * in regular expression
  • ‘_’: any single character, similar to . in regular expression

2.8. Order By

1
2
3
4
SELECT sname, age
FROM Student
ORDER BY age DESC; -- sort the result by age in descending order
-- by defualt, it's ascending order

2.9. Where clause

Attribute names of the relations appearing in FROM clause

  • Comparison operators: =, <>, <, >, <=, >=
  • Arithmetic operations: +, -, /, *
  • AND, OR and NOT to combine/negate conditions
  • Operations on strings: concatenation, etc.
  • Pattern matching: s LIKE p
  • Special functions for comparing dates and times

2.10. Aliases

1
2
3
SELECT s.sname 
FROM Student s, Enroll e
WHERE s.snum = e.snum AND e.cname LIKE 'Database Systems'

A general form is

1
2
3
SELECT a1, a2, ..., ak
FROM R1 AS x1, R2 AS x2, ..., Rn AS xn -- AS is an optional keyword
WHERE <conditions>;

A psudocode is like the following

1
2
3
4
5
6
7
8
answer := {}
for x1 in R1:
for x2 in R2:
...
for xn in Rn:
if <conditions>:
answer := answer {(a1,…,ak)}
return answer

3. Other

3.1. 层次化的数据库对象命名机制

  • 一个关系数据库管理系统的实例 (Instance) 中可以建立多个数据库
  • 一个数据库中可以建立多个模式
  • 一个模式下通常包括多个表, 视图和索引等数据库对象

3.2. 数据定义

  • 模式定义: 模式(schema) 实际上是一个namespace
  • 表定义, 删除: 相关细节 (RESTRICT / CASCADE )
  • 视图和索引定义

索引是用户不可见的, 由DBA或者表的构建者创建, 由数据库 (自动) 维护

3.3. SQL中的空值

关于空值有几个平凡的注意点

  • NOT NULL 约束条件的不能取空值

  • 加了UNIQUE 限制的属性不能取空值

  • 键属性 (key attributes) 不能取空值

  • 空值与另一个值 (包括另一个空值) 的算术运算的结果为空值

  • 空值与另一个值 (包括另一个空值) 的比 较运算的结果为 UNKNOWN.

  • UNKNOWN 后, 传统二值 (TRUE, FALSE) 逻辑就扩展成了三值逻辑

3.4. View (视图)

视图是一种设计, 这篇博客介绍了为什么要设计视图