Basic SQL

1. Intro to SQL

1.1. SQL的特点

SQL (Structured Query Language) a general purpose, highly functional, declarative language specifically designed to work with relational databases, is the standard language for relational database management systems. It has the following characteristics:

  • 综合统一 (DDL + DML + DCL, 并且可以独立完成数据库生命周期中的全部活动)
  • 高度非过程化
  • Set-oriented 的操作方式 (而非关系数据模型采用面向记录的操作方式, 操作对象是一条记录)
  • 以同一种语法结构提供两种使用方法 (既是独立的语言, 又可以嵌入到高级语言如C++, Java中)
  • 语言简洁, 易学易用 (it’s core function only requires 9 verbs)
    • 数据定义: CREATE, DROP, ALTER
    • 数据查询: SELECT
    • 数据操作: INSERT, UPDATE, DELETE
    • 数据控制: GRANT, REVOKE

There are three major components of SQL:

  • DDL: data definition language (create, alter, drop)
  • DML: data manipulation language (insert, update, delete)
  • DCL: data control language

1.2. SQL与数据库三级模式

SQLStructured Query Language是一种关系型数据库的标准查询语言, 是管理和操作关系型数据库的重要工具之一. 与此同时, 数据库通常被描述为三级模式的组合, 包括外模式, 模式和内模式.

  1. 外模式External Schema: 也称用户模式, 是数据库系统中最高层次的抽象, 它定义了用户可以访问和操作的数据视图. 外模式是基于用户需求和角色设计的, 不同的用户可能有不同的外模式. 例如, 一个销售人员只需要访问客户订单信息, 而不需要了解订单的生产过程而一个生产经理则需要访问订单的详细信息以及生产过程的相关数据.
  2. 模式Conceptual Schema: 也称全局模式或逻辑模式, 是数据库系统的中间层, 它定义了整个数据库的逻辑结构和关系. 模式是由数据库管理员设计和维护的, 它表示了所有数据的集合和它们之间的关系, 但并不涉及具体的存储方式和物理结构.
  3. 内模式Internal Schema: 也称存储模式或物理模式, 是数据库系统的最底层, 它定义了数据在物理媒介上的存储形式和访问路径. 内模式包括数据项的实际存储方式, 索引结构, 文件存储方式等, 它是由数据库管理系统DBMS根据模式和外模式转换而来的.

总体来说, 外模式, 模式和内模式构成了数据库系统的三级模式, 这种分层结构提供了一种灵活和高效的方式来设计和管理数据库. 通过将数据和应用程序之间的逻辑隔离开来, 可以简化不同用户之间的访问过程, 并且增强了数据库的可维护性和安全性. 同时, SQL 作为一种标准查询语言, 可以方便地操作和管理不同级别的数据.

基本表

  • 本身独立存在的表
  • 一个关系对应一个基本表
  • 一个或多个基本表对应一个存储文件
  • 一个表可以带若干索引

存储文件

  • 逻辑结构组成了关系数据库的内模式
  • 物理结构对用户是隐蔽的

视图

  • 从一个或几个基本表导出的表
  • 数据库中只存放视图的定义而不存放视图对应的数据
  • 视图是一个虚表
  • 用户可以在视图上再定义视图

2. DDL

可以根据菜鸟教程速成学习基本的SQL语句, 效果还不错

2.1. Creating Table

student entoll class

Consider such a E-R diagram

Student(snum:int, sname:String, major:String, level:String, age:int)
Class(cname:String, meets_at:String,room:String, fid:int)
Enroll(snum:int, cname:String)

The the SQL code for creating the tables is

CREATE TABLE Student (
  snum BIGINT,
  sname VARCHAR(50),
  major VARCHAR(50),
  level CHAR(2),
  age INT,
  PRIMARY KEY (snum)
);
CREATE TABLE Class (
  cname VARCHAR(50),
  meets_at VARCHAR(50),
  room VARCHAR(50),
  fid BIGINT,
  PRIMARY KEY (cname)
);
CREATE TABLE Enroll (
  snum BIGINT,
  cname VARCHAR(50),
  PRIMARY KEY (snum, cname), -- 复合主键
  FOREIGN KEY (snum) REFERENCES Student(snum), -- 确保外键的完整性
  FOREIGN KEY (cname) REFERENCES Class(cname) -- 确保外键的完整性
);

However if I forgot to add the two foreign keys, there are two ways to add them

  1. drop the table and recreate it

    DROP TABLE Enroll;
    -- continue to create the table
    
  2. use ALTER TABLE to add the foreign key

    ALTER TABLE Enroll 
    add FOREIGN KEY(snum) REFERENCES student(snum) 
    ON UPDATE CASCADE; 
    -- Add a foreign key constraint to table enroll
    

2.2. Inserting Data

INSERT INTO Student (snum, sname, major, level, age)
VALUES (123456789, 'Smith', 'CS', 'SR', 18);

Note that you don’t have to list all the columns, but you have to list the columns in the same order as the values
if no column is specified, then the values must be in the same order as the columns in the table

INSERT INTO table_name (column1,column2,column3,...)
VALUES (value1,value2,value3,...);

2.3. Deleting Tuple

DELETE FROM Student WHERE age < 18; -- 删除年龄小于18的学生
DELETE FROM Student; -- 删除所有学生

2.4. Updating Tuple

UPDATE Student SET major = 'International Study' 
WHERE snum = 123456789;

2.5. Select-From-Where Statements

The principal form of a query is

SELECT A1, A2, ..., An -- desired attributes
FROM   R1, R2, ..., Rm -- one or more tables, i.e. relation list
WHERE  P;              -- selection condition
-- ORDER BY A1, A2, ..., An; -- optional

Intuitively, semantics of an SQL query is defined in terms of the following conceptual evaluation strategy:
– Compute the cross-product of relation-list.
– Apply selection (using qualifications) to discard resulting tuples if they fail qualifications.
– Apply projection to delete attributes that are not in target-list.

But this is just too SLOW!!

There are numerous ways to optimize the query, such as

  • Indexing: to speed up the search
  • Query Optimization: to find the best way to execute the query
  • Materialized Views: to store the result of the query

Set VS. Bag Semantics

  • Set: a collection of distinct elements
  • Bag: a collection of elements that may contain duplicates (default in SQL)

By adding DISTINCT to the SELECT statement, we can eliminate duplicates

SELECT DISTINCT age
FROM Student;

Arithmetic Expressions

In SELECT clause

SELECT sname, age, age + 5
FROM Student;

In WHERE clause

SELECT sname, age
FROM Student
WHERE age + 5 > 20;

Like Predicate

Similar to regular expression

SELECT sname
FROM Student
WHERE sname LIKE 'A%'; -- find all students whose name starts with 'A'
  • ‘%’: any string of zero or more characters, similar to * in regular expression
  • ‘_’: any single character, similar to . in regular expression

Order By

SELECT sname, age
FROM Student
ORDER BY age DESC; -- sort the result by age in descending order
-- by defualt, it's ascending order

Where clause

Attribute names of the relations appearing in FROM clause

  • Comparison operators: =, <>, <, >, <=, >=
  • Arithmetic operations: +, -, /, *
  • AND, OR and NOT to combine/negate conditions
  • Operations on strings: concatenation, etc.
  • Pattern matching: s LIKE p
  • Special functions for comparing dates and times

Aliases

SELECT s.sname 
FROM Student s, Enroll e 
WHERE s.snum = e.snum AND e.cname LIKE 'Database Systems'

A general form is

SELECT a1, a2, ..., ak
FROM R1 AS x1, R2 AS x2, ..., Rn AS xn -- AS is an optional keyword
WHERE <conditions>;

A psudocode is like the following

answer := {}
for x1 in R1:
  for x2 in R2:
    ...
      for xn in Rn:
        if <conditions>:
          answer := answer  {(a1,…,ak)}
return answer

2.1. 层次化的数据库对象命名机制

  • 一个关系数据库管理系统的实例Instance中可以建立多个数据库
  • 一个数据库中可以建立多个模式
  • 一个模式下通常包括多个表, 视图和索引等数据库对象

2.2. 数据定义

  • 模式定义
  • 表定义
  • 视图和索引定义

模式(schema) 实际上是一个namespace

模式的创建, 删除 以及相关细节

基本表的创建

基本表中的数据类型

基本表的修改

基本表的删除, 以及相关细节 (RESTRICT / CASCADE )

索引(indexing): 用来加快查询速度

索引是用户不可见的, 由DBA或者表的构建者创建, 由数据库 (自动) 维护

索引的创建

索引的修改, 删除

11. SQL中的空值

关于空值有几个平凡的注意点

  • 有NOT NULL约束条件的不能取空值

  • 加了UNIQUE限制的属性不能取空值

  • 码属性不能取空值

  • 空值与另一个值包括另一个空值的算 术运算的结果为空值

  • 空值与另一个值包括另一个空值的比 较运算的结果为UNKNOWN.

  • 有UNKNOWN后, 传统二值TRUE, FALSE逻辑就扩展成了三值逻辑

12. 视图

视图是一种设计, 这篇博客介绍了为什么要设计视图