Graph 的演算法及資料結構

Graph’s algorithm and data structure

9 min readOct 23, 2020

有新資訊會更新，若有錯誤請留點告知，感恩

Graph 表示物件之間的關係，並且最重要的是 Edge

Content:
+ 起源
+ Application of Graphs
+ Graph composition
+ Path and Connectivity
+ Connected component
+ Trees in Graph
+ Graph 最重要的兩種表示法
+ Graph Traversal
+ Breadth-First Search (BFS)
+ Depth-First Search (DFS)
+ Connectivity in Directed Graphs
+ DAGs and Topological Ordering
— — — — — — — — — — — — — — — —
+ Greedy algorithms
+ Spanning Tree
+ The Shortest Path Problem
— — — — — — — — — — — — — — — —
+ TSP (Travel Salesman Problem)
+ AOV
+ AOE
+ Flow Network
+ References

起源

18世紀的 Euler ( 也是 Graph 之父) 的 The seven bridge problem

The seven bridge problem

Q: Is is possible to walk across all the bridges exactly once and return to the starting land area?

A: Euler proofs that it is impossible (因為與土地連接的橋為奇數，必須要是偶數)

Application of Graphs

Analysis of electrical circuits (E.g. VC 電路)
Finding shortest routes (E.g. 導航、Google map)
Project planning
Identification of chemical compounds
Social sciences

Graph composition

A graph G = (V, E) => V: vertex(ices) = node(s), E: Edge
e = {u, v} (e 是node u and v 的 edge)

Very often the edges of a graph habe weights associated with them

An undirected graph: {u, v} == {v, u}
A directed graph: (u, v) != (v, u)

if e = (u, v)
then u 是 v 的鄰居 (但 v 不是 u 的鄰居) => Adjacency

Adjacency: v is one of u’s neighbor and there is an edge (u ,v)

|V| = the number of nodes = n
|E| = the number of edges = m
我們希望花費 < O(m+n) 時間讀取 G

()有方向性，{}沒有方向性

Path and Connectivity

Path: a sequence of vertices w1, w2, w3,…, wN where (wi, w(i+1)) ∈ E, ∀ 1 <= i <= N
node u and v is connected, if there is a path from u to v (不需要有 edge).
distance: u and v 間經過的最少 edge 數

Connected components

A connected Component: 一群 nodes 可以到達彼此

Strongly Connected Components

E.g.
v = a, b
e = (a, b), (b, a)

Find all nodes reachable from s

// R will consist of nodes to which s has a path
1. initialize R = {s}
2. while (there is an edge (u, v) where u∈R and v∉R)
R = R + {v}

=> Proof:

要證明的點：

R 中的 nodes 是否最終都能連到 s (has a path)
R 外不能有漏網之魚

反證：

假設在 R 之外存在 node w ，卻和 node s (R內) 間有 path

=> 必然存在一個在 R 內的 node u 與在 R 外的 node v 相連

=> an edge (u, v) + u∈R and v∉R

=> 與演算法 conflict

Trees in Graph

connected and no cycle

property

G is connected
G does not contain a cycle
G has n-1 edges

=> G is a tree if it satisfies any two of the three statements，並且任意兩個 statements 可以推導第三 statement

Graph 最重要的兩種表示法

Adjacency Matrix
Adjacency List

儲存所有 node 之後可以產生應用，例如找出 spanning tree

Adjacency Matrix

The Adjacency Matrix of G is an n*n matrix A where

A[u, v] = 1, if (u, v) ∈ E
A[u, v] = 0, otherwise
Space: 𝚯(|V|²), good for dense, not for sparse (不適合稀疏)
Undirected graph: symmetric matrix

Adjacency List

An Array for nodes
每個 index 掛上它的鄰居
Space: O(|V|+|E|), good for sparse
比 Matrix 好用

Adjacency: v is one of u’s neighbor and there is an edge (u ,v)

Adjacency Matrix vs. Adjacency List

Faster to find an edge? Matrix => O(1)
Faster to find degree (連出去的 edge)? List => time < O(n)
Faster to traverse the graph? List => time = O(nodes + the number of edges) < Matrix: O(n²)
Storage for sparse graph? List
Storage for dense graph? Matrix
Edge insertion or deletion more easily? Matrix
Better for most applications? List

Graph Traversal

1. Breadth-First Search (BFS)
2. Depth-First Search (DFS)

Breadth-First Search (BFS)

一層一層向下搜
使用 queue

應用

例如小畫家倒水上色

Adjacency list is ideal for implementing BFS
=> 演算法在 Lec07 演算法第四週課程 (1/2) 1:15:45

Testing Bipartiteness

A bipartite graph is a graph whose nodes can be patitioned into sets X and Y in such a way that every edge has one one end in X and the other end in Y (Edge 必定是在兩個 set 之間，同一個 set 間不會有 edge)

等於 Two coloring 問題

應用：晶圓製程

Depth-First Search (DFS)

Data structure: stack

Connectivity in Directed Graphs

Strong component (Strongly connected component): 一個 ”maximal” set 裡面的點皆互為 mutually reachable

Theorem: For any two nodes s and t in a directed graph, their strong components are either identical or disjoint.

Can we determine if a graph is strongly connected in linear time?

Yes.

1. pick any node s in G
2. R = BFS(s, G) // save all reachable nodes in R set
3. Rrev = BFS(s, Grev) // Grev = 所有 edge 方向翻轉
4. if (R==V==Rrev) then return true else falseTime: O(n)

DAGs and Topological Ordering

Directed Acyclic Graphs (有向無環圖)

a.k.a DAG
A directed graph without cycles
當問題有依存關係 (dependency) 或者先後關係 (precedence constraints) 經常使用

4 drivers come to the junction “simultaneously”, who goes first?

四台車「同時」到交叉路口，誰先走？

=> Deadlock! Dependencies from a cycle

Topological Ordering

把 directed graph 裡所有的點拿出來做排序

Lemma: If G has a topological ordering, then G is a DAG (反之亦然)
Lemma: In every DAG G, there is a node with no incoming edges.

TopologicalOrder(G)
1. find a node v without incoming edges
2. order v
3. G = G - {v} // delete v from G
4. if(G is not empty) then TopologicalOrder(G)

Time: from O(n²) to O(m+n)

Greedy algorithms

Interval scheduling: The greedy algorithm stays ahead (永遠取第一名的)
Scheduling to minimize lateness: An exchange argument
Shortest paths
The minimum spanning tree problem
implementing Kruskal’s algorithm: Union-find

What is a greedy algorithm?

一種短視近利的方法
Build up a solution in small steps, choosing a decision at each step myopically => 以當下最好的解作為選擇 => Intuitive and fast but usually not optimal
兩種常用證明 optimal 的方式：stays ahead, exchange argument

The Interval Scheduling Problem

應用：幫 CPU 排程、演藝廳借場地
要解決的問題：在固定時間內完成最多 request
Single resource => 單核

Greedy Rule
1. Repeat
=> use a simple rule to select a first request i1
=> Once i1 is selected, reject all request not compatible with i1.
2. Until run out of requests

Better chosen rule: earliest finish time (even better then fewest conflicts) => principle: 最早釋放 resource

Interval-Scheduling(R)
// R: undetermined requests; A: accepted requests
1. A = {}; //空集合 (empty set)
2. while (R is not empty) do 
3. choose a request i ∈ R with minimum end-time(i) // greedy rule => sorting 由小到大 => O(nlogn)
4. A = A + {i} 
5. R = R - {i} - X, where X = {j: j ∈ R and j is not compatible with i} => 每個 request 比較 => O(n^2)
6. return ARunning time from O(n^2) to O(nlogn)
用 construct S 紀錄 all requests 開始的時間
=> 加速 3~5 step to O(n)

Proof by induction:

=> 假設 O 為最佳解，我們要證明我們的解 A， |A| = |O|

=> Stay ahead: A 的每一步都比 O 快

The Interval partitioning Problem

Multiple resources
希望用最少的 resource 完成全部的 requests (這裡等於 intervals)

=> 先看 depth (單一時間軸重疊的 requests 數, resources 至少要大於這個數目) => 通常用 depth 數量的 resources 就能 schedule all requests

Interval-Partition(R)
1. {l1, ..., ln} = sort intervals(requests) in ascending order of their start time
2. for j from 1 to n do
// label {1, 2, ..., d} == the number of resources
3.   exclude the labels of all assigned intervals that are not compatible with lj
4.   if(there is a nonexcluded label from {1, 2, ..., d}) then assign a nonexcluded label to lj
6.   else leave lj unlabeled // 這行寫了但是不會發生// 每個 label 都有一個 end time

Scheduling to minimize lateness

A single resource and each request with a deadline
Goal: Schedule all requests without overlapping so as to minimize the max. lateness => 只取最大遲交值，沒有累加
Greedy rule: Earliest deadline first

Min-Lateness(R, beginTime)
1. {d1, ..., dn} = sort requests in ascending order of their deadlines
2. endTime, beginTime = 0
3. for i from 1 to n do
4.   assign request i to the time interval from (beginTime(i) = endTime) to (endTime(i) = endTime+ ti)
5.   endTime = f + ti
6. return the set of scheduled intervals

No idle time => 沒有空閒時間
Exchange argument: Gradually transform an optimal solution to the one found by the greedy algorithm without hurting its quality. => 超常用證明法
No Inversions

An inversion: 有人 deadline 比較晚卻比較早放進 schedule 中

Theorem: The greedy schedule S is optimal
Proof by contradiction
+ Let O be an optimal schedule with inversions // 不管是怎麼找到的
+ Assume O has no idle time //有就壓縮
+ If O has no inversions, then S = O. (done)
+ If O has an inversion, let i-j be an adjacent inversion
=> Swapping i and j does not increase the max. lateness and strictly decreases the number of inversions. // cut and paste
=> This contradicts definition of O.

Spanning Tree

A tree consisting solely of edges in G and including all vertices in G in called a spanning tree
= 希望用最少的 edge 把 G 中的所有頂點都連接起來
可以用 DFS, BFS 找出 spanning tree
no cycle

Minimal Cost Spanning Tree

Four greedy-method algorithms to find: Kurskal’s algorithm, Prim’s algorithm, Sollin’s algorithm and reverse-delete algorithm
Cut property: Let S be any subset of nodes, and let e = (v, w) be the minimum cost edge with one end in S and the other in V-S. Then every MST contains e.
Cycle property: Let C be any cycle in G, an let e = (v, w) be the maximum cost edge in C. Then e does not belong to any MST.

Kurskal’s algorithm

每次都選最小 cost 的「邊」，若產生 cycle 就跳下一個邊

Kruskal(G, C)
1. {all edges in ascending order of their costs}
2. T = {}
3. for each ei = (u, v) do 
4.   if (u and v are not connected by edges in T) then // different subtrees
5.     T = T + {ei} // merge these two corresponding subtrees

Data structure: Union-find => represents disjoint sets (很多集合但裡面元素不重複) => each set has a representative => merge sets with respective representatives

disjoint set: 例如一所學校的班級

Prim’s algorithm

任選一個「頂點」開始往外擴散
與 Dijkstra’s Algorithm 類似
Data structure: min heap
Cut property

Prim(G, c)
// S: the set of explored nodes
// for each u ∈ S, we store a shortest path distance c(u) (tree edge)
1. init S = {s}, d(s) = 0
2. While S != V do // 找最靠近的新的點
3.   select a node v ∉ S with at lease one edge from S for which d'(v) = min(c)
4. add v to S and define d(v) = d'(v)

Sollin’s algorithm

一次選多個邊
不重要

Reverse-delete algorithm

原理：n 個 node 只需要 n-1 條 edge
一開始看整個 Graph，從 cost 大到小刪除 edge
基本沒人用

Spanning Tree Protocol

用於 OSI 第二層，應用是防止冗餘鏈路產生，避免佔用 Router 資源

Biconnected Components

<空>

The Shortest Path Problem

Given:
+ Directed graph G = (V, E)
=> Length le = length of edge e = (u, v) ∈ E
+ Source s
Goal:
Shortest path Pv from s to each other node v ∈ V - {s}
=> 到所有 nodes 的最短路徑總和

Dijkstra’s Algorithm

Edsger W. Dijkstra 1959

Edge 沒有負值時使用
用 min heap 儲存每隔點的最小距離 d

Dijkstra(G, l)
// S: the set of explored nodes
// for each u ∈ S, we store a shortest path distance d(u) from s (start node) to u 
1. init S = {s}, d(s) = 0
2. While S != V do // 找最靠近的新的點
3.   select a node v ∉ S with at lease one edge from S for which d'(v) = min(d(u)+le)
4. add v to S and define d(v) = d'(v)

Bellman and Ford Algorithm

Edge 可以為負值
可以一次走 k 個邊 (k>0)
Dynamic programming

Condition: no negative cycle

Negative Cycles

假如存在 negative cycle ，the shortest path 無法吐出最短路徑
Bellman and Ford 可以找出 negative cycle

All pairs

<空>

TSP (Travel Salesman Problem)

可以用 Dynamic programming 但卻不是最佳解的原因是： TSP 不是 nature order

AOV (Activity on Vertex)

A directed graph
the vertices represent tasks or activities
the edges represent precedence relations between tasks

Topological order

AOV 任兩個 vertex 的先後順序會 maintain 在 output sequence 上
order 不是唯一

應用

大學修課系統 => 例如：要先修 programming I 才能修 Data Structures

AOE (Activity on Edge)

計算每一個 node 最早的到達時間

Critical Path Analysis

找出起點和終點之間，不允許 delay 的 nodes

應用

控制 project 時間

Flow Network

有「物質」在節點中流動

A flow network G = (V, E) is a directed graph

每一條 edge 可以看做「水管」，有 capacity
有 source node (generates the flow), sink node (absorbs the flow), internal node（剩下的 node）

Examples

Water pipes and water
Transportation network and traffic （運輸管理用）
Computer network and packets
Circuit network (wires) and current

應用

資源分配問題

The Maximum Flow Problem

properties:

Capacity conditions: 0 ≤ f(e) ≤ ce => 每條水管的水量必須大於0且小於水管 capacity
Conservation conditions: 除了 source and sink ，其他 internal node 的送進水流＝送出水流

Q: Can we find the upper bound of the flow? (確認水流上限) 
A: Yes!
Find cuts of min capacity = find max flow

Ford-Fulkersion: Residual Graph

Procedure
1. Start with f(e) = 0 for all edge e ∈ E and construct the residual graph
2. Find an source-sink path P in the residual graph
3. Augment flow along path P and update the residual graph
4. Repeat steps 2-3 until you get stuck

Bipartite Matching