Graph 的演算法及資料結構

Graph’s algorithm and data structure

R4 Cheng
9 min readOct 23, 2020

有新資訊會更新,若有錯誤請留點告知,感恩

Graph 表示物件之間的關係,並且最重要的是 Edge

Content:

+ 起源

+ Application of Graphs

+ Graph composition

+ Path and Connectivity

+ Connected component

+ Trees in Graph

+ Graph 最重要的兩種表示法

+ Graph Traversal

+ Breadth-First Search (BFS)

+ Depth-First Search (DFS)

+ Connectivity in Directed Graphs

+ DAGs and Topological Ordering

— — — — — — — — — — — — — — — —

+ Greedy algorithms

+ Spanning Tree

+ The Shortest Path Problem

— — — — — — — — — — — — — — — —

+ TSP (Travel Salesman Problem)

+ AOV

+ AOE

+ Flow Network

+ References

起源

18世紀的 Euler ( 也是 Graph 之父) 的 The seven bridge problem

The seven bridge problem

Q: Is is possible to walk across all the bridges exactly once and return to the starting land area?

A: Euler proofs that it is impossible (因為與土地連接的橋為奇數,必須要是偶數)

Application of Graphs

  • Analysis of electrical circuits (E.g. VC 電路)
  • Finding shortest routes (E.g. 導航、Google map)
  • Project planning
  • Identification of chemical compounds
  • Social sciences

Graph composition

  • A graph G = (V, E) => V: vertex(ices) = node(s), E: Edge
  • e = {u, v} (e 是node u and v 的 edge)

Very often the edges of a graph habe weights associated with them

  • An undirected graph: {u, v} == {v, u}
  • A directed graph: (u, v) != (v, u)
if e = (u, v)
then u 是 v 的鄰居 (但 v 不是 u 的鄰居) => Adjacency

Adjacency: v is one of u’s neighbor and there is an edge (u ,v)

  • |V| = the number of nodes = n
  • |E| = the number of edges = m
  • 我們希望花費 < O(m+n) 時間讀取 G

()有方向性,{}沒有方向性

Path and Connectivity

  1. Path: a sequence of vertices w1, w2, w3,…, wN where (wi, w(i+1)) ∈ E, ∀ 1 <= i <= N
  2. node u and v is connected, if there is a path from u to v (不需要有 edge).
  3. distance: u and v 間經過的最少 edge 數

Connected components

A connected Component: 一群 nodes 可以到達彼此

Strongly Connected Components

E.g.
v = a, b
e = (a, b), (b, a)

Find all nodes reachable from s

// R will consist of nodes to which s has a path
1. initialize R = {s}
2. while (there is an edge (u, v) where u∈R and v∉R)
R = R + {v}

=> Proof:

要證明的點:

  1. R 中的 nodes 是否最終都能連到 s (has a path)
  2. R 外不能有漏網之魚

反證:

假設在 R 之外存在 node w ,卻和 node s (R內) 間有 path

=> 必然存在一個在 R 內的 node u 與在 R 外的 node v 相連

=> an edge (u, v) + u∈R and v∉R

=> 與演算法 conflict

Trees in Graph

connected and no cycle

property

  1. G is connected
  2. G does not contain a cycle
  3. G has n-1 edges

=> G is a tree if it satisfies any two of the three statements,並且任意兩個 statements 可以推導第三 statement

Graph 最重要的兩種表示法

  1. Adjacency Matrix
  2. Adjacency List

儲存所有 node 之後可以產生應用,例如找出 spanning tree

Adjacency Matrix

The Adjacency Matrix of G is an n*n matrix A where

  • A[u, v] = 1, if (u, v) ∈ E
  • A[u, v] = 0, otherwise
  • Space: 𝚯(|V|²), good for dense, not for sparse (不適合稀疏)
  • Undirected graph: symmetric matrix

Adjacency List

  • An Array for nodes
  • 每個 index 掛上它的鄰居
  • Space: O(|V|+|E|), good for sparse
  • 比 Matrix 好用

Adjacency: v is one of u’s neighbor and there is an edge (u ,v)

Adjacency Matrix vs. Adjacency List

  • Faster to find an edge? Matrix => O(1)
  • Faster to find degree (連出去的 edge)? List => time < O(n)
  • Faster to traverse the graph? List => time = O(nodes + the number of edges) < Matrix: O(n²)
  • Storage for sparse graph? List
  • Storage for dense graph? Matrix
  • Edge insertion or deletion more easily? Matrix
  • Better for most applications? List

Graph Traversal

1. Breadth-First Search (BFS)

2. Depth-First Search (DFS)

Breadth-First Search (BFS)

  • 一層一層向下搜
  • 使用 queue

應用

例如小畫家倒水上色

Adjacency list is ideal for implementing BFS

=> 演算法在 Lec07 演算法 第四週課程 (1/2) 1:15:45

Testing Bipartiteness

A bipartite graph is a graph whose nodes can be patitioned into sets X and Y in such a way that every edge has one one end in X and the other end in Y (Edge 必定是在兩個 set 之間,同一個 set 間不會有 edge)

  • 等於 Two coloring 問題

應用:晶圓製程

Depth-First Search (DFS)

  • Data structure: stack

Connectivity in Directed Graphs

  • Strong component (Strongly connected component): 一個 ”maximal” set 裡面的點皆互為 mutually reachable

Theorem: For any two nodes s and t in a directed graph, their strong components are either identical or disjoint.

Can we determine if a graph is strongly connected in linear time?

Yes.

1. pick any node s in G
2. R = BFS(s, G) // save all reachable nodes in R set
3. Rrev = BFS(s, Grev) // Grev = 所有 edge 方向翻轉
4. if (R==V==Rrev) then return true else false
Time: O(n)

DAGs and Topological Ordering

Directed Acyclic Graphs (有向無環圖)

  • a.k.a DAG
  • A directed graph without cycles
  • 當問題有依存關係 (dependency) 或者先後關係 (precedence constraints) 經常使用

4 drivers come to the junction “simultaneously”, who goes first?

  • 四台車「同時」到交叉路口,誰先走?

=> Deadlock! Dependencies from a cycle

Topological Ordering

  • 把 directed graph 裡所有的點拿出來做排序

Lemma: If G has a topological ordering, then G is a DAG (反之亦然)

Lemma: In every DAG G, there is a node with no incoming edges.

TopologicalOrder(G)
1. find a node v without incoming edges
2. order v
3. G = G - {v} // delete v from G
4. if(G is not empty) then TopologicalOrder(G)
  • Time: from O(n²) to O(m+n)

Greedy algorithms

  • Interval scheduling: The greedy algorithm stays ahead (永遠取第一名的)
  • Scheduling to minimize lateness: An exchange argument
  • Shortest paths
  • The minimum spanning tree problem
  • implementing Kruskal’s algorithm: Union-find

What is a greedy algorithm?

  • 一種短視近利的方法
  • Build up a solution in small steps, choosing a decision at each step myopically => 以當下最好的解作為選擇 => Intuitive and fast but usually not optimal
  • 兩種常用證明 optimal 的方式:stays ahead, exchange argument

The Interval Scheduling Problem

  • 應用:幫 CPU 排程、演藝廳借場地
  • 要解決的問題:在固定時間內完成最多 request
  • Single resource => 單核
Greedy Rule
1. Repeat
=> use a simple rule to select a first request i1
=> Once i1 is selected, reject all request not compatible with i1.
2. Until run out of requests
  • Better chosen rule: earliest finish time (even better then fewest conflicts) => principle: 最早釋放 resource
Interval-Scheduling(R)
// R: undetermined requests; A: accepted requests
1. A = {}; //空集合 (empty set)
2. while (R is not empty) do
3. choose a request i ∈ R with minimum end-time(i) // greedy rule => sorting 由小到大 => O(nlogn)
4. A = A + {i}
5. R = R - {i} - X, where X = {j: j ∈ R and j is not compatible with i} => 每個 request 比較 => O(n^2)
6. return A
Running time from O(n^2) to O(nlogn)
用 construct S 紀錄 all requests 開始的時間
=> 加速 3~5 step to O(n)
  • Proof by induction:

=> 假設 O 為最佳解 ,我們要證明我們的解 A|A| = |O|

=> Stay ahead: A 的每一步都比 O

The Interval partitioning Problem

  • Multiple resources
  • 希望用最少的 resource 完成全部的 requests (這裡等於 intervals)

=> 先看 depth (單一時間軸重疊的 requests 數, resources 至少要大於這個數目) => 通常用 depth 數量的 resources 就能 schedule all requests

Interval-Partition(R)
1. {l1, ..., ln} = sort intervals(requests) in ascending order of their start time
2. for j from 1 to n do
// label {1, 2, ..., d} == the number of resources
3. exclude the labels of all assigned intervals that are not compatible with lj
4. if(there is a nonexcluded label from {1, 2, ..., d}) then assign a nonexcluded label to lj
6. else leave lj unlabeled // 這行寫了但是不會發生
// 每個 label 都有一個 end time

Scheduling to minimize lateness

  • A single resource and each request with a deadline
  • Goal: Schedule all requests without overlapping so as to minimize the max. lateness => 只取最大遲交值,沒有累加
  • Greedy rule: Earliest deadline first
Min-Lateness(R, beginTime)
1. {d1, ..., dn} = sort requests in ascending order of their deadlines
2. endTime, beginTime = 0
3. for i from 1 to n do
4. assign request i to the time interval from (beginTime(i) = endTime) to (endTime(i) = endTime+ ti)
5. endTime = f + ti
6. return the set of scheduled intervals
  • No idle time => 沒有空閒時間
  • Exchange argument: Gradually transform an optimal solution to the one found by the greedy algorithm without hurting its quality. => 超常用證明法
  • No Inversions

An inversion: 有人 deadline 比較晚卻比較早放進 schedule 中

Theorem: The greedy schedule S is optimal
Proof by contradiction
+ Let O be an optimal schedule with inversions // 不管是怎麼找到的
+ Assume O has no idle time //有就壓縮
+ If O has no inversions, then S = O. (done)
+ If O has an inversion, let i-j be an adjacent inversion
=> Swapping i and j does not increase the max. lateness and strictly decreases the number of inversions. // cut and paste
=> This contradicts definition of O.

Spanning Tree

  • A tree consisting solely of edges in G and including all vertices in G in called a spanning tree
  • = 希望用最少的 edge 把 G 中的所有頂點都連接起來
  • 可以用 DFS, BFS 找出 spanning tree
  • no cycle

Minimal Cost Spanning Tree

  • Four greedy-method algorithms to find: Kurskal’s algorithm, Prim’s algorithm, Sollin’s algorithm and reverse-delete algorithm
  • Cut property: Let S be any subset of nodes, and let e = (v, w) be the minimum cost edge with one end in S and the other in V-S. Then every MST contains e.
  • Cycle property: Let C be any cycle in G, an let e = (v, w) be the maximum cost edge in C. Then e does not belong to any MST.

Kurskal’s algorithm

  • 每次都選最小 cost 的「邊」,若產生 cycle 就跳下一個邊
Kruskal(G, C)
1. {all edges in ascending order of their costs}
2. T = {}
3. for each ei = (u, v) do
4. if (u and v are not connected by edges in T) then // different subtrees
5. T = T + {ei} // merge these two corresponding subtrees
  • Data structure: Union-find => represents disjoint sets (很多集合但裡面元素不重複) => each set has a representative => merge sets with respective representatives

disjoint set: 例如一所學校的班級

Prim’s algorithm

  • 任選一個「頂點」開始往外擴散
  • 與 Dijkstra’s Algorithm 類似
  • Data structure: min heap
  • Cut property
Prim(G, c)
// S: the set of explored nodes
// for each u ∈ S, we store a shortest path distance c(u) (tree edge)
1. init S = {s}, d(s) = 0
2. While S != V do // 找最靠近的新的點
3. select a node v ∉ S with at lease one edge from S for which d'(v) = min(c)
4. add v to S and define d(v) = d'(v)

Sollin’s algorithm

  • 一次選多個邊
  • 不重要

Reverse-delete algorithm

  • 原理:n 個 node 只需要 n-1 條 edge
  • 一開始看整個 Graph,從 cost 大到小刪除 edge
  • 基本沒人用

Spanning Tree Protocol

  • 用於 OSI 第二層,應用是防止冗餘鏈路產生,避免佔用 Router 資源

Biconnected Components

<空>

The Shortest Path Problem

Given:
+ Directed graph G = (V, E)
=> Length le = length of edge e = (u, v) ∈ E
+ Source s
Goal:
Shortest path Pv from s to each other node v ∈ V - {s}
=> 到所有 nodes 的最短路徑總和

Dijkstra’s Algorithm

Edsger W. Dijkstra 1959

  • Edge 沒有負值時使用
  • 用 min heap 儲存每隔點的最小距離 d
Dijkstra(G, l)
// S: the set of explored nodes
// for each u ∈ S, we store a shortest path distance d(u) from s (start node) to u
1. init S = {s}, d(s) = 0
2. While S != V do // 找最靠近的新的點
3. select a node v ∉ S with at lease one edge from S for which d'(v) = min(d(u)+le)
4. add v to S and define d(v) = d'(v)

Bellman and Ford Algorithm

  • Edge 可以為負值
  • 可以一次走 k 個邊 (k>0)
  • Dynamic programming

Condition: no negative cycle

Negative Cycles

  • 假如存在 negative cycle ,the shortest path 無法吐出最短路徑
  • Bellman and Ford 可以找出 negative cycle

All pairs

<空>

TSP (Travel Salesman Problem)

  • 可以用 Dynamic programming 但卻不是最佳解的原因是: TSP 不是 nature order

AOV (Activity on Vertex)

  • A directed graph
  • the vertices represent tasks or activities
  • the edges represent precedence relations between tasks

Topological order

  • AOV 任兩個 vertex 的先後順序會 maintain 在 output sequence 上
  • order 不是唯一

應用

  • 大學修課系統 => 例如:要先修 programming I 才能修 Data Structures

AOE (Activity on Edge)

  • 計算每一個 node 最早的到達時間

Critical Path Analysis

  • 找出起點和終點之間,不允許 delay 的 nodes

應用

  • 控制 project 時間

Flow Network

有「物質」在節點中流動

A flow network G = (V, E) is a directed graph

  • 每一條 edge 可以看做「水管」,有 capacity
  • 有 source node (generates the flow), sink node (absorbs the flow), internal node(剩下的 node)

Examples

  1. Water pipes and water
  2. Transportation network and traffic (運輸管理用)
  3. Computer network and packets
  4. Circuit network (wires) and current

應用

資源分配問題

The Maximum Flow Problem

properties:

  1. Capacity conditions: 0 ≤ f(e) ≤ ce => 每條水管的水量必須大於0且小於水管 capacity
  2. Conservation conditions: 除了 source and sink ,其他 internal node 的送進水流=送出水流
Q: Can we find the upper bound of the flow? (確認水流上限) 
A: Yes!
Find cuts of min capacity = find max flow

Ford-Fulkersion: Residual Graph

https://youtu.be/lY1SKWSR3pk
Procedure
1. Start with f(e) = 0 for all edge e ∈ E and construct the residual graph
2. Find an source-sink path P in the residual graph
3. Augment flow along path P and update the residual graph
4. Repeat steps 2-3 until you get stuck

Bipartite Matching

A bipartite graph is a graph whose nodes can be patitioned into sets X and Y in such a way that every edge has one one end in X and the other end in Y (Edge 必定是在兩個 set 之間,同一個 set 間不會有 edge)

用 maximal flow 解 Bipartite matching

References

  1. 江蕙如教授。Lec06 演算法 第三週課程 (2/2) (2015)
  2. 江蕙如教授。Lec07 演算法 第四週課程 (1/2) (2015)
  3. 江蕙如教授。Lec20 演算法 第十五週課程 (1/2) (2015)
  4. 彭文志教授。Lec18 資料結構 第十二週課程 (2013)

--

--

R4 Cheng
R4 Cheng

Written by R4 Cheng

「0」が過去で「1」が未来「今」は何処にもない

No responses yet