
CMU445-Project1-BufferPoolManagerInstance Summary

Published 2023-07-01 | Updated 2025-12-18 | Programming


Buffer Pool

Disk-Oriented DBMS

What is a buffer pool, and what is it for (in a disk-oriented DBMS)?

The database is all on disk, and the data in the database files is organized into pages.
In order to operate on the data the DBMS needs to bring the data into memory. It does this by having a buffer pool that manages the movement back and forth between disk and memory.
The DBMS also has an execution engine that executes queries.
The execution engine will ask the buffer pool for a specific page, and the buffer pool will take care of bringing that page into memory and giving the execution engine a pointer to the page in memory.
The buffer pool manager will ensure that the page is there while the execution engine is operating on that memory.

Buffer Pool Organization

How the buffer pool uses memory and organizes data

A memory region organized as an array of fixed-size pages. An array entry is called a frame.
When the DBMS requests a page, an exact copy is placed into one of these frames.
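Here is a minimal sketch of this layout. It assumes a hypothetical 4 KB `kPageSize` and uses a `std::unordered_map` in place of the disk. `FrameArray` and `ToyDisk` are illustrative names and are not part of the lab. The pool is one array of equally sized frames, and bringing a page into memory copies its bytes into one frame.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical page size; the real project defines its own constant.
constexpr std::size_t kPageSize = 4096;
using PageImage = std::array<char, kPageSize>;

// Toy disk: page_id -> that page's bytes (stands in for the database file).
using ToyDisk = std::unordered_map<int, PageImage>;

// The buffer pool's memory: a fixed-size array of frames, each one page big.
template <std::size_t kPoolSize>
struct FrameArray {
  std::array<PageImage, kPoolSize> frames{};

  // Place an exact copy of disk page `page_id` into frame `frame_id`.
  void Load(const ToyDisk &disk, int page_id, std::size_t frame_id) {
    assert(frame_id < kPoolSize);
    frames[frame_id] = disk.at(page_id);  // byte-for-byte copy
  }
};
```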

Meta-data maintained by the buffer pool

Page Table

In-memory page table (hash table) that keeps track of pages that are currently in memory. It maps page ids to frame locations in the buffer pool.
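In short, it lets the buffer pool quickly find the page data for a given page id, and it also tells whether the data for that page id is currently in the buffer pool.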
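The sketch below shows the page table's two jobs. It uses `std::unordered_map` in place of the lab's own `ExtendibleHashTable`. The `PageTable` class and its method names are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <unordered_map>

using page_id_t = int;
using frame_id_t = int;

// Hypothetical page table: records which frame holds each resident page.
class PageTable {
 public:
  void Insert(page_id_t page_id, frame_id_t frame_id) {
    assert(map_.count(page_id) == 0);  // a page lives in at most one frame
    map_[page_id] = frame_id;
  }

  bool Remove(page_id_t page_id) { return map_.erase(page_id) > 0; }

  // Fast lookup; std::nullopt means the page is not in the buffer pool.
  std::optional<frame_id_t> Lookup(page_id_t page_id) const {
    auto it = map_.find(page_id);
    if (it == map_.end()) {
      return std::nullopt;
    }
    return it->second;
  }

 private:
  std::unordered_map<page_id_t, frame_id_t> map_;
};
```

When `Lookup` returns `std::nullopt`, the page is not in the buffer pool and must be read from disk first.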

Dirty Flag

A thread sets this flag when it modifies a page.
This indicates to the storage manager that the page must be written back to disk.
In short, the flag records whether the page data has been modified. If it has, the changes must be written back to disk at some point.
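This sketch shows how the dirty flag decides whether an eviction costs a disk write. `Frame`, `ToyDisk`, `Modify` and `Evict` are hypothetical simplifications. The real lab writes through `DiskManager`, not a write-counting map.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical frame: one page image plus its dirty flag.
struct Frame {
  int page_id = -1;  // -1 means the frame is empty
  std::string data;
  bool is_dirty = false;
};

// Toy disk that counts writes, standing in for DiskManager.
struct ToyDisk {
  std::unordered_map<int, std::string> pages;
  int writes = 0;
  void Write(int page_id, const std::string &data) {
    pages[page_id] = data;
    ++writes;
  }
};

// A thread that modifies the page sets the dirty flag.
void Modify(Frame &frame, const std::string &new_data) {
  frame.data = new_data;
  frame.is_dirty = true;
}

// On eviction, only a dirty page is written back; a clean one is just dropped.
void Evict(Frame &frame, ToyDisk &disk) {
  assert(frame.page_id >= 0);  // evicting an empty frame would be a logic error
  if (frame.is_dirty) {
    disk.Write(frame.page_id, frame.data);
  }
  frame = Frame{};  // the frame is now empty and clean
}
```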

Pin Counter

This tracks the number of threads that are currently accessing that page (either reading or modifying it).
A thread has to increment the counter before it accesses the page.
If a page’s count is greater than zero, then the storage manager is not allowed to evict that page from memory.
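This sketch captures the pin-counter rule. A hypothetical `PinState` struct stands in for the `pin_count_` field and the eviction check.

```cpp
#include <cassert>

// Hypothetical per-page pin state (the lab keeps this in the Page's pin_count_).
struct PinState {
  int pin_count = 0;

  // A thread pins the page before accessing it.
  void Pin() { ++pin_count; }

  // Called when the thread is done; returns false if the page was not pinned.
  bool Unpin() {
    if (pin_count == 0) {
      return false;
    }
    --pin_count;
    return true;
  }

  // Eviction is allowed only when no thread is using the page.
  bool CanEvict() const {
    assert(pin_count >= 0);  // Unpin never lets the count go negative
    return pin_count == 0;
  }
};
```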

Structure

Buffer pool: BufferPoolManagerInstance, the part to be implemented in this lab.
Disk I/O: DiskManager, already provided; nothing to implement.
class Page
- Each Page object contains a block of memory that the DiskManager will use as a location to copy the contents of a physical page that it reads from disk. In other words, page contents read from disk are placed in a Page object in memory.
- The BufferPoolManagerInstance will reuse the same Page object to store data as it moves back and forth to disk. This means that the same Page object may contain a different physical page throughout the life of the system. Because Page objects are recycled, the contents of a given Page object are not fixed.
- The Page object's identifier (page_id) keeps track of what physical page it contains, i.e., which physical page its data belongs to.
- The page contents are stored in data_. The metadata are the two fields discussed above, is_dirty_ and pin_count_.
Page table: the ExtendibleHashTable class we implemented ourselves, mapping page_id to frame_id.
Evicting a page when the buffer pool is full: the LRU-K algorithm, implemented ourselves as LRUKReplacer.
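The sketch below illustrates Page reuse. One object keeps its address while a hypothetical `Assign` helper swaps in a different physical page. `SketchPage` and `Assign` are made up for illustration. The fields mirror `data_`, `page_id_`, `is_dirty_` and `pin_count_` mentioned above, and the 4 KB page size is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kPageSize = 4096;  // hypothetical page size
constexpr int kInvalidPageId = -1;       // hypothetical "holds no page" marker

// Simplified stand-in for the lab's Page class: a memory block plus metadata.
class SketchPage {
 public:
  const char *GetData() const { return data_.data(); }
  int GetPageId() const { return page_id_; }
  int GetPinCount() const { return pin_count_; }
  bool IsDirty() const { return is_dirty_; }

  // Hypothetical helper: re-purpose this same object for another physical page.
  void Assign(int page_id, const std::array<char, kPageSize> &contents) {
    assert(pin_count_ == 0);  // never recycle a page that is still in use
    data_ = contents;         // copy of the physical page read from disk
    page_id_ = page_id;       // now identifies a different physical page
    is_dirty_ = false;        // freshly loaded, so memory matches disk
  }

 private:
  std::array<char, kPageSize> data_{};
  int page_id_ = kInvalidPageId;
  int pin_count_ = 0;
  bool is_dirty_ = false;
};
```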

Buffer Pool Manager Instance

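size_t pool_size_: the number of fixed-size pages the buffer pool can hold.
Page *pages_: the pages in the buffer pool. Its size is fixed: `pages_ = new Page[pool_size_];` allocates pool_size_ empty Page objects up front. This array is the set of frames described above, and a frame_id indexes one Page object in it.
std::list<frame_id_t> free_list_: the frame_ids whose Page objects are still unused and can hold a page read from disk. A frame is taken from here first. Once free_list_ is empty, the buffer pool has no spare space. If some page is evictable, it must be evicted (written back to disk if dirty) so that its Page object can be reused to hold the new content.
ExtendibleHashTable<page_id_t, frame_id_t> *page_table_: maps a page_id to its frame_id. `pages_[frame_id]` then gives the corresponding page in the buffer pool.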
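This sketch shows how these members cooperate when a page needs a frame. `free_list_` is consulted first, and the replacer only once the free list is empty. `FifoReplacer` is a plain FIFO stand-in, not LRU-K. `MiniPool`, `Place` and `FrameOf` are hypothetical names. Dirty write-back and pinning are left out so the example stays focused on frame selection.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>
#include <vector>

using frame_id_t = int;
using page_id_t = int;
constexpr page_id_t kInvalidPageId = -1;

// Hypothetical replacer: evicts frames in the order they were added (FIFO).
// This is NOT LRU-K; it only plays the replacer's role in the sketch.
class FifoReplacer {
 public:
  void Add(frame_id_t frame_id) { queue_.push_back(frame_id); }
  std::optional<frame_id_t> Evict() {
    if (queue_.empty()) {
      return std::nullopt;
    }
    frame_id_t victim = queue_.front();
    queue_.pop_front();
    return victim;
  }

 private:
  std::list<frame_id_t> queue_;
};

// Member layout loosely mirroring the post; types simplified for the sketch.
class MiniPool {
 public:
  explicit MiniPool(std::size_t pool_size)
      : pool_size_(pool_size), frame_page_(pool_size, kInvalidPageId) {
    assert(pool_size_ > 0);
    for (std::size_t i = 0; i < pool_size_; ++i) {
      free_list_.push_back(static_cast<frame_id_t>(i));  // every frame starts out free
    }
  }

  // Find a frame for `page_id`: free_list_ first, the replacer only when it is empty.
  std::optional<frame_id_t> Place(page_id_t page_id) {
    frame_id_t frame_id = -1;
    if (!free_list_.empty()) {
      frame_id = free_list_.front();
      free_list_.pop_front();
    } else if (auto victim = replacer_.Evict()) {
      frame_id = *victim;
      // The evicted page leaves the pool (real code would first flush it if dirty).
      page_table_.erase(frame_page_[frame_id]);
    } else {
      return std::nullopt;  // no free frame and nothing evictable
    }
    frame_page_[frame_id] = page_id;
    page_table_[page_id] = frame_id;
    replacer_.Add(frame_id);  // toy shortcut: evictable immediately (no pinning)
    return frame_id;
  }

  std::optional<frame_id_t> FrameOf(page_id_t page_id) const {
    auto it = page_table_.find(page_id);
    if (it == page_table_.end()) {
      return std::nullopt;
    }
    return it->second;
  }

 private:
  std::size_t pool_size_;
  std::vector<page_id_t> frame_page_;  // stands in for pages_[frame_id]'s page id
  std::list<frame_id_t> free_list_;
  std::unordered_map<page_id_t, frame_id_t> page_table_;  // stands in for ExtendibleHashTable
  FifoReplacer replacer_;
};
```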

Notes

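After NewPgImp or FetchPgImp obtains a frame from free_list_ or the LRUKReplacer:
1. If the page in that frame is dirty, write its contents back to disk first. Then call ResetMemory, set pin_count to 0 and set is_dirty to false.
2. Increment pin_count.
3. Remove the old page_id (the page previously held in this frame) from page_table.
4. Update page_id. NewPgImp allocates a new one. FetchPgImp uses the page_id passed in and also reads that page from disk into the frame with DiskManager::ReadPage.
5. Store the page_id-to-frame_id mapping in page_table.
6. Call the replacer's RecordAccess and SetEvictable, with evictable set to false because the page is now pinned.
This can be refined further. For example, a frame taken from free_list_ needs no dirty check. Apart from the constructor's initial fill, DeletePgImp is the only method that puts a frame_id on free_list_, and it clears that frame's memory and metadata first.
FetchPgImp must first check whether the page_id is already in the page table. If it is, the data is already in the buffer pool and does not need to be read from disk, so just return `pages_[frame_id]`. Before returning, increment pin_count and call the replacer's SetEvictable and RecordAccess.
The is_dirty argument of UnpinPgImp says whether to set the dirty flag. If it is false, the page keeps its current dirty flag, because an earlier caller may already have modified the page. So assign the flag only when the argument is true, instead of copying the argument directly.
FlushPgImp must set the page's dirty flag back to false after writing the page to disk.
DeletePgImp must remove the page_id from the page table and from the replacer. It must also reset the page's memory and metadata and put the frame_id back on free_list_.
As the notes above show, pages are reset in several places. It is therefore worth extracting a helper method for this, so that no field is forgotten during reinitialization.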
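Here is a compact, single-threaded sketch that puts these notes together. It covers NewPg, FetchPg, UnpinPg and DeletePg following steps 1 to 6, plus a single reset helper.

Everything in it is a simplified assumption rather than the lab's code:
- `MiniBpm` and `Frame` are hypothetical.
- A map stands in for `DiskManager`, and `std::unordered_map` for `ExtendibleHashTable`.
- Victim selection is a deliberately naive "lowest-numbered unpinned frame" rule instead of the `LRUKReplacer`, and the replacer calls are left as comments.
- There is no locking, which a concurrent implementation would need.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>
#include <vector>

using page_id_t = int;
using frame_id_t = int;
constexpr page_id_t kInvalidPageId = -1;
constexpr std::size_t kPageSize = 64;  // hypothetical, tiny on purpose
// Hypothetical stand-in for Page: page bytes plus the metadata from the notes.
struct Frame {
  std::array<char, kPageSize> data{};
  page_id_t page_id = kInvalidPageId;
  int pin_count = 0;
  bool is_dirty = false;
};

class MiniBpm {
 public:
  explicit MiniBpm(std::size_t pool_size) : frames_(pool_size) {
    for (std::size_t i = 0; i < pool_size; ++i) free_list_.push_back(static_cast<frame_id_t>(i));
  }

  Frame *NewPg(page_id_t *page_id) {
    auto fid = GetFrame();
    if (!fid) return nullptr;  // no free frame and every resident page is pinned
    *page_id = next_page_id_++;
    return Install(*fid, *page_id, /*read_from_disk=*/false);
  }

  Frame *FetchPg(page_id_t page_id) {
    if (auto it = page_table_.find(page_id); it != page_table_.end()) {
      ++frames_[it->second].pin_count;  // hit: already in memory, no disk read
      return &frames_[it->second];      // real code: RecordAccess + SetEvictable(false)
    }
    auto fid = GetFrame();
    return fid ? Install(*fid, page_id, /*read_from_disk=*/true) : nullptr;
  }

  bool UnpinPg(page_id_t page_id, bool is_dirty) {
    auto it = page_table_.find(page_id);
    if (it == page_table_.end() || frames_[it->second].pin_count == 0) return false;
    Frame &f = frames_[it->second];
    if (is_dirty) f.is_dirty = true;  // false keeps whatever flag the page already has
    --f.pin_count;                    // at 0 the real code calls SetEvictable(fid, true)
    return true;
  }

  bool DeletePg(page_id_t page_id) {
    auto it = page_table_.find(page_id);
    if (it == page_table_.end()) return true;      // not resident: nothing to do
    frame_id_t fid = it->second;
    if (frames_[fid].pin_count > 0) return false;  // still in use
    page_table_.erase(it);                         // real code also removes fid from the replacer
    ResetFrame(frames_[fid]);
    free_list_.push_back(fid);
    return true;
  }

 private:
  // The single reset helper: clears the bytes and every metadata field.
  static void ResetFrame(Frame &f) { f = Frame{}; }
  // free_list_ first; otherwise a naive victim (lowest-numbered unpinned resident
  // frame). This is NOT LRU-K; the real code asks the LRUKReplacer to evict.
  std::optional<frame_id_t> GetFrame() {
    if (!free_list_.empty()) {
      frame_id_t fid = free_list_.front();
      free_list_.pop_front();
      return fid;
    }
    for (frame_id_t i = 0; i < static_cast<frame_id_t>(frames_.size()); ++i)
      if (frames_[i].page_id != kInvalidPageId && frames_[i].pin_count == 0) return i;
    return std::nullopt;
  }
  // Steps 1-6 from the notes, applied to a frame from free_list_ or eviction.
  Frame *Install(frame_id_t fid, page_id_t page_id, bool read_from_disk) {
    Frame &f = frames_[fid];
    assert(f.pin_count == 0);
    page_id_t old_page_id = f.page_id;
    if (f.is_dirty) disk_[old_page_id] = f.data;  // 1. write back, then reset
    ResetFrame(f);
    ++f.pin_count;                                 // 2. pin for the caller
    page_table_.erase(old_page_id);                // 3. drop the old mapping
    f.page_id = page_id;                           // 4. new page id; FetchPg also
    if (read_from_disk) f.data = disk_[page_id];   //    loads the bytes from "disk"
    page_table_[page_id] = fid;                    // 5. record the new mapping
    // 6. real code: replacer_->RecordAccess(fid); replacer_->SetEvictable(fid, false);
    return &f;
  }
  std::vector<Frame> frames_;                             // plays the role of pages_
  std::list<frame_id_t> free_list_;
  std::unordered_map<page_id_t, frame_id_t> page_table_;  // stands in for ExtendibleHashTable
  std::unordered_map<page_id_t, std::array<char, kPageSize>> disk_;  // stands in for DiskManager
  page_id_t next_page_id_ = 0;
};
```

Step 3 uses the saved `old_page_id`, so the reset helper can clear everything at once without losing the key needed to erase the old mapping. FlushPg is omitted for brevity.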

Author: GreenHatHg

Link: https://greenhathg.github.io/2023/07/01/CMU445-Project1-BufferPoolManagerInstance%E6%80%BB%E7%BB%93/

Copyright: Unless otherwise stated, all posts on this blog are licensed under CC BY-NC-SA 4.0. Please credit GreenHatHGのBlog as the source when reposting!

cmu445 bufferpool
