Saturday, 24 February 2018

File System

File System
A file can be "free formed", indexed or structured collection of related bytes having meaning only to the one who created it. Or in other words an entry in a directory is the file. The file may have attributes like name, creator, date, type, permissions etc.

File Structure

A file has various kinds of structure. Some of them can be :
  • Simple Record Structure with lines of fixed or variable lengths.
  • Complex Structures like formatted document or reloadable load files.
  • No Definite Structure like sequence of words and bytes etc.

Attributes of a File

Following are some of the attributes of a file :
  • Name . It is the only information which is in human-readable form.
  • Identifier. The file is identified by a unique tag(number) within file system.
  • Type. It is needed for systems that support different types of files.
  • Location. Pointer to file location on device.
  • Size. The current size of the file.
  • Protection. This controls and assigns the power of reading, writing, executing.
  • Time, date, and user identification. This is the data for protection, security, and usage monitoring.

File Access Methods

The way that files are accessed and read into memory is determined by Access methods. Usually a single access method is supported by systems while there are OS's that support multiple access methods.
Sequential Access
  • Data is accessed one record right after another is an order.
  • Read command cause a pointer to be moved ahead by one.
  • Write command allocate space for the record and move the pointer to the new End Of File.
  • Such a method is reasonable for tape.
Direct Access
  • This method is useful for disks.
  • The file is viewed as a numbered sequence of blocks or records.
  • There are no restrictions on which blocks are read/written, it can be dobe in any order.
  • User now says "read n" rather than "read next".
  • "n" is a number relative to the beginning of file, not relative to an absolute physical disk location.
Indexed Sequential Access
  • It is built on top of Sequential access.
  • It uses an Index to control the pointer while accessing files.

What is a Directory?

Information about files is maintained by Directories. A directory can contain multiple files. It can even have directories inside of them. In Windows we also call these directories as folders.
Following is the information maintained in a directory :
  • Name : The name visible to user.
  • Type : Type of the directory.
  • Location : Device and location on the device where the file header is located.
  • Size : Number of bytes/words/blocks in the file.
  • Position : Current next-read/next-write pointers.
  • Protection : Access control on read/write/execute/delete.
  • Usage : Time of creation, access, modification etc.
  • Mounting : When the root of one file system is "grafted" into the existing tree of another file system its called Mounting.

Virtual Memory

Virtual Memory is a space where large programs can store themselves in form of pages while their execution and only the required pages or portions of processes are loaded into the main memory. This technique is useful as large virtual memory is provided for user programs when a very small physical memory is there.
In real scenarios, most processes never need all their pages at once, for following reasons :
  • Error handling code is not needed unless that specific error occurs, some of which are quite rare.
  • Arrays are often over-sized for worst-case scenarios, and only a small fraction of the arrays are actually used in practice.
  • Certain features of certain programs are rarely used.

Benefits of having Virtual Memory :
  1. Large programs can be written, as virtual space available is huge compared to physical memory.
  2. Less I/O required, leads to faster and easy swapping of processes.
  3. More physical memory available, as programs are stored on virtual memory, so they occupy very less space on actual physical memory.

Demand Paging

The basic idea behind demand paging is that when a process is swapped in, its pages are not swapped in all at once. Rather they are swapped in only when the process needs them(On demand). This is termed as lazy swapper, although a pager is a more accurate term.
Initially only those pages are loaded which will be required the process immediately.
The pages that are not moved into the memory, are marked as invalid in the page table. For an invalid entry the rest of the table is empty. In case of pages that are loaded in the memory, they are marked as valid along with the information about where to find the swapped out page.
When the process requires any of the page that is not loaded into the memory, a page fault trap is triggered and following steps are followed,
  1. The memory address which is requested by the process is first checked, to verify the request made by the process.
  2. If its found to be invalid, the process is terminated.
  3. In case the request by the process is valid, a free frame is located, possibly from a free-frame list, where the required page will be moved.
  4. A new operation is scheduled to move the necessary page from disk to the specified memory location. ( This will usually block the process on an I/O wait, allowing some other process to use the CPU in the meantime. )
  5. When the I/O operation is complete, the process's page table is updated with the new frame number, and the invalid bit is changed to valid.
  6. The instruction that caused the page fault must now be restarted from the beginning.
There are cases when no pages are loaded into the memory initially, pages are only loaded when demanded by the process by generating page faults. This is called Pure Demand Paging.
The only major issue with Demand Paging is, after a new page is loaded, the process starts execution from the beginning. Its is not a big issue for small programs, but for larger programs it affects performance drastically.

Page Replacement

As studied in Demand Paging, only certain pages of a process are loaded initially into the memory. This allows us to get more number of processes into the memory at the same time. but what happens when a process requests for more pages and no free memory is available to bring them in. Following steps can be taken to deal with this problem :
  1. Put the process in the wait queue, until any other process finishes its execution thereby freeing frames.
  2. Or, remove some other process completely from the memory to free frames.
  3. Or, find some pages that are not being used right now, move them to the disk to get free frames. This technique is called Page replacement and is most commonly used. We have some great algorithms to carry on page replacement efficiently.

Basic Page Replacement

  • Find the location of the page requested by ongoing process on the disk.
  • Find a free frame. If there is a free frame, use it. If there is no free frame, use a page-replacement algorithm to select any existing frame to be replaced, such frame is known as victim frame.
  • Write the victim frame to disk. Change all related page tables to indicate that this page is no longer in memory.
  • Move the required page and store it in the frame. Adjust all related page and frame tables to indicate the change.
  • Restart the process that was waiting for this page.

FIFO Page Replacement

  • A very simple way of Page replacement is FIFO (First in First Out)
  • As new pages are requested and are swapped in, they are added to tail of a queue and the page which is at the head becomes the victim.
  • Its not an effective way of page replacement but can be used for small systems.

LRU Page Replacement

Below is a video, which will explain LRU Page replacement algorithm in details with an example.

Thrashing

A process that is spending more time paging than executing is said to be thrashing. In other words it means, that the process doesn't have enough frames to hold all the pages for its execution, so it is swapping pages in and out very frequently to keep executing. Sometimes, the pages which will be required in the near future have to be swapped out.
Initially when the CPU utilisation is low, the process scheduling mechanism, to increase the level of multi-programming loads multiple processes into the memory at the same time, allocating a limited amount of frames to each process. As the memory fills up, process starts to spend a lot of time for the required pages to be swapped in, again leading to low CPU utilisation because most of the processes are waiting for pages. Hence the scheduler loads more processes to increase CPU utilisation, as this continues at a point of time the complete system comes to a stop.
To prevent thrashing we must provide processes with as many frames as they really need "right now".

Memory Management

Memory Management

  • Main Memory refers to a physical memory that is the internal memory to the computer. The word main is used to distinguish it from external mass storage devices such as disk drives. Main memory is also known as RAM. The computer is able to change only data that is in main memory. Therefore, every program we execute and every file we access must be copied from a storage device into main memory.
  • All the programs are loaded in the main memory for execution. Sometimes complete program is loaded into the memory, but some times a certain part or routine of the program is loaded into the main memory only when it is called by the program, this mechanism is called Dynamic Loading, this enhance the performance.
  • Also, at times one program is dependent on some other program. In such a case, rather than loading all the dependent programs, CPU links the dependent programs to the main executing program when its required. This mechanism is known as Dynamic Linking.

Swapping


  • A process needs to be in memory for execution. But sometimes there is not enough main memory to hold all the currently active processes in a timesharing system. So, excess process are kept on disk and brought in to run dynamically. Swapping is the process of bringing in each process in main memory, running it for a while and then putting it back to the disk.

Contiguous Memory Allocation


  • In contiguous memory allocation each process is contained in a single contiguous block of memory. Memory is divided into several fixed size partitions. Each partition contains exactly one process. When a partition is free, a process is selected from the input queue and loaded into it. The free blocks of memory are known as holes. The set of holes is searched to determine which hole is best to allocate.

Memory Protection


  • Memory protection is a phenomenon by which we control memory access rights on a computer. The main aim of it is to prevent a process from accessing memory that has not been allocated to it. Hence prevents a bug within a process from affecting other processes, or the operating system itself, and instead results in a segmentation fault or storage violation exception being sent to the disturbing process, generally killing of process.

Memory Allocation


  • Memory allocation is a process by which computer programs are assigned memory or space. It is of three types :


  1. First Fit-The first hole that is big enough is allocated to program.
  2. Best Fit-The smallest hole that is big enough is allocated to program.
  3. Worst Fit-The largest hole that is big enough is allocated to program.



Fragmentation


  • Fragmentation occurs in a dynamic memory allocation system when most of the free blocks are too small to satisfy any request. It is generally termed as inability to use the available memory.
  • In such situation processes are loaded and removed from the memory. As a result of this, free holes exists to satisfy a request but is non contiguous i.e. the memory is fragmented into large no. Of small holes. This phenomenon is known as External Fragmentation.
  • Also, at times the physical memory is broken into fixed size blocks and memory is allocated in unit of block sizes. The memory allocated to a space may be slightly larger than the requested memory. The difference between allocated and required memory is known as Internal fragmentation i.e. the memory that is internal to a partition but is of no use.

Paging


  • A solution to fragmentation problem is Paging. Paging is a memory management mechanism that allows the physical address space of a process to be non-contagious. Here physical memory is divided into blocks of equal size called Pages. The pages belonging to a certain process are loaded into available memory frames.

Page Table


  • A Page Table is the data structure used by a virtual memory system in a computer operating system to store the mapping between virtual address and physical addresses.
  • Virtual address is also known as Logical address and is generated by the CPU. While Physical address is the address that actually exists on memory.

Segmentation


  • Segmentation is another memory management scheme that supports the user-view of memory. Segmentation allows breaking of the virtual address space of a single process into segments that may be placed in non-contiguous areas of physical memory.

Segmentation with Paging

  • Both paging and segmentation have their advantages and disadvantages, it is better to combine these two schemes to improve on each. The combined scheme is known as 'Page the Elements'. Each segment in this scheme is divided into pages and each segment is maintained in a page table. So the logical address is divided into following 3 parts :
  • Segment numbers(S)
  • Page number (P)
  • The displacement or offset number (D)