Welcome to the ChunkIt user documentation

Find all the help you need to navigate the platform with ease

Understanding the process of chunking


Chunking is the process of splitting a large file into smaller files called chunks. In some applications, such as remote data compression, data synchronization and data deduplication, chunking is important because it determines the duplicate detection performance of the system. A chunking tool is a small, handy application designed to help you split large files into pieces of a set size, so you can easily transfer them without any loss of data.

What is ChunkIT?


ChunkIt is a free web-based platform that splits large or heavy CSV and JSON files. When a user uploads a file of up to 250 MB, our Python modules, built on pandas, first validate it to determine whether it is a CSV or JSON file. If the file is neither of these formats, or is larger than the size limit, the platform rejects it. If the input is valid, the platform accepts the file and the pandas modules begin splitting it. The data is written into smaller files according to the number of parts, or the size of each part, that the user requested. When splitting is complete, the result is zipped using the shutil module, ready for the user to download.
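The validation step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the platform's actual code: the names validate_upload, MAX_UPLOAD_BYTES and ALLOWED_EXTENSIONS are hypothetical, the format is judged by file extension alone, and 250 MB is assumed to mean 250 * 1024 * 1024 bytes.

```python
import os
import tempfile

# Assumed limits, taken from the description above; the names are hypothetical.
MAX_UPLOAD_BYTES = 250 * 1024 * 1024  # 250 MB, assuming binary megabytes
ALLOWED_EXTENSIONS = {".csv", ".json"}


def validate_upload(path):
    """Return (ok, reason) for a file on disk, checking format and size."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        return False, "unsupported format"
    if os.path.getsize(path) > MAX_UPLOAD_BYTES:
        return False, "file too large"
    return True, "accepted"


# Small demonstration with temporary files.
with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, "data.csv")
    with open(csv_path, "w") as handle:
        handle.write("id,name\n1,Ada\n")
    txt_path = os.path.join(folder, "notes.txt")
    with open(txt_path, "w") as handle:
        handle.write("hello")
    print(validate_upload(csv_path))  # (True, 'accepted')
    print(validate_upload(txt_path))  # (False, 'unsupported format')
```

Checking the extension before the size means an unsupported file is rejected without reading anything beyond its name.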

Getting started


To start using the chunking feature of the platform, a user needs to create an account by registering with their email address, which makes them an authenticated user. An unauthenticated user cannot use the chunking feature. However, they can still read the platform’s documentation through the documentation tab in the header and footer sections. They can also go through the platform’s landing page and FAQ section to learn more about its features.

How to use ChunkIT


Upon creating an account, the user is redirected to the user dashboard, where they can start uploading the files they want to process. They choose the option to upload a new file, and a screen appears that allows them to upload it. The platform currently supports chunking JSON and CSV files; more file formats will be supported in future versions. The user uploads their file either by dragging and dropping it onto the screen or by browsing their device's file system. The platform only accepts files that are up to 250 MB in size and in a supported format, CSV or JSON. Once the uploaded file satisfies these requirements, the user can choose the size of the chunk files they require. The chunk size must not exceed the original file size; if it does, the file will not be chunked. If the chunk size is within the acceptable range, the user is directed to a screen where they can download a zipped file containing the chunks. The user can also choose to download the files later, in which case the files are saved on the dashboard.
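The chunk size rule can be illustrated with a short sketch. The function name count_chunks is hypothetical, and measuring sizes in bytes is an assumption made for this example; the platform may apply the rule differently.

```python
def count_chunks(file_size_bytes, chunk_size_bytes):
    """Return how many chunks are needed, or raise if the chunk size is invalid."""
    if chunk_size_bytes <= 0:
        raise ValueError("chunk size must be positive")
    if chunk_size_bytes > file_size_bytes:
        # Mirrors the rule that the chunk must not be larger than the file.
        raise ValueError("chunk size is larger than the original file")
    # Ceiling division: the last chunk may be smaller than the others.
    return (file_size_bytes + chunk_size_bytes - 1) // chunk_size_bytes


print(count_chunks(100_000_000, 30_000_000))  # 4
```

In this example a 100 MB file split into 30 MB chunks gives three full chunks and a fourth chunk holding the remaining 10 MB.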

Library Page


Users can view saved files, download chunked files, download the original file and delete some or all saved files by visiting the library page.

Upload File Modal


In this modal you can upload the files you want to chunk. To start chunking, follow these steps:
First, click the “upload file” button.
Next, select files from your local device, GitHub or any other storage platform you use.
Next, drop the selected file inside the box in the upload file modal.
Finally, choose the number of chunks you want your file split into and start chunking.

Generated File Modal


Once your files have been split into smaller chunks, you can download them immediately or later. To download later, click the “download later” button.

Delete File Modal


This modal lets you delete files that are no longer needed.
To delete:
Open your saved file modal. A list of all saved files is shown, each with a delete icon.
Click the delete icon (bin icon).
A delete confirmation pop-up appears. Click “delete files” to proceed.

Technologies used


Django

Django is a high-level Python web framework that enables rapid development of secure and maintainable websites. Django takes care of much of the hassle of web development, so you can focus on writing your app without needing to reinvent the wheel. It is free and open source, has a thriving and active community, great documentation, and many options for free and paid-for support.

Why Django framework?

Django's primary goal is to ease the creation of complex, database-driven websites. The framework emphasizes reusability and "pluggability" of components, less code, low coupling, rapid development, and the principle of don't repeat yourself (DRY). Python is used throughout, even for settings, files, and data models. Django also provides an optional administrative create, read, update and delete interface that is generated dynamically through introspection and configured via admin models.

Pandas

Pandas is an open-source library made mainly for working with relational or labeled data easily and intuitively. It provides various data structures and operations for manipulating numerical data and time series. Pandas is fast and offers high performance and productivity for users.

Advantages of using pandas:

Fast and efficient for manipulating and analyzing data.

Data from different file objects can be loaded.

Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data.

Size mutability: columns can be inserted and deleted from DataFrame and higher dimensional objects.

Data set merging and joining.

Flexible reshaping and pivoting of data sets.

Provides time-series functionality.

Powerful group by functionality for performing split-apply-combine operations on data sets.

Getting started with pandas

The first step in working with pandas is to check whether it is already installed in your Python environment. If it is not, install it using the pip command: type cmd in the search box to open a command prompt, use the cd command to go to the folder where pip has been installed, and then type the command: pip install pandas. Once pandas has been installed, you need to import the library. It is generally imported as: import pandas as pd.
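Once pandas is imported, a first use in a chunking context might look like the sketch below. It is a simplified illustration rather than the platform's own splitting logic: split_frame is a hypothetical helper, the sample data is made up, and splitting by row count is an assumption.

```python
import io

import pandas as pd


def split_frame(frame, parts):
    """Split a DataFrame into at most `parts` pieces of roughly equal row count."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    # Ceiling division, with a floor of one row per piece.
    rows_per_part = max(-(-len(frame) // parts), 1)
    return [
        frame.iloc[start:start + rows_per_part]
        for start in range(0, len(frame), rows_per_part)
    ]


csv_text = "id,name\n1,Ada\n2,Grace\n3,Linus\n4,Guido\n5,Dennis\n"
frame = pd.read_csv(io.StringIO(csv_text))
pieces = split_frame(frame, 2)
print([len(piece) for piece in pieces])  # [3, 2]
```

Each piece is itself a DataFrame, so it could be written out with to_csv or to_json to produce one chunk file per piece.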

Limit

Pandas does not set a fixed limit on the number of cells in a DataFrame. The practical limit is the amount of memory available on the machine, because a DataFrame is held in memory.
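Because a DataFrame is held in memory, a large CSV file can also be read a few rows at a time instead of all at once. The sketch below uses the chunksize parameter of pandas.read_csv for this; the sample data and the chunk size of 2 are made up for illustration, and whether the platform reads files this way is an assumption.

```python
import io

import pandas as pd

csv_text = "id,value\n1,10\n2,20\n3,30\n4,40\n5,50\n"

# With chunksize set, read_csv returns an iterator of DataFrames
# instead of one DataFrame holding every row.
row_counts = []
for piece in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    row_counts.append(len(piece))

print(row_counts)  # [2, 2, 1]
```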

Authorization

Pandas is a free, open-source library, so no authorization is needed before using it. You only need to have Python installed on your system. Run pip install pandas, then import the module as follows: import pandas as pd. After that, you are free to use it.

The Operating System (OS) module

The os module in Python provides functions for creating and removing a directory (folder), fetching its contents, changing and identifying the current directory, and more. You first need to import the os module to interact with the underlying operating system. In this project, the os module is used to create a folder for the chunk files so that they can be zipped easily.
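Creating a folder for the chunk files could look something like this. It is a minimal sketch: the helper make_chunk_folder and the "_chunks" naming scheme are assumptions made for the example, not the project's actual layout.

```python
import os
import tempfile


def make_chunk_folder(base_dir, upload_name):
    """Create (if needed) and return a folder that will hold the chunk files."""
    stem = os.path.splitext(os.path.basename(upload_name))[0]
    folder = os.path.join(base_dir, stem + "_chunks")
    os.makedirs(folder, exist_ok=True)  # no error if it already exists
    return folder


with tempfile.TemporaryDirectory() as base_dir:
    folder = make_chunk_folder(base_dir, "sales.csv")
    print(os.path.basename(folder))  # sales_chunks
    print(os.path.isdir(folder))     # True
```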

The Shutil Module

The shutil module offers several high-level operations on files and collections of files. It helps you automate copying files and directories, which saves the steps of opening, reading, writing and closing files when there is no actual processing. It is a utility module that can be used for tasks such as copying, moving, or removing directory trees. In this project, however, the shutil module is only used to zip the folder created with the os module.
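Zipping a folder of chunk files with shutil can be sketched as follows. The function zip_folder and the file names are hypothetical; shutil.make_archive is the standard library call that builds the archive.

```python
import os
import shutil
import tempfile
import zipfile


def zip_folder(folder, archive_base):
    """Zip the contents of `folder` and return the path of the .zip file."""
    # make_archive appends the ".zip" extension to archive_base itself.
    return shutil.make_archive(archive_base, "zip", root_dir=folder)


with tempfile.TemporaryDirectory() as work_dir:
    chunk_folder = os.path.join(work_dir, "chunks")
    os.makedirs(chunk_folder)
    for index in (1, 2):
        with open(os.path.join(chunk_folder, f"part_{index}.csv"), "w") as handle:
            handle.write("id\n1\n")
    archive = zip_folder(chunk_folder, os.path.join(work_dir, "result"))
    with zipfile.ZipFile(archive) as bundle:
        print(sorted(bundle.namelist()))  # ['part_1.csv', 'part_2.csv']
```

The archive is written next to the chunk folder rather than inside it, so the zip file never ends up trying to include itself.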

Datetime Module

In Python, date and time are not data types of their own, but a module named datetime can be imported to work with both. The datetime module comes built into Python, so there is no need to install it separately; you only need to import it before use. It supplies classes for working with dates and times, and these classes provide a number of functions for dealing with dates, times and time intervals. Date and datetime values are objects in Python, so when you manipulate them, you are manipulating objects and not strings or timestamps.
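A short sketch of working with datetime objects is shown below. Using a timestamp to name a chunk folder is only an assumed use case for illustration, and stamped_name is a hypothetical helper.

```python
from datetime import datetime, timedelta


def stamped_name(prefix, moment):
    """Build a name such as 'chunks_20240131_093000' from a datetime object."""
    return f"{prefix}_{moment.strftime('%Y%m%d_%H%M%S')}"


uploaded_at = datetime(2024, 1, 31, 9, 30, 0)
print(stamped_name("chunks", uploaded_at))  # chunks_20240131_093000

# Adding a timedelta to a datetime gives another datetime object.
one_week_later = uploaded_at + timedelta(days=7)
print(one_week_later)         # 2024-02-07 09:30:00
print(one_week_later.date())  # 2024-02-07
```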

Privacy Policy


ChunkIt uses modern encryption and secure development processes to keep your data secure. ChunkIt respects your right to privacy when you make use of our software. Your personal data is treated in compliance with data protection laws. Please send any reports to (insert email). We will reply as soon as possible. ChunkIt will never sell your personal data. We will share part of your data with a third-party client only with your permission. However, ChunkIt will not be responsible for any misuse of personal data by a third-party client.