Transfer Large Files
Transferring large files is a key concern when working with big data, especially in geographically distributed systems such as cloud environments. Several cloud services offer this functionality, and there is also an open protocol called the Grid File Transfer Protocol (GridFTP). The GridFTP specification extends the FTP protocol to enable the transfer of large quantities of data. It was created by the Open Grid Forum (OGF) and has been adopted in a wide variety of implementations; for instance, the Globus Toolkit offers one implementation.
With respect to security, the GridFTP protocol relies on the Grid Security Infrastructure (GSI), which provides authentication and, if needed, encryption for file transfers. To handle large files and 'big data' transfers, GridFTP aims to make much better use of the available bandwidth by using multiple simultaneous Transmission Control Protocol (TCP) streams. Large files can be downloaded in pieces simultaneously from multiple sources, or, if needed, in separate parallel streams from a single source; both approaches improve bandwidth utilization. A further tuning option is to use striped and interleaved transfers from single or multiple sources, which enables additional speed increases.
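The parallel-streams idea can be illustrated with a small sketch. The following is not GridFTP itself but a minimal Python analogy: a file is split into byte ranges, each range is fetched by a separate worker (standing in for one TCP stream, possibly to a different source), and the pieces are reassembled at their offsets. The function names and the use of a local file as the "remote source" are illustrative assumptions, chosen so the sketch is self-contained and runnable.

```python
# Illustrative sketch of parallel byte-range transfer (the idea behind
# GridFTP's multiple simultaneous TCP streams) -- NOT the GridFTP protocol.
# A local file stands in for the remote source so the example is runnable.
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def fetch_range(path, offset, length):
    """Read one byte range; in a real transfer each call would be a
    separate stream, possibly from a different source."""
    with open(path, "rb") as f:
        f.seek(offset)
        return offset, f.read(length)

def parallel_download(src, dst, streams=4):
    """Split src into `streams` byte ranges, fetch them concurrently,
    and write each piece at its original offset in dst."""
    size = os.path.getsize(src)
    chunk = -(-size // streams)  # ceiling division
    ranges = [(i, min(chunk, size - i)) for i in range(0, size, chunk)]
    with open(dst, "wb") as out, ThreadPoolExecutor(max_workers=streams) as pool:
        for offset, data in pool.map(lambda r: fetch_range(src, *r), ranges):
            out.seek(offset)
            out.write(data)

# Demo: create a 1 MB "remote" file and copy it with 4 parallel streams.
src = tempfile.NamedTemporaryFile(delete=False)
src.write(os.urandom(1_000_000))
src.close()
dst = src.name + ".copy"
parallel_download(src.name, dst)
```

In a real GridFTP deployment the ranges would travel over independent TCP connections, and striping would place them on different servers; the reassembly-by-offset step, however, is the same.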
There is an interesting Software-as-a-Service (SaaS) cloud service based on GridFTP that enables the transfer of large files: Globus Online. It aims to make transferring, sharing, and publishing data easy through a simple interface, and today it is used mostly by academic institutions. Another service, often used in commercial and business sectors, is the AWS Snowball cloud service, which is part of the Amazon Web Services (AWS) portfolio.
Transfer Large Files: Details
We refer to the following video for this topic: