Cloud storage is a general term for a data storage model, most commonly offered by cloud service providers over the public internet, in which cloud consumers store and access their information “in the cloud”. Cloud storage is roughly synonymous with storage as a service (STaaS). Beyond public cloud storage, enterprises often set up their own private cloud storage to support their operations and employees; when private and public storage are linked, the result is a hybrid cloud.
Cloud storage is often a marketing term that encompasses remote storage and access and is not a specific technology. The underlying cloud technologies grant much greater storage resilience and efficiency, and give the cloud its characteristic agility, global scalability, and the idea that owned information can be accessed “anytime, anywhere”.
There are generally three types of distinct cloud storage models:
- Private Cloud Storage — Offers the same characteristics as “the cloud”, but to a single private organization. These organizations often own, manage, and house the hardware and software that make up the private cloud. With this responsibility come the costs and security concerns of managing cloud infrastructure.
- Public Cloud Storage — Public cloud, as the name implies, is available for purchase and is usually sold on a pay-as-you-go basis. Public cloud benefits from economies of scale, and the best cloud providers can offer significantly enhanced services far more cheaply than setting up a private cloud. The public cloud space is dominated by a handful of providers, and each provider's services must be compared against the intended use to ensure they meet the needs of the company.
- Hybrid Cloud Storage — Because private clouds are expensive, newer technologies make it possible to combine public and private cloud architectures. This configuration allows companies to enhance performance and resilience by drawing on the benefits of public cloud. It also serves as a bridge for organizations undergoing cloud migration, allowing them to migrate incrementally and so reduce migration risk.
Recently popularized models of computing, namely decentralization, have had their impact on cloud storage. While cloud storage has long used distributed systems to enhance storage redundancy and resilience, cooperative storage attempts to pool the storage resources of many “nodes” in the system, without the centralized management common in other cloud models. This is sometimes referred to as a peer-to-peer (P2P) storage cloud, or a cloud storage co-op.
Generally, P2P software is installed on every participating node, and each node contributes some storage capacity to the system. The system then makes the aggregate storage available to all participants. Typically there is no dedicated server for hosting data, although an application server can be used to control and manage the collective storage and to share and retrieve data across the cooperative.
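The pooling idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` and `Cooperative` names are hypothetical, everything runs in one process rather than over a network, and placement simply picks the node with the most free space. The coordinator here plays the role of the application server: it tracks where data lives but holds none of it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A participant that contributes part of its disk to the cooperative."""
    name: str
    contributed_bytes: int
    stored: dict = field(default_factory=dict)  # key -> bytes held on this node

    def free_bytes(self) -> int:
        return self.contributed_bytes - sum(len(v) for v in self.stored.values())


class Cooperative:
    """Hypothetical coordinator: manages placement and lookup only."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}
        self.index = {}  # key -> name of the node holding it

    def total_capacity(self) -> int:
        # The aggregate storage made available to all participants.
        return sum(n.contributed_bytes for n in self.nodes.values())

    def put(self, key: str, data: bytes) -> str:
        # Drop any previous copy so a key never lives on two nodes.
        if key in self.index:
            self.nodes[self.index.pop(key)].stored.pop(key)
        # Assumed placement policy: the node with the most free space.
        target = max(self.nodes.values(), key=lambda n: n.free_bytes())
        if target.free_bytes() < len(data):
            raise OSError("no node has enough free space")
        target.stored[key] = data
        self.index[key] = target.name
        return target.name

    def get(self, key: str) -> bytes:
        return self.nodes[self.index[key]].stored[key]
```

With two nodes contributing 100 and 300 bytes, `total_capacity()` reports the pooled 400 bytes, and a first `put` lands on the larger node. A real cooperative would also replicate data across nodes and handle nodes leaving, which this sketch omits.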
For the cloud consumer, cloud storage is typically set up on a pay-as-you-go plan. Enterprises may be charged for each gigabyte of storage and traffic they consume, while general consumers may pay monthly for a fixed quota. The storage can be accessed via the web and through APIs. Cloud providers attempt to make access as easy and seamless as possible.
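As a rough illustration of per-gigabyte billing, the sketch below adds a storage charge to a traffic charge. The function name and the rates used in the usage note are made up for the example; real providers publish their own price lists, often with tiers and minimums that are not modeled here.

```python
def monthly_storage_bill(stored_gb: float, traffic_gb: float,
                         price_per_stored_gb: float,
                         price_per_traffic_gb: float) -> float:
    """Return one month's charge for storage held plus traffic consumed.

    All rates are caller-supplied assumptions, not real provider prices.
    """
    values = (stored_gb, traffic_gb, price_per_stored_gb, price_per_traffic_gb)
    if min(values) < 0:
        raise ValueError("quantities and prices must be non-negative")
    total = stored_gb * price_per_stored_gb + traffic_gb * price_per_traffic_gb
    return round(total, 2)
```

For example, 500 GB stored at a hypothetical 0.02 per GB plus 100 GB of traffic at a hypothetical 0.09 per GB comes to 10 + 9 = 19 for the month.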
There are a few key technologies that allow cloud providers to maximize their storage infrastructure and serve multiple users.
- Virtualization — Virtualization allows cloud providers to serve multiple users easily. Infrastructure is difficult to scale, but virtualization makes it feasible by abstracting the hardware layers away from the software, or environment, layer. Virtualization allows providers to optimize their underlying compute and storage resources by managing those resources separately from applications and provisioning them to consumers according to need.
- Mass Distributed Storage — Cloud systems require massive storage resources to accommodate today's data demands. The current solution is distributed mass storage coupled with cloud storage management software, which builds in redundancy and reliability by spreading data across thousands of servers. These systems can also rely on “cheaper” storage hardware, because device failure is factored into the design. Data centers expect cheap hard drives to die during operation and use replication and backups to overcome this, replacing dead units easily and prioritizing the protection of data over keeping expensive, top-tier storage devices alive.
- Parallel Programming Model — If cloud infrastructure is imagined as thousands of machines that together make up the cloud system, those machines must run in parallel to work at all. A common approach is MapReduce, a parallel programming model developed by Google. MapReduce simplifies mass data processing by handling task scheduling, fault tolerance, data distribution, and load balancing on the programmer's behalf.
- Data Management — The above technologies form the foundation for a data management system responsible for processing and analyzing massive, distributed data. These platforms provide tools for developing databases, managing data operations, and integrating databases from multiple vendors.
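To make the MapReduce model mentioned above more concrete, here is a minimal single-machine sketch of its map, shuffle, and reduce phases, using a word count as the task. It is an illustration only: local threads stand in for cluster workers, the function names are hypothetical, and none of the scheduling or fault tolerance of a real framework is present.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk: str):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(item):
    """Reduce: combine the values collected for one key."""
    key, values = item
    return key, sum(values)


def word_count(chunks):
    # Threads stand in for the worker machines of a real cluster.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, chunks))      # chunks processed in parallel
        groups = shuffle(mapped)
        return dict(pool.map(reduce_phase, groups.items()))
```

Calling `word_count(["the cloud", "The storage cloud"])` counts "the" and "cloud" twice each and "storage" once. The point of the model is that `map_phase` and `reduce_phase` hold no shared state, so a framework is free to run them on many machines.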
Cloud storage offers advantages and disadvantages compared with on-premises storage. In weighing them, organizations can choose to operate using only cloud storage; however, an effective strategy is to employ a hybrid cloud storage configuration, which lets companies design their systems to reduce the disadvantages while keeping the advantages. The following pros and cons should be considered when deciding on cloud storage.
Advantages
- Offloading of Cloud Management — By using cloud storage, companies can, in effect, offload administrative responsibility for their storage assets to the cloud provider. This helps to reduce costs, reclaim valuable staff time, and simplify workflows.
- Rapid Setup and Implementation — Unlike most on-premises storage setups, cloud storage capacity can be expanded within hours, rather than the days it would take to physically expand on-premises infrastructure.
- Superior Scalability — The cloud has theoretically unlimited scaling potential. Additionally, the costs of scaling are controlled under a pay-as-you-go model, which helps to lower expenses.
- Data and Business Continuity — Unless they have a sizable storage infrastructure, most companies will find data continuity planning a challenge. In the cloud, data is the most valuable asset, and providers take extensive measures to protect and secure it. Companies without the resources or expertise to maintain data continuity can turn to cloud providers to ensure their data and business continuity. Providers will likely do it better, because they can leverage economies of scale and offer the latest technology and best practices.
Disadvantages
- New Security Paradigms — Data security is a priority within the cloud, but data traversing the public internet is exposed to new security threats. Market-leading cloud providers offer significant security protections to keep their clients' data from attack. These include new paradigms, such as moving away from the fortress security mentality towards “zero-trust” access and authentication policies.
- Limited Administrative Controls — While migrating to the cloud helps simplify administrative controls, it can also limit them. Ensure that vendors provide monitoring that aligns with business goals.
- Network Performance — Network performance remains a concern, with downtime and network latency the most critical issues. Possible solutions include increasing bandwidth or, if the need is vital, committing to a dedicated line.
- Compliance Obligations — Data regulations require protection of personal information and, where cloud storage is concerned, may require that data reside on physical devices within the country or region to which it belongs. This means that knowing where data physically lives in the cloud is a necessary concern for organizations, especially those handling sensitive information, such as finance and healthcare.
Generally, there are three types of storage available in the cloud, each with its own technical limitations. In their basic forms, these are block storage, file storage, and object storage.
- File Storage — File storage organizes data using the traditional hierarchy of files inside of folders. While this is intuitive for users, file storage suffers in performance when scaled to meet data demands.
- Block Storage — Block storage organizes data into equal-sized, discrete blocks, each with a unique ID. This makes data management and retrieval straightforward and suits real-time systems, such as transaction systems that must keep data synced in real time while handling high volumes of data requests.
- Object Storage — Object storage organizes data using a metadata database and stores the data itself in a flat system, referenced by unique IDs. A data query asks the metadata database for a particular object, and the system retrieves it. Metadata databases can describe objects in much greater detail than other systems, making object storage ideal for big data storage, such as data lakes.
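The difference between block and object storage can be illustrated with a small in-memory sketch. It makes several assumptions that are not taken from any real product: block IDs are simple integers, the final block is zero-padded to keep all blocks the same size, and the `ObjectStore` class with its `put`, `find`, and `get` methods is hypothetical.

```python
import uuid


def split_into_blocks(data: bytes, block_size: int = 4) -> dict:
    """Block storage: cut data into equal-sized blocks, each with its own ID.

    The last block is zero-padded (an assumption of this sketch) so that
    every block has exactly block_size bytes.
    """
    blocks = {}
    for block_id, start in enumerate(range(0, len(data), block_size)):
        blocks[block_id] = data[start:start + block_size].ljust(block_size, b"\0")
    return blocks


class ObjectStore:
    """Object storage: a flat ID -> object map plus a metadata catalogue."""

    def __init__(self):
        self._objects = {}   # flat namespace, no folders
        self._metadata = {}  # object ID -> descriptive metadata

    def put(self, data: bytes, **metadata) -> str:
        object_id = uuid.uuid4().hex
        self._objects[object_id] = data
        self._metadata[object_id] = metadata
        return object_id

    def find(self, **criteria) -> list:
        """Return IDs of objects whose metadata matches every criterion."""
        return [oid for oid, meta in self._metadata.items()
                if all(meta.get(k) == v for k, v in criteria.items())]

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id]
```

A query such as `store.find(kind="log")` consults only the metadata and returns IDs, which are then passed to `get`; there is no folder path to walk, which is what lets the flat layout scale.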
Cloud storage security covers the designs and blueprints for how an organization will implement and manage its cloud security. There are four major concerns when placing data in the cloud: data security, network security, endpoint protection, and identity and access control.
- Data Security — Data security addresses the measures that protect data while it traverses a network and when it comes to rest in storage. Several controls can be deployed to secure data, including encryption, public key infrastructure, encryption and tunneling protocols, block and stream ciphers, and granular storage resource controls.
- Network Security — While data can be encrypted before transit, network security is concerned with controls on the pathways between systems. Companies can deploy several network security controls to further protect their systems, including network segmentation, firewalls, DDoS protection, packet capture, intrusion prevention/detection systems (IPS/IDS), packet brokers, network access controls (NAC), and APIs.
- Endpoint Protection — Endpoints provide logical places for security measures, like bouncers at clubs. These measures include host-based firewalls that screen received data (standard firewalls are usually internet-facing perimeter defenses), antivirus and anti-malware software, endpoint detection and response (EDR) systems that provide real-time awareness, data loss prevention (DLP) systems that enforce data flows, system hardening, and application blacklisting and whitelisting.
- Access Control — Access controls ensure that privileges are granted only to those who need them. Poor access control management can open up threats that are difficult to prevent. Consider these measures: identification, authentication, and authorization systems; multi-factor authentication; and single sign-on (SSO).
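The access control measures listed above can be sketched as a single deny-by-default check. The role table, the `Session` fields, and the `is_allowed` function are all hypothetical names chosen for illustration; a production system would delegate these steps to an identity provider rather than to boolean flags.

```python
from dataclasses import dataclass

# Hypothetical role-to-permission table; real systems define their own.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "editor": {"read", "write"},
    "admin": {"read", "write", "delete"},
}


@dataclass(frozen=True)
class Session:
    user: str          # identification: who the caller claims to be
    role: str
    password_ok: bool  # authentication, first factor
    mfa_ok: bool       # authentication, second factor


def is_allowed(session: Session, action: str) -> bool:
    """Authorize an action only for a fully authenticated session.

    Anything not explicitly granted to the role is denied.
    """
    if not (session.password_ok and session.mfa_ok):
        return False
    return action in ROLE_PERMISSIONS.get(session.role, set())
```

Under this sketch an editor who has passed both factors may write but not delete, and an admin who skipped the second factor is refused everything. Unknown roles get an empty permission set, so privileges exist only where someone granted them.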