Getting Started

ArcoDataHub, part of the Italian AI Factory (IT4LIA), provides access to continuously updated ARCO (Analysis-Ready Cloud-Optimized) datasets across diverse domains. This guide covers what you need to start working with our datasets.

Datasets are published in ARCO formats, including Zarr, COG (Cloud Optimized GeoTIFF), GeoParquet, and FlatGeobuf. Some are updated live; weather radar data, for example, is updated every 5 minutes. Access is controlled per user to protect the data and enforce usage quotas.

We'll show you how to obtain and use your authentication credentials, but let's start by understanding the basic workflow for accessing data through ArcoDataHub.

Step 1: Create Your Account

The first step to accessing ArcoDataHub is creating your account. Our registration process is designed to be simple and secure.

IT4LIA Integration: ArcoDataHub is part of the Italian AI Factory ecosystem, ensuring high-quality data access for Italian research institutions and international partners.

Register: Visit the registration page and fill in your details.
Email Verification: Check your email and click the verification link to activate your account.
(Optional) Profile Setup: Complete your profile with your information to help us understand your research and application needs.

Step 2: Request Dataset Access

Once your account is set up, you can request access to specific datasets. Our approval process ensures data is used for legitimate research and educational purposes.

Browse Datasets

Explore our catalog of ARCO datasets spanning meteorology, agriculture, cybersecurity, and other domains.

Submit Request

Use our request form to apply for access, including your research purpose and institutional details.

Instant Access: Access to datasets is currently granted immediately by default, so you can start working with data right away. In the future, we may add sensitive or high-value datasets that require manual approval. Such requests would be reviewed within 2-3 business days, with an email notification upon approval.

Step 3: Set Up Your Python Environment

The easiest way to get started with ArcoDataHub is with Python and Xarray. Make sure Python is installed, then install the required packages.

Install Required Packages

pip install xarray zarr dask aiohttp requests

Recommendation: We recommend using a virtual environment to manage your dependencies and avoid conflicts with other projects.
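
Before moving on, it can help to confirm that the packages from the install command are importable in the active environment. The following is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical and not part of ArcoDataHub.

```python
from importlib.util import find_spec

# Import names of the packages listed in the pip command above.
REQUIRED = ["xarray", "zarr", "dask", "aiohttp", "requests"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If anything is reported as missing, rerunning the pip command inside the activated virtual environment is usually enough.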

Step 4: Access Your Data

Once your access is approved, you'll receive your personal access credentials. Here's how to use them to access datasets.

Basic Data Access Example

import xarray as xr

# Your personal access credentials (from your account page)
username = "your_username"
access_key = "your_access_key"

# Example: Access a dataset (the credentials are embedded in the URL)
dataset_url = f"https://{username}:{access_key}@api.arcodatahub.com/S3/dataset_name.zarr"
ds = xr.open_dataset(dataset_url, engine="zarr")

# Display dataset information (ds.info() prints its summary and returns None)
ds.info()

# Access specific variables
if 'temperature' in ds:
    temperature = ds['temperature']
    print(f"Temperature data shape: {temperature.shape}")

Success! Once this code runs without errors, you're ready to start analyzing live-updated ARCO data!
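
If a username or access key contains characters such as @, : or /, placing it directly in the URL can produce an address that is parsed incorrectly. The sketch below percent-encodes both values first. The function name build_dataset_url is hypothetical, and it assumes the URL layout shown in the example above. That the server and the HTTP client accept percent-encoded credentials is also an assumption.

```python
from urllib.parse import quote

BASE_HOST = "api.arcodatahub.com"  # host taken from the example above


def build_dataset_url(username, access_key, dataset_name):
    """Build a dataset URL with percent-encoded credentials (hypothetical helper)."""
    user = quote(username, safe="")  # safe="" also encodes "/" and ":"
    key = quote(access_key, safe="")
    return f"https://{user}:{key}@{BASE_HOST}/S3/{dataset_name}.zarr"
```

The returned string can be passed to xr.open_dataset(..., engine="zarr") in the same way as dataset_url in the basic example.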

Advanced Configuration

For more convenient access, you can configure your credentials using environment variables or configuration files:

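# Using environment variables
import os

import aiohttp
import xarray as xr

username = os.getenv('ARCODATAHUB_USERNAME')
access_key = os.getenv('ARCODATAHUB_ACCESS_KEY')

# Configure Xarray with storage options; client_kwargs are forwarded to the
# aiohttp session. Sending the credentials as HTTP basic auth is an assumption
# based on the username:access_key URL form used in the basic example.
storage_options = {
    "client_kwargs": {
        "trust_env": True,
        "auth": aiohttp.BasicAuth(username, access_key),
    }
}

ds = xr.open_dataset(
    "https://api.arcodatahub.com/S3/dataset_name.zarr",
    storage_options=storage_options,
    chunks={},
    engine="zarr"
)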
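
os.getenv returns None when a variable is unset, which tends to surface later as a confusing authentication error. A small check that fails early can make this easier to diagnose. This is only a sketch: load_credentials is a hypothetical helper, and the variable names are the ones used in the example above.

```python
import os


def load_credentials(env=None):
    """Read the ArcoDataHub credentials from the environment, failing early if unset."""
    env = os.environ if env is None else env
    names = ("ARCODATAHUB_USERNAME", "ARCODATAHUB_ACCESS_KEY")
    # Treat empty strings the same as unset variables.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]
```

Calling username, access_key = load_credentials() at the top of a script keeps the credentials out of the source file while still giving a clear error message when they are not configured.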

Best Practices & Tips

ARCO Data Management

Always check the chunking strategy used to store the dataset, and use an access pattern that avoids fetching unused chunks. Datasets are usually chunked by time (1 chunk = 1 timestep × height × width).
Never run operations that require loading the whole dataset (such as computing the mean of the entire datacube).
To download large parts of an archive or a whole archive, loop over chunks and save each one (for example with to_netcdf()), or use S3 command-line tools such as s5cmd.
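
The chunk-by-chunk download described above can be sketched as follows. The helper names (chunk_slices, export_in_chunks), the dimension name time, and the output file pattern are assumptions for illustration; the chunk size should be taken from the dataset you are actually reading. The dataset is passed in as an argument, so the functions themselves need only the standard library.

```python
from pathlib import Path


def chunk_slices(length, chunk_size):
    """Yield slices that cover range(length) in steps aligned to chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, length, chunk_size):
        yield slice(start, min(start + chunk_size, length))


def export_in_chunks(ds, out_dir, dim="time", chunk_size=1):
    """Write an xarray Dataset to one netCDF file per block along `dim`.

    Only one block is selected and written at a time, so memory use stays bounded.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for sl in chunk_slices(ds.sizes[dim], chunk_size):
        path = out_dir / f"part_{sl.start:08d}_{sl.stop:08d}.nc"
        ds.isel({dim: sl}).to_netcdf(path)
        paths.append(path)
    return paths
```

For example, export_in_chunks(ds, "radar_export", dim="time", chunk_size=1) would write one file per timestep. Note that to_netcdf() also needs a netCDF backend (for example netCDF4 or h5netcdf), which may need to be installed separately.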

Security

Never commit access keys to version control
Use environment variables for credentials
Regularly regenerate your access keys on your profile page

Important: The system is monitored for download and access patterns. We reserve the right to suspend accounts that exceed reasonable usage patterns to ensure fair access for all users. If you anticipate high-volume access, please contact us in advance at the registration email.

Next Steps

Now that you're set up with ArcoDataHub and ready to access live-updated ARCO datasets, here are some suggested next steps:

Check Your Requests

Visit your requests page to monitor approval status and access your API keys.

Propose New Datasets

If you have a dataset in mind that could benefit the community, let us know by writing to the registration email address. We welcome suggestions for new datasets and collaborations.

Share with Others

Do you know someone who could benefit from ArcoDataHub? Spread the word!

Need Help?

If you encounter any issues or have questions about using ArcoDataHub, don't hesitate to reach out to our support team. We're here to help you make the most of Italian AI Factory's diverse ARCO data resources.