CVMFS Catchup

Europe/London
Teams

Adam Parker (High Performance Data Analytics), Shaun de Witt

Agenda and minutes

Previous actions: (new actions and updates in red)

# | Action | Owner | Status | Due
01 | Enquire about Azure cloud resources for hosting the CVMFS stratum-0 data server | SdW | Asked John Galley; awaiting response. Required resources: 2 cores, 4 GB RAM, 500 GB storage | 
02 | Request VM on STFC IRIS for initial benchmarking tests | AP | Application form received but not yet submitted | 19/09/24
03 | Demonstrate CVMFS client installation and data access on CUMULUS login nodes (unprivileged user account) | SD | Done | 19/09/24
04 | Demonstrate CVMFS client installation and data access on CUMULUS compute nodes (no internet access) | SD, AP | Blocked by 05: requires squid proxy setup first | 19/09/24
05 | Put in a ticket for a VM on the CUMULUS node previously known as login-03. Find out suitable resource requirements for the request beforehand. | SD | | 19/09/24
06 | Find out how versioning works on CVMFS | SD | | 19/09/24
07 | Demonstrate CVMFS client installation and data access on CSD3 compute nodes (internet access is available) | SD, AP | | 19/09/24

 

Agenda: (notes from meeting in red)

  • Status of external server requests for initial benchmarking tests (Azure and IRIS)
    Note: an external VM farm is planned at UKAEA, but it will probably not be available for use for another ~3 months.
  • Progress updates
  • (MAIN EVENT) Discuss architecture for offline compute nodes 
    Target a squid proxy for CUMULUS and use direct access to stratum-0 on CSD3. One CUMULUS node has been converted from a login node to a VM server; SD to raise a ticket for a squid VM for CVMFS.
  • Next steps and assign actions
    Initial discussions on benchmarking and testing aims:
    • Compare data access performance between UDA and CVMFS on CSD3.
    • Compare data access performance between Sam's mastapp (zarr files on S3, accessed using s5cmd) and CVMFS.
    • Depending on the data transfer performance results, we could consider using CVMFS for metadata only and s5cmd for data access.
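
A minimal sketch of how the comparisons above could be timed, assuming each data source can be exposed as a readable file path (for example a file under a CVMFS mount, or a local copy fetched with s5cmd). The function names `time_read`, `throughput_mib_s` and `benchmark` are hypothetical and not part of any tool discussed in the meeting.

```python
import time


def time_read(path, chunk_size=1 << 20):
    """Read `path` sequentially in chunks; return (bytes_read, elapsed_seconds)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total, elapsed


def throughput_mib_s(total_bytes, elapsed):
    """Convert a byte count and a duration into MiB/s."""
    if elapsed <= 0:
        return float("inf")
    return total_bytes / (1024 * 1024) / elapsed


def benchmark(paths, repeats=3):
    """Time each path `repeats` times and return {path: [MiB/s, ...]}.

    The first run for a path is likely to hit a cold cache and later runs a
    warm one, so the individual values are kept rather than averaged.
    """
    results = {}
    for p in paths:
        runs = [time_read(p) for _ in range(repeats)]
        results[str(p)] = [throughput_mib_s(b, t) for b, t in runs]
    return results
```

Keeping the per-run values separate makes it possible to compare first-access and cached-access performance, which is likely to matter when a proxy or local cache sits between the compute node and the server.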
    • 1
      Review of actions and summary of recent progress

      See action list in event minutes section

      Speakers: Adam Parker (High Performance Data Analytics), Shaun de Witt, Stephen Dixon (The UDA man)
    • 2
      HPC client architecture and strategy discussion

      General options for CVMFS client access on offline compute nodes are:
      - Squid proxy on a dedicated VM on the CUMULUS network. This accepts HTTP requests from the compute nodes and forwards them to the stratum-1 CVMFS server over the internet.
      - Local stratum-1 server on the CUMULUS network. Similar requirements to the squid proxy, but it maintains a full copy of the data instead of caching a recently used subset.
      - Pre-fetch and cache all required data on shared storage on CUMULUS (through the login nodes?) before running compute jobs.
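
For the squid proxy option, the client-side change on a compute node is mainly a matter of pointing the CVMFS client at the proxy. The sketch below renders a possible client configuration using the standard client parameters `CVMFS_REPOSITORIES`, `CVMFS_HTTP_PROXY` and `CVMFS_QUOTA_LIMIT`. The repository name, proxy host and quota value are placeholders chosen for illustration, and `render_client_config` is a hypothetical helper, not part of CVMFS.

```python
def render_client_config(repositories, proxy_urls, quota_limit_mb=4000):
    """Return the text of a CVMFS client config that routes all traffic via proxies.

    No DIRECT fallback is included, on the assumption that the compute nodes
    have no internet access and can only reach the proxy.
    """
    if not repositories:
        raise ValueError("at least one repository is required")
    if not proxy_urls:
        raise ValueError("at least one proxy URL is required")
    repo_list = ",".join(repositories)
    # Proxies joined with '|' form a single load-balanced group.
    proxy_group = "|".join(proxy_urls)
    lines = [
        f'CVMFS_REPOSITORIES="{repo_list}"',
        f'CVMFS_HTTP_PROXY="{proxy_group}"',
        f"CVMFS_QUOTA_LIMIT={quota_limit_mb}",  # local cache size limit in MB
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder repository and proxy names; 3128 is the default squid port.
    print(render_client_config(
        ["data.example.org"],
        ["http://squid.cumulus.example:3128"],
    ))
```

The generated text would typically be placed in the client's local configuration file; the exact location and any additional settings depend on how the client is installed on the cluster.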

      The aim of the discussion is to decide which configuration we wish to target.

      Speakers: Adam Parker (High Performance Data Analytics), Shaun de Witt, Stephen Dixon (The UDA man)
    • 3
      Benchmark testing plan

      What are we hoping to measure, and what test cases can we think of?

      Speakers: Adam Parker (High Performance Data Analytics), Shaun de Witt, Stephen Dixon (The UDA man)
    • 4
      Next steps and summary of new actions

      Aims for next meeting and assigning actions

      Speakers: Adam Parker (High Performance Data Analytics), Shaun de Witt, Stephen Dixon (The UDA man)