Hi there, I am trying to set up a high-availability, high-performance storage system and I have no idea where to start. I have a list of questions I hope you can help me with.

Do I need a SAN, or DAS that connects to multiple control nodes?
What file system should I use? I want to avoid Lustre.
What hardware do I need?
What OS should I use? Ideally I want to use TrueNAS, but I am comfortable with another distro.

The storage needs to be able to scale to petabyte level and ideally be FAST.

  • imaybemistakenbut@alien.top
    11 months ago

    What’s the data you’re backing up, and will it dedupe/compress well? How are you backing up software-wise: Veeam/Commvault, or will you be doing standard baseline rsync etc.? Does it need to be an enterprise system that you can fold into the existing backup strategy of the location, or is it going to be separate and utilise secondhand equipment? Is a self-hosted object storage system out of the equation, or are you looking for the best bang for the buck on HDD purchases?

    • crazycomputer84@alien.top (OP)
      11 months ago

      ML datasets.
      I have no plan for software for now.

      Yes, it does need to be an enterprise system, and it is on-site.

      This must be self-hosted.

      As for the budget, it is provided by my uni.

      • imaybemistakenbut@alien.top
        11 months ago

        Cool! Traditionally ML datasets tend to compress and dedupe very well, so depending on the budget I would probably look at an appliance with a compatible software stack that does this extremely well, then offload to an object store as you scale out.

        What you are looking for is a scalable appliance, and I would start by building out a requirements document covering the basic questions, such as speed, capacity, data delta (growth over time), redundancy, and uptime.
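
The "data delta (growth over time)" item in the requirements list above can be turned into a number with a few lines of arithmetic. This is only a minimal sketch: the function name `projected_capacity_tb` and the inputs (300 TB today, 50% growth per year) are hypothetical placeholders, not figures from this thread.

```python
def projected_capacity_tb(initial_tb: float, annual_growth: float, years: int) -> list[float]:
    """Return the capacity needed at the end of each year, year 0 included."""
    return [initial_tb * (1 + annual_growth) ** year for year in range(years + 1)]


# Hypothetical inputs: 300 TB today, growing 50% per year (compound).
projection = projected_capacity_tb(initial_tb=300, annual_growth=0.5, years=3)
for year, tb in enumerate(projection):
    print(f"year {year}: {tb:.1f} TB")
```

With these made-up inputs the dataset passes 1 PB in year 3, which is the kind of result that tells you whether petabyte scale is a day-one requirement or a growth target.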

        Once you delve deep into these questions, you’ll be asking the right questions about the how and what of your data flow. That will then build out the baseline requirements for the technology stack you need.

        When I’m scoping solutions, the destination hardware is always the last question answered; if you have it as the first question, the solution is doomed from the start.

        There is no “cheap” way to get petabyte-level storage. What you will spend on HDDs without dedupe and compression would cover the cost of an appliance for dedupe and compression. So a mixture of the two is probably the best approach, if the data can be pre-conditioned by a dedupe appliance before it is offloaded to object storage.
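
To see how much a data reduction ratio changes the drive bill in the trade-off described above, here is a rough sketch. The helper `drives_needed`, the 1000 TB logical size, the 20 TB drive size, and the reduction ratios are all assumptions for illustration; the real ratio has to be measured on a sample of the actual dataset, and redundancy overhead is ignored here.

```python
import math


def drives_needed(logical_tb: float, reduction_ratio: float, drive_tb: float) -> int:
    """Drives required to hold logical_tb after dedupe/compression.

    reduction_ratio is logical size divided by stored size
    (1.0 means no reduction, 2.0 means the data shrinks to half).
    """
    physical_tb = logical_tb / reduction_ratio
    return math.ceil(physical_tb / drive_tb)


# Hypothetical comparison for 1 PB (1000 TB) of logical data on 20 TB drives.
for ratio in (1.0, 1.5, 2.0):
    count = drives_needed(logical_tb=1000, reduction_ratio=ratio, drive_tb=20)
    print(f"reduction {ratio}:1 -> {count} drives")
```

Multiplying the difference in drive count by a per-drive price gives a first estimate of what a dedupe/compression layer would have to cost to break even.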

        • crazycomputer84@alien.top (OP)
          11 months ago

          “tend to compress and dedupe very well”

          Do you have any pointers?

          By “cheap” I mean compared to a solution like NVIDIA DDN nodes.

          And what do you mean by “Once you delve deep into these questions”? That assumes I have some general knowledge of storage stacks that can scale to petabyte level. I bet it is safe to assume that TrueNAS SCALE won’t scale to a petabyte or more.

  • cruzaderNO@alien.top
    11 months ago

    How small do you need to start?

    And you need to define “FAST” with an actual number.
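
One way to arrive at an actual number for “FAST” is to work backwards from how quickly the training jobs need to read the data. The sketch below assumes the whole dataset is streamed once per epoch; the function name and the inputs (a 100 TB dataset, an 8-hour epoch) are hypothetical, not requirements stated in this thread.

```python
def required_read_gb_per_s(dataset_tb: float, epoch_hours: float) -> float:
    """Sustained read rate in GB/s needed to stream the dataset once per epoch."""
    return dataset_tb * 1000 / (epoch_hours * 3600)


# Hypothetical workload: 100 TB read once every 8 hours.
gb_per_s = required_read_gb_per_s(dataset_tb=100, epoch_hours=8)
gbit_per_s = gb_per_s * 8  # bytes to bits, for sizing network links
print(f"{gb_per_s:.2f} GB/s sustained, about {gbit_per_s:.0f} Gbit/s on the wire")
```

For these made-up inputs the result is roughly 3.5 GB/s sustained, which already rules out a single 10 or 25 Gbit/s link and gives vendors something concrete to quote against.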

  • ryuujinzero@alien.top
    11 months ago

    As for hardware, there are 60-bay rackmount chassis out there that you can outfit with 60 20TB drives, which would give you 1.2PB of raw capacity (less once you account for redundancy). You would need a motherboard with enough PCIe lanes for the number of HBA cards you need. If you went with HBA cards that can support 16 drives each, you’d need 4 for 60 drives. I personally use the Pro WS WRX80E-SAGE SE WIFI, which might be overkill for your scenario, but its 7 x16 slots allow me 4 HBA cards + 1 GPU for Plex hardware transcoding/Steam streaming + 2 Hyper M.2 expansion cards for cache.
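
The chassis arithmetic above (60 bays, 20 TB drives, 16 drives per HBA) can be checked with a short sketch. The redundancy layout is an assumption added here for illustration only: six 10-drive groups with 2 parity drives each, which the comment above does not specify. File system overhead is ignored.

```python
import math


def chassis_plan(bays: int, drive_tb: int, ports_per_hba: int,
                 group_width: int, parity_per_group: int) -> dict:
    """Raw capacity, HBA count, and usable capacity for one chassis."""
    groups = bays // group_width  # assumed parity groups, not from the thread
    data_drives = groups * (group_width - parity_per_group)
    return {
        "raw_tb": bays * drive_tb,
        "hbas": math.ceil(bays / ports_per_hba),
        "usable_tb": data_drives * drive_tb,
    }


plan = chassis_plan(bays=60, drive_tb=20, ports_per_hba=16,
                    group_width=10, parity_per_group=2)
print(plan)
```

Under that assumed layout, 1200 TB of raw capacity drops to 960 TB usable, so a single 60-bay chassis of 20 TB drives lands just under a petabyte once parity is taken out.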

  • reddit-MT@alien.top
    11 months ago

    You can go with a single large-capacity server (such as https://www.45drives.com/products/storinator-xl60-configurations.php) or go with a cluster and a distributed file system. It’s a lot less work and complexity to go with a single large server. Performance-wise, you either care most about cost and use hard drives, or you care most about speed and use some kind of solid-state storage.

    If I were you, I’d call a few storage vendors and see what they say. Compare features vs. cost and then decide if you want to roll your own.