• What is it?
    • Sphere: Parallel Data Processing Framework like MapReduce
    • Sector: Userspace Distributed File System
    • Optimized for data-intensive applications
    • Apache v2.0 License
  • Architecture
    • Sphere
      • Applies user-defined functions to data elements such as records, blocks, files, or directories
      • Output can be written to Sector files or sent back to client
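      A conceptual sketch (not the Sector/Sphere API; every name below is illustrative) of how a Sphere-style engine applies a user-defined function to each record of a segment and routes the output:

        #include <functional>
        #include <string>
        #include <vector>

        struct Record { std::string data; };              // one input element

        enum class OutputTarget { SectorFile, Client };    // where the results go

        // Apply a user-defined function to every record of a data segment.
        std::vector<Record> process_segment(const std::vector<Record>& segment,
                                            const std::function<Record(const Record&)>& udf,
                                            OutputTarget target)
        {
            std::vector<Record> out;
            out.reserve(segment.size());
            for (const Record& r : segment)
                out.push_back(udf(r));                     // user-defined transformation
            // In Sphere, 'target' decides whether results are written to Sector
            // files on the slaves or streamed back to the requesting client.
            (void)target;
            return out;
        }
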
    • Sector
      • Is a meta file system, i.e., it stores files on the native/local file system of each slave node
      • Sector does not split files into blocks
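      Because each Sector file is kept whole on a slave's native file system, a slave can resolve a Sector path to an ordinary local path. A toy illustration; the data-directory layout shown is an assumption, not Sector's actual layout:

        #include <fstream>
        #include <string>

        // Map a Sector path to the file's location on the slave's local disk.
        // No block splitting: "/user/data.txt" is one ordinary file under data_dir.
        std::string to_local_path(const std::string& data_dir, const std::string& sector_path)
        {
            return data_dir + sector_path;                 // e.g. /var/sector/data/user/data.txt
        }

        // The file can be read with ordinary I/O; no special block reader is needed.
        std::ifstream open_sector_file(const std::string& data_dir, const std::string& sector_path)
        {
            return std::ifstream(to_local_path(data_dir, sector_path), std::ios::binary);
        }
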
    • Master Server
      • Maintains status of slave nodes and other master nodes
      • Maintains the file system meta-data
      • Responds to user requests
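      A rough sketch, not Sector's actual data structures, of the kind of in-memory index a master keeps: file attributes plus the slaves holding each replica, so client lookups can be answered without contacting the slaves:

        #include <cstdint>
        #include <map>
        #include <string>
        #include <vector>

        struct FileMeta {
            std::uint64_t size = 0;
            std::int64_t  mtime = 0;
            std::vector<std::string> replica_slaves;       // slaves holding a copy of the file
        };

        class MasterIndex {
        public:
            void upsert(const std::string& path, const FileMeta& meta) { index_[path] = meta; }
            const FileMeta* lookup(const std::string& path) const {
                auto it = index_.find(path);
                return it == index_.end() ? nullptr : &it->second;
            }
        private:
            std::map<std::string, FileMeta> index_;        // Sector path -> metadata
        };
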
    • Security Server
      • Maintains user accounts and IP access control for masters, slaves, and clients
      • Can use LDAP/System Accounts as account source
      • Uses Certificates to authenticate masters and slaves
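      An illustrative IP access-control check of the sort the security server performs for masters, slaves, and clients; Sector's real rule format differs, and prefix matching here is only for the sketch:

        #include <string>
        #include <vector>

        // Return true if 'ip' matches one of the allowed prefixes,
        // e.g. "192.168.1." admits every host on that subnet.
        bool ip_allowed(const std::string& ip, const std::vector<std::string>& allowed_prefixes)
        {
            for (const std::string& p : allowed_prefixes)
                if (ip.compare(0, p.size(), p) == 0)
                    return true;
            return false;
        }
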
    • Slave Servers
      • Store sector files
      • Process data
  • Reliability
    • Sector uses software-level replication for better reliability and availability
      • Replicas can be made either at write time (instantly) or periodically
      • User can configure replication factor, distance and location on a per-file basis
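      A sketch of per-file replication settings like those listed above; the field names are assumptions, not Sector's configuration keys:

        #include <algorithm>
        #include <string>

        struct ReplicationPolicy {
            int replication_factor = 3;       // desired number of copies
            int min_replica_distance = 1;     // e.g. replicas must sit on different racks
            std::string preferred_location;   // e.g. restrict replicas to one site
            bool replicate_on_write = false;  // instant (at write time) vs. periodic
        };

        // How many additional replicas the master should create for a file right now.
        int replicas_needed(const ReplicationPolicy& policy, int current_replicas)
        {
            return std::max(0, policy.replication_factor - current_replicas);
        }
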
    • Redundant Topology - Multiple master servers can be set up for high availability
    • Hot Plugging - Master/Slave nodes can join and leave at runtime
    • Hot swapping - Master nodes can restart slave nodes in case of failures
    • Sector does not require any permanent meta-data; the file system can be rebuilt from the data itself
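      Because every Sector file is an ordinary file on a slave's local disk, a master can reconstruct its index by scanning the slaves' data directories. A sketch for one directory, assuming C++17 <filesystem>:

        #include <cstdint>
        #include <filesystem>
        #include <map>
        #include <string>

        namespace fs = std::filesystem;

        // Rebuild a path -> size index by walking one slave's local data directory.
        std::map<std::string, std::uintmax_t> scan_data_dir(const fs::path& data_dir)
        {
            std::map<std::string, std::uintmax_t> rebuilt;
            for (const auto& entry : fs::recursive_directory_iterator(data_dir))
                if (entry.is_regular_file()) {
                    // The Sector path is the file's path relative to the data directory.
                    std::string sector_path =
                        "/" + fs::relative(entry.path(), data_dir).generic_string();
                    rebuilt[sector_path] = entry.file_size();
                }
            return rebuilt;
        }
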
  • Performance [2]
    • Direct Data transfer channel between slaves and clients for better performance.
    • UDT is used for high speed data transfer
      • UDT is a high performance UDP-based data transfer protocol.
      • Much faster than TCP over wide area
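      A minimal UDT client sketch based on the UDT4 C++ API (udt.h); the address and port are placeholders, and the calls should be checked against the UDT release you build with:

        #include <arpa/inet.h>
        #include <cstring>
        #include <iostream>
        #include <netinet/in.h>
        #include <sys/socket.h>
        #include <udt.h>

        int main()
        {
            UDT::startup();                                         // initialize the UDT library

            UDTSOCKET sock = UDT::socket(AF_INET, SOCK_STREAM, 0);  // stream-mode UDT socket

            sockaddr_in serv{};
            serv.sin_family = AF_INET;
            serv.sin_port = htons(9000);                            // placeholder port
            inet_pton(AF_INET, "192.168.1.10", &serv.sin_addr);     // placeholder slave address

            if (UDT::connect(sock, (sockaddr*)&serv, sizeof(serv)) == UDT::ERROR) {
                std::cerr << "connect: " << UDT::getlasterror().getErrorMessage() << "\n";
                return 1;
            }

            const char* msg = "hello over UDT";
            UDT::send(sock, msg, static_cast<int>(std::strlen(msg)), 0);  // reliable send over UDP

            UDT::close(sock);
            UDT::cleanup();
            return 0;
        }
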
    • Benchmark Study [2]
      • TeraSort Benchmark (Sorting 1 TB data)
        • Rack Configuration
          • 1 head node
            • Dual-core Xeon @ 3.0GHz
            • 16 GB RAM
          • 30 slave nodes
            • Xeon E5410 @ 2.4 GHz
            • 16 GB RAM
            • 4 x 1 TB RAID0 disks
            • 1 Gb/s NIC
          • Racks are located at JHU (Baltimore), StarLight (Chicago), UIC (Chicago), and Calit2 (San Diego); inter-rack bandwidth is 20 Gb/s
        • Results

          Racks   Sector/Sphere (a)   Hadoop (b)
          1       28m 25s             85m 49s
          2       15m 20s             37m 0s
          3       10m 19s             25m 14s
          4       7m 56s              17m 45s

          Note(s):
          (a) Sector/Sphere does not require tuning
          (b) Hadoop was tuned for better performance
      • MalStone Benchmark (by Open Cloud Consortium)
        • Configuration
          • 20 nodes
            • Xeon E5410 @ 2.4 GHz
            • 16 GB RAM
            • 4 x 1 TB RAID0 disks
            • 1 Gb/s NIC
        • Results

           

                                    MalStone A   MalStone B
          Hadoop                    454m 13s     840m 50s
          Hadoop Streaming/Python   87m 29s      142m 32s
          Sector/Sphere             33m 40s      43m 44s

    • Multiple active master servers support load balancing
    • Spatial Locality - Slave nodes process data stored locally or on the nearest storage node
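      A conceptual sketch of spatial locality: among the replicas of a segment, prefer the local copy, otherwise the topologically closest slave (the distance values stand in for Sector's topology configuration):

        #include <limits>
        #include <string>
        #include <vector>

        struct Replica { std::string slave; int distance; };  // 0 = local, 1 = same rack, 2 = remote rack, ...

        // Pick the replica with the smallest topology distance; a distance of 0
        // means the data is already on the node doing the processing.
        std::string pick_nearest(const std::vector<Replica>& replicas)
        {
            std::string best;
            int best_d = std::numeric_limits<int>::max();
            for (const Replica& r : replicas)
                if (r.distance < best_d) { best_d = r.distance; best = r.slave; }
            return best;
        }
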
  • Security
    • Authentication/Access Control is managed by a Security Server
    • Encryption of control messages and data transfers is supported
  • Interoperability
    • Client applications
      • Specify Input, Output and user defined functions
      • Input/Output are Sector directories or files
      • May process data iteratively/combinatively
    • User-defined functions are C++ functions following the Sphere specification (parameters and return value) and are compiled into a dynamic library (*.so); a sketch follows this list
    • FUSE plugin to mount Sector File System locally
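      A sketch of a Sphere-style UDF packaged as a shared library, as described above. The input/output structures are simplified stand-ins, not Sphere's actual types; the exact parameters and return value are defined by the Sphere specification:

        #include <cstring>

        extern "C" {

        // Simplified stand-ins for the engine-provided input/output buffers.
        struct UdfInput  { const char* data; int size; };
        struct UdfOutput { char* result; int buf_size; int res_size; };

        // Example UDF: copy the input record to the output buffer unchanged.
        // Illustrative build step: g++ -shared -fPIC -o libmyudf.so myudf.cpp
        int my_udf(const UdfInput* in, UdfOutput* out)
        {
            if (in->size > out->buf_size) return -1;       // output buffer too small
            std::memcpy(out->result, in->data, in->size);
            out->res_size = in->size;
            return 0;                                      // 0 = success
        }

        }  // extern "C"
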

Source(s):
[1] http://sector.sourceforge.net/pub/Sector-cloudcom-tutorial.pdf
[2] http://sector.sourceforge.net/benchmark.html
