Discussion Lead: Marlon Pierce

Scribe:

Participants:

  • Jay Alameda
  • Babak Behzad
  • George Biros
  • Tim Cockerill
  • Anshu Dubey
  • Cynthia Gibas
  • Brandon Hill
  • David Hudak
  • Kevin Jorissen
  • Abani Patra
  • John Rehr
  • Sarom Sok
  • Todd Tannenbaum
  • Eric Van Wyk
  • Elias Balaras
  • Frank Löffler

Questions:

Distribution and Licensing
  • What are the licensing models adopted by our various projects?
    • Löffler - The base framework is free software; authors of user-specific modules are free to choose a license, and some modules don't carry a license at all unless they are used by a larger group of people.
    • CyberGIS has a mix of open source (MIT, Python license) and free but not open-source code. However, every piece of code developed within the project by funded developers is open source. The choice of license is under investigation.
    • DOE - no restrictions on licensing. In the past they had to check the nationality of the person. The license they do use restricts reuse and redistribution. If a commercial entity wants to use the code, the university sets up a special license with the company.
    • When you open source your code and it contains proprietary code, you could have legal problems. Have anyone contributing code sign a non-exclusive content provider agreement.
    • U Washington - any code written with federal funds gives all federal employees a free license. They have several other versions of the license, depending on the type of developer and user.
    • Condor - switched to the Apache license because downstream distributions didn't want to deal with the Condor Public License
      • contributors need to sign a Condor contributor agreement; they have full rights to do anything they want with the code
      • as an alternative, contributors can release their code into the public domain or license it under BSD
      • getting the license straight was a big archaeological dig
    • Amber - originally a custom license, through UCSF
      • academic license - $400
      • commercial license - $20,000
      • charging for the license doesn't affect usage of the code, as far as Ross can tell
      • what they have done is open source some of the packages as AmberTools, free to download; this is done to promote use of the Amber force fields in other people's codes, since these are the tools you need to set up force fields - the force fields are
        • for most of the code, they don't know its history; finding it out would be a big deal
      • AmberTools carries a big collection of licenses
      • AmberTools and the Amber engine
        • part of the Amber development team: all postdocs/students under a PI who signed are covered by the license, giving the Amber developers permission to use their code
      • fee waiver for anyone who doesn't have funds to purchase the software; this is effectively a free license
      • license revenue pays for things like yearly development meetings, where 50-100 students and PIs meet together; there is no other way to fund these
      • this gives consistency to the code
      • there are plenty of patches on the web from people who didn't want to contribute to the main package, but on balance most people want to get their code into the tree, which exposes their code automatically to a large number of users without having to advertise
    • GAMESS - research group license; you are able to distribute the source to anyone in your institution or company
      • but we retain ownership of the source; you can't distribute the source or binaries outside of the institution or company
      • this prevents the code from going into commercial codes
        • there is a commercial electronic structure code whose license says you can't compare its performance!
    • Eclipse Public License
      • and committer
    • outside contributors:
      • Amber has a lot of outside contributors
      • what fraction of the total code comes from outside contributors?
      • Amber doesn't have a distinct list of developers
      • but likely 5-10%
      • people who contribute more than a single method are authors of
    • Anshu - more than 50% of the code would be external; this is a recruiting tool
      • some start out clueless
    • Pegasus uses the Apache license; little or no contributed software, no contributor agreement
  • Do any of the projects have relationships with larger open source organizations, and thereby benefit from their distribution mechanisms (and potentially from their intellectual property offices, etc)?
    • Amber has been around for 30 years, and licensing it this way has not killed it. It has a set version history, which is critical for scientific code: you know exactly what version you used.
    • Is the revenue from commercial licenses sufficient to sustain the project? It doesn't come close to sustaining development, but it is gold-plated money: it pays for the servers we have, things like that. It doesn't pay anyone's salary.
    • sustainability: with critical mass, it self-sustains. It was led by Peter Kollman; when he died, the next person was able to keep it going, and when he retires, there is enough critical mass below him.
    • people claim open source
    • George Karypis - speaking of licensing, how is the university office you deal with? Do they agree with open source, or complain?
    • Ross: when we wrote the SI2 proposal, there was a requirement that software developed under the proposal be open source, but the scope is not well defined
      • the code developed under the proposal is open source, but useless without the rest of the code
      • UCSD said this was not possible, due to the agreement signed when joining the university
      • the funds received from NSF fall far short of what you put into the development
      • the areas of scope are poorly defined
      • how are you going to deal with this?
      • release an open-source CUDA MD library; how well this will work remains to be seen
      • download AmberTools and the library, with a demo interface demonstrating this
    • Anshu - no problems
    • Eric Van Wyk - didn't talk to anyone in Minnesota
    • industry wants some license
    • Ross objects to the RFP requiring open source
      • NSF, DOE
    • but they don't say that you can't patent a drug
    • not sure why the software needs to be given away
    • George Karypis - over the years he developed many software packages, including a numerical solver with an initial release in the mid-90s. In 1997-98 Sun Microsystems came, wanted to incorporate it and needed a license; he went to the university office of technology licensing, they got up in arms, and eventually agreed to give Sun a license for free, giving them the right to do this.
      • in his experience, tools don't follow the official license
      • in his experience with U Minn over the last 5-6 years, the university has been very aggressive about licensing software; this correlates with cuts to funding
      • there is a person dedicated to software licensing, and now a firm marketing university software; it gets tricky if you want to release something for free
    • Todd - as for money, Condor doesn't make a lot of money; the university (UW) does sell support, which means support customers' email questions go to the front of the queue
    • on others redistributing the code and claiming it comes from us: the code is Apache-licensed, but the term "Condor" is a registered trademark
    • Amber - MOPAC is a well-known semi-empirical code, very good code, but not originally written for dynamics; it got used to do dynamics. When he started his postdoc, he wanted to repeat some simulations from his PhD, and
      • found a serious problem with MOPAC
        • and found copies of this semi-empirical MOPAC code in Gaussian
        • this bug was in every single implementation of semi-empirical QMMD
        • perfectly reproducible, incorrect results
        • this code was "public domain", not open source
    • Todd - this is not about the license, but code reuse is sometimes bad
      • you need two independent implementations to verify results
    • FLASH got into the cluster competition at SC11. The teams downloaded the code, and it turned out to be a third-party version: the bug was in some physics that is not in our distribution, did not adhere to our coding standards, and couldn't run on the clusters. Because we do have a license, and that distribution was illegal, we could use this to change the part of the code causing the problems. This was an incorrect use of the code and would have caused us lots of problems.

Third party (second bullet):

  • Jay uses eclipse.org
  • Does anyone use SourceForge? No.
  • Todd - Condor has a contributor license agreement that is basically Apache's; because it is the same license agreement, this helps us
    • we are virally benefiting here
  • most development is done in Apache, but we write a lot of customized code for special cases; that code is managed outside the project, on SourceForge or our own servers
    • dual repositories, dual development: to try a new capability in the core, you can't just dump it into Apache; you have to go through a process to get it into the core functionality
    • If so, how do you manage distribution of project contributions that are not accepted by the core open source package distribution?
  • What relationships should we encourage with other possible distribution channels (including adoption by industry?)
    • public repositories (also of different types) - Frank Löffler
    • website downloads for Condor, mirrored at a couple of sites in the US and a couple in Europe
      • and downstream distros, in package repositories for Debian, etc.
      • and our own package repositories
      • ours get updated monthly, while downstream distros only take stable releases
      • since it is Apache-licensed, it can be used in commercial products; there are commercial products based on Condor that aren't called Condor
      • some work closely with us: Red Hat MRG, where the "G" is Condor. Red Hat has engineers who help work on Condor; they actually live in Madison and work side by side with UW developers. Do they contribute back to the Condor source tree? They make regular contributions back upstream. We also get contributions from users, but some commercial users don't contribute anything back.
      • benefits from the commercial side: George gets tests on weird machines and gets bug reports back, which is a benefit
        • Amber - has had students hired because they are experts in the code, e.g. hired by Genentech, which wanted someone who understood Amber thoroughly
        • the source is "source available"; we encourage folks to get the source and compile it themselves, since distributing binaries is a pain
        • for Windows we do give binaries, as Windows is so well defined for this
        • on Linux, changes in glibc are a huge problem
        • but this sort of problem doesn't exist on Windows
        • there are people running on Windows clusters, but not on exotic hardware + Windows
        • Blue Gene is a pain, but not many people have those machines; users can compile on all of the supercomputers, which also have staff who can do this
        • it used to be fine to give out single-CPU versions of the code as binaries
        • but a large amount of the code's functionality only works in parallel, so you have to build in the MPI layer
        • downstream industry users have QA people and do more QA
          • the code gets to run in environments it couldn't run in before
          • when it runs in different environments, or in different ways, we learn something
        • FLASH - a weird relationship with industry: most vendors want the code because it works with any infrastructure, and they want it for their test suites. This has its pluses for us: on several of the HPC platforms we get to be on the platform early, and working out the kinks of porting the code is extremely helpful to us
          • and several vendors license it as a representative workload, to identify low-level adjustments they need to make in the hardware
          • they modify their architecture
        • Amber - industry uses the code; on the other side, vendors who sell machines use the code as a benchmark, and NVIDIA is a classic example
          • it is used in chip design and is in the regression tests for new NVIDIA GPUs
          • this is a nice interaction with industry; this is possible as there is a market for
        • benchmarks for systems
          • Amber
          • FLASH
          • GAMESS
    • data licenses: what does this mean?
      • has anyone seen this?
      • FLASH - for certain simulations, we have made the data public, along with the tools for data analysis, and these are freely available
      • two types of data: data produced by simulations, and data used as input, which may be more expensive to obtain
        • certainly, data from simulations at some point gets distilled down to a paper and is archived or deleted
        • data licenses seem more applicable to input data?
        • Anshu disagrees, as not everyone has access to the machines; turbulence is one community that keeps increasing the resolution of its simulations
        • in many fields there is a lack of data, e.g. data mining, especially anything involving proprietary data; for an improved method, some journals require you to make the data available
          • not sure whether they require making the software available; is that something else you are thinking of?
        • Yan - not dealing with data licenses yet, but knows they are coming
        • another issue: if you host the data and services and send users there, you have to store the data for a long time, or permanently, for users to get it back
        • Ross - under the end-user license for the Anton machine at PSC, you have to leave the data on the machine
  • What about dual licensing models that differentiate between commercial and non-commercial use?
    • Can this model be used to enable software sustainability?
  • What is your experience/relation with your institution's technology marketing office?

Notes: