Tuesday, August 4, 2015

It's not ready yet ...

I first ran into this phrase while working on VIVO.  It's a frustrating phrase used to push back against requests:
  • "Let's make an open source repository."
  • "Is your change up yet?"
  • "You should submit that to the project for others to use."
I thought it was just people new to open source, scared of submitting their code for possible judgment by a wider community.  I come from a "commit early, commit often, submit every day" philosophy.  My first programming position involved pair programming, so even though only one of us typed at a time, I got used to talking through my decisions as I made them.

Recently, I heard this phrase again from someone who loves open source and uses many open source tools.  They wouldn't submit a tool to the team repository because it "wasn't ready".  It already did many useful things, but they wanted more features before giving it to everyone to use.  They were looking for perfection.

Perfection is in the eye of the beholder.  To me, if it works, it's perfect.  Build a feature, a method, anything, and release it to your team.  Someone will find it useful, even if you don't think they will.

This all comes from my pessimistic view of code: "it's all crap."  A co-worker and great friend told me this was too pessimistic a view.  Perhaps a better phrase is "Code is like a plant you nurture and grow.  Continue to nurture it with care and it will grow and get better.  Freeze it in time, like cutting a rose from the bush, and eventually it will wither and die."

We see it in our work all the time: today's great change is tomorrow's headache.  What worked well in the past needs refactoring in the future to expand its capability.  This isn't a bad thing; it's a great thing.  It means that code is a nearly living thing, constantly evolving.  Code that doesn't change and get refactored dies as people move to better code bases.

How do we change this culture?  We have to change it in ourselves first.  Send our code up for review early and accept all criticism equally.  Change how we feel about others' code.  Submit bug reports to software we use every day.  Encourage our teams to submit early, give them positive and negative feedback, and ask them questions:

  • "This is a great idea."
  • "I think you should do this instead."
  • "Why did you choose this method?"

After we change ourselves, perhaps we can change the system to accept that nothing's perfect, and it's okay to just be good.

Wednesday, March 20, 2013

Computer Science Education

SIGCSE, ACM's Special Interest Group on Computer Science Education, met in Denver, Colorado a few weeks ago, and fittingly, even though I wasn't there, I heard about one of the best ideas in computer science education I've ever encountered.

Jim Baker, whom I know through +Nicholas Skaggs and Alex Viggio, is an Adjunct Professor at the University of Colorado Boulder teaching a class on programming languages.  The class seems pretty standard: reading assignments, homework, programming projects, papers, etc.  The innovative thing, however, is how all of it is structured around GitHub.

I've mentioned this across several posts: version control is a missing element in most computer science education.  Students need to understand its benefits and get used to the way it affects their coding style.  It's dramatically shaped mine; I now try to code in segments so I can commit complete thoughts, and after breaking things for other users a few times, I learned to check my code before pushing to the remote server.  Bad habits start early, and the earlier developers learn to use a version control system, the fewer bad habits they'll acquire.

Students in Jim's class are required to create a GitHub account and register their username with the class.  The class uses the students' GitHub accounts to create repositories for them and push homework assignments into those repositories.  It's an innovative way to release assignments, since it allows the instructor to modify them as questions come in or issues are discovered.  It was always annoying when a class website was updated with a change to an assignment, but you'd only learn about it four days later during the next class session.  This flips the standard communication structure, from students polling the course (in this case, the website) to the instructor pinging the students: "pull down the changes to the assignment."  It's a valuable lesson to learn: pull from your remote repository often.  It's so valuable that +Philip Chase once created a cheat sheet of daily tasks for his developers with "git pull" as item number one.  They seemed incapable of pulling from or pushing to the remote repository, and eventually their lack of proper process caused some major malfunctions.

Projects are submitted to a standard "import, test, and evaluate" CSE system.  We had something very similar at UF to check for cheating.  In addition to pushing to this system, however, students must also submit a pull request to the class.  Pull requests allow for peer review: you can see a diff of your repository against the source repository and receive feedback from the upstream developer (in this case, the instructor).  This process is used in most open source projects and is beginning to gain traction in the corporate world with code review systems like Gerrit.  Learning it adds practical training to otherwise theoretical class assignments.  It also provides an extra incentive to code well, in the form of extra credit.  Students whose code does the tasks better than the instructor's, or in a more innovative way, have their pull requests accepted and receive extra credit.  Students who include tests in their code can also receive extra credit.

The class has a paper and presentation requirement.  The paper has to be written in Markdown, a standard wiki markup.  Companies are increasing their use of dynamic documentation with corporate wikis; a wiki allows the documentation to change on the fly and maintain a revision history, without sharing a document around and entangling yourself with locking and unlocking.  Markdown is also just a step away from LaTeX, which academics use to write articles.  After being submitted, each paper must receive reviews from two classmates.  Over their professional careers, the students will be required to review and be reviewed when writing documentation.  This unique approach to paper submission readies students for either the academic or the corporate world.


I hope that this approach grows.  I know GitHub attended SIGCSE and attempted to promote git as an educational tool.  I'm hoping someone convinces Jim to write a paper about it and document the successes and the failures.  Until then ...

Thursday, March 7, 2013

Software - Tradecraft or Science

Talking with one of my good friends in software development the other day, I stumbled upon a question:
Developing software, is it a tradecraft or is it a science?
The question came up while talking about the skills of various people where we used to work.  We had all kinds: people with master's degrees in software engineering, others with bachelor's degrees in computer science.  Some didn't have a degree in software at all, such as my friend, whose degree was in music.

What was interesting to note is that degree should have indicated skill, if ours were a pure science.  "I have studied more complicated theories than you, and thus I am superior."  However, some of our most credentialed employees were some of our worst.  People with master's degrees and multiple classes in databases produced some of our worst database designs: duplicated information in tables that should have been in a lookup table, using MyISAM on a table that should have had foreign keys; the list of tragedies (in my opinion) goes on.  Now, I'm not an advocate for one normal form over another.  Going all the way out to fourth normal form can sometimes be extreme, and a good software engineer should balance all aspects when building a program.  It's this thinking that started the wheels turning: software development, or engineering, has elements of a tradecraft.

In the sense that it's a trade, time teaches you when by-the-book isn't necessary.  As you work at your trade, you develop things like toolkits and acceptable shortcuts.  In software we come up with tools for bug analysis, diffing files, profiling performance, and writing code.  There are at least a few hundred articles on the web devoted to a developer's "toolkit".  Some talk about it like finding that perfect set of woodworking tools, where your hand has practically reshaped the handle from use.

In a purely technical sense, it's also an art form.  Good code is not just functionally correct; it's elegant.  Sometimes it's the way the code touches the processor, RAM, and disk in the most minimal and efficient way possible.  Other times it's the way it accomplishes a large amount of work with a minimal amount of logic.  I'm not talking about a Perl one-liner, but simple, easy-to-read, powerful code.

I think software engineering, like civil engineering and architecture, hints at the blending of these worlds: tradecraft and science.  The purely analytical with the artistic.  A building built on math alone would be a pretty boring building; architects learn to make a building soar.  In the same vein, a great software engineer can make a program sing.  Don't get me wrong, we need the science too.  Without it our programs would be like the failed library building whose architect forgot to account for the weight of the books.  Just don't forget the craftsmanship.

Wednesday, March 6, 2013

SimCity

When you've played every iteration of a game, as I have with Zelda, Assassin's Creed, or BioShock, you see features come and go as the developers try to make the perfect game.  I've played SimCity since SimCity 2000 on Chris Pearson's old 450 MHz Pentium II in high school.  I've seen the complexity ramp up and the size of the cities explode.  When I played the demo of the latest SimCity on Des' laptop, I was floored at how fun the game looked.  Sure, the cities were small, but services were simpler and the notification system was great.  We decided to buy two copies so we could play together and eagerly awaited its release.  Perhaps it's just release-day bugs, but I want my money back.

Stop making the web required

On the Myers-Briggs test I sit just barely over the line toward extrovert.  That's because in certain situations I'm a social butterfly: walking, talking, eating, etc.  In other situations I'm not, e.g., gaming.  When I buy a game, I don't rush to the multiplayer system.  I play single-player all the way through, and then, if some of my friends are playing, I'll try out the multiplayer.  I don't want to be linked to everyone all the time.  I shouldn't have to wait on a server to play my game.

Many games are requiring online connections as a method of DRM.  If that's the way you want to play it, fine: require my connection at first login, verify my information, store my login locally, and link me up when I'm online.  If I have a laptop and want to play a game on the road or in a coffee shop, why am I being denied because I don't have an active connection?  I can play the new StarCraft without a connection, as long as I had one when I installed it.  I won't earn rewards, but that's OK if all I'm looking for is a little fun.

AutoSave does not replace Save

I don't know how many times I heard at the HelpDesk, "But I had autosave on!"  "Yes," I'd say, "let's check your settings.  Oh, you had it saving every 8 hours.  Sorry, you've lost your paper."  Now, nothing was more annoying in Final Fantasy than getting too into the game and forgetting to save.  Inevitably, it was always when you had forgotten for an hour or two that the game suddenly froze or you were horribly killed.  But you learned an important lesson every time: "save early and often".

Desiree has exclaimed several times since SimCity came out, "Crap, it crashed," only to return and find her city 10 years younger than when she last saw it.  Just this past evening she was telling me about an arson problem in her SimCity, and how she had finally gotten a police station to deal with it just before the crash.  Sure enough, when the system let her sign in again: no police station.

Let people save for themselves.  Let us manage our saves locally.  It's great that we can pick up wherever we left off on another computer, but don't take away the basics to give us this feature.  Simply warn us: "System saves do not transfer from computer to computer."

It's called a cluster

If the servers named North America East 1-3, West 1-2, Eastern Europe 1-2, Western Europe 1-2, etc., are clustered servers, then I feel you need to send your engineers back to school.  Why, in 2013, am I picking a server node?  I haven't picked a game server in years.  Perhaps that's because I play on Xbox more than anything else, but my last PC gaming experiences were League of Legends and StarCraft 2, both online games, and I didn't pick servers with either of them.

Give me one huge endpoint to hit and spread my friends' and my traffic across the nodes with load balancing and clustering.  If you do it right, you can simply add more nodes as traffic picks up, a la Amazon.  I guess that's the biggest problem: OpenStack, Amazon EC2, and tools like Juju have shown us that clustering servers and spinning up new nodes can be trivial, if implemented correctly.

Tutorial, do I look like I ask for directions?

It's great that games come with tutorials.  If I ever have a problem, I play through them.  However, if I've played the demo or previous versions of the game, I like to dive right in.  I'm currently stuck (one of the many reasons I'm writing and not playing) because the game says I need to play the tutorial again (I already played it once since release), and the tutorial keeps crashing.

My favorite tutorials are in Call of Duty: go through door A for the tutorial or door B to get into the action.  "Do you want the red pill or the blue pill?"  Super easy to jump right in or get a little help first.

Tutorials are great.  
Good for you making one.  
Can I just play my game now?

Conclusion

It still looks like a fun game, and despite Desiree's frustrations she appears to be having fun.  For me, however, the game has been out for roughly 48 hours; I've played exactly one hour and attempted to play two additional hours.  That's the biggest flaw in this game.  If this were my first experience with SimCity, I'd never pick it up again.

Friday, March 1, 2013

Code Academy - Interesting Concept

Alex Viggio, my development lead at CU, sent me a link to the 2012 Crunchies Awards on TechCrunch.  The winners were some of the usual suspects, but I saw a site that piqued my interest: CodeAcademy.com.

Their slogan is "Teaching the world to code."  Now, unlike a certain mayor, I don't believe everyone in the world can code.  It takes a 'different' mind than everyone has.  I also don't believe everyone can paint, do carpentry, manage a team, or fly a plane.  We're all different, and we will each find things beyond our abilities.  However, just because everyone won't be successful doesn't mean everyone shouldn't try.

I decided to take CodeAcademy for a spin.  It's important to me that people be taught to code in a responsible manner.  Bad habits develop early and lord knows I have my fair share.

The Profile

This is a requirement of almost any site, especially on the 2.0 web.  Unlike a lot of sites that attempt to integrate themselves with every known social site and application, CodeAcademy does a good job of selecting sites that represent developers:
  • GitHub - as a developer, if you haven't at least tried git and GitHub, then I believe you have either A) been living under a rock for the last few years or B) are my father and work on mainframes.  This is a great place to send new developers.  The first question I usually ask during an interview is "How do you share code?"  I then usually face-palm as they say "thumb drive" or "email".
  • LinkedIn - let's face it, Facebook is for family and friends; LinkedIn is for co-workers and employers.  If you want to develop code professionally, you should have a LinkedIn account.
  • Twitter - I know several developers who use it to point out interesting coding articles they find.  It's not as important as the first two, but it has been useful in the past.
They have taken a page out of gaming that I love in web 2.0 sites: badges.  You see them all over StackExchange, Ubuntu, Fitocracy, and others.  It's like merit badges for adults, giving a little 'props' for completing something.  It's also a nice way to measure yourself against your peers and see areas you can explore.  Perhaps this is just turning us into needy, praise-driven zombies, but that's the more cynical view of the reward system.  I prefer to look at anything that motivates us as a positive.

The rest of your profile reflects your current progress.  It shows the tracks you are currently working on and the courses you've completed.  It's a nice portal into your history, allowing you to return to where you last left off.

Courses

When you first enter the "Learn" side of the CodeAcademy interface you'll see a variety of tracks.  Some are programming languages: Python, JavaScript, Ruby.  Others are more applied: APIs, Web Fundamentals, Projects.  I think this is a great start; I just wish the system had more formal languages (C, C++, Java) and even some data languages and syntaxes (SQL, XPath, SPARQL).  All of the languages presented are dynamically typed and, I assume, have a lighter interpreter since they are evaluated at run time instead of compile time.

Python Course

The first track I started was the Python course.  I already know a bit of Python from some Google courses and my own messing around, but it's not a language I use every day.  I figured a refresher wouldn't be bad and would help me gauge the course, since it's the closest to a newbie's experience that I could get.

The course starts off slow, going over the basics in pretty much the order of any standard language book or first-year programming course: syntax, basic data structures, conditional statements, looping, advanced data structures; it's all laid out.  A few courses in the middle reiterate things like looping and feel a bit redundant.  However, it's a good overview of the language, with a bit of programming basics thrown in for those without computer science degrees.

An interesting element of the track is that each course in the Python track is paired with a project.  The projects attempt to apply a scenario to the information you learned in the course.  Some of them are quite well done and allow you to define everything.  Others are quite rigid and basically offer a paint-by-numbers approach.

The only issue I found with this track is that several sections are quite buggy and do not offer assistance about why you didn't pass.  For example, the challenge "Exam Statistics" requires you to compute the variance of a set of grades.  In Python it looks something like:

grades = [100, 100, 90, 40, 80, 100, 85, 70, 90, 65, 90, 85, 50.5]

def print_grades(grades):
    for grade in grades:
        print grade

def grades_sum(grades):
    total = 0
    for grade in grades:
        total += grade
    return total

def grades_average(grades):
    sum_of_grades = grades_sum(grades)
    average = sum_of_grades / len(grades)
    return average

def grades_variance(grades, average):
    # population variance: the mean of the squared deviations from the average
    variance = 0
    for grade in grades:
        variance += (grade - average) ** 2
    variance = variance / len(grades)
    return variance

print grades_variance(grades, grades_average(grades))

Which computes the answer 334.071005917.  This value is accepted as correct when used in the next section.  However, this section indicates something is wrong: "Oops, try again."  That's the frustrating thing about some courses in CodeAcademy: the feedback and review in a course is determined by its author.  Where one course will say, "Oh, you printed the array, not the string.  Print the string instead," another will just say "Oops" and hope you understand why you failed.  The "Oops" courses seem to have the most restrictive requirements for passing each section, which only increases the frustration.
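For anyone wanting to sanity-check that expected value outside the course's grader, the same population variance is available from the standard library in Python 3 (the course exercises themselves were Python 2):

```python
import statistics

grades = [100, 100, 90, 40, 80, 100, 85, 70, 90, 65, 90, 85, 50.5]

# pvariance is the population variance: the mean of the squared
# deviations from the average, the same formula as the exercise above
print(statistics.pvariance(grades))
```

Running this reproduces the 334.071005917 answer the grader rejected, so the code above really was correct.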

HTML Course

I started this course because it was something I knew very well.  The speed of the course was very slow for me, but probably the right pace for my mother.  It's slow, reiterates itself, and goes through every major structure of a web page.  Overall, if you know the basics of HTML, skip this course, unless you are curious like I was.

Despite the slow and boring nature of the course, I did appreciate the design that went into its interface.  Unlike the Python course, which had a run environment at the bottom and code at the top, the HTML courses had tabs for the various files (such as the index and CSS files) and a panel to the right showing the result of the code on the page.  It has a fairly active autosave, but includes a submit button to force the display and call the course evaluation procedures.  It was clean and easy to follow along and work in the panels.

Conclusion

While coding isn't for everyone, CodeAcademy does make it accessible to anyone willing to try.  Its courses are well thought out, and it appears to be actively expanding.  You'll find a few bumps in the road, with courses that are too restrictive in their completion criteria or not descriptive enough in their requirements to pass.  However, the forums and bug tracker are active and responsive, so hopefully time will fix those courses.

It's not a bad resource for active developers who haven't touched a language or area of development before.  It also has a "Teach" section that allows developers to give back to CodeAcademy and share their knowledge with the community.

For my part, I may try my hand at the teach section. I'll also keep an eye out for any new courses or tracks.

I leave you with a little poem from the first section in Python (which comes from the Python standard library):

Zen of Python
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Appendix: Some sites about coding

  • codeacademy.com
  • code.org
  • codinghorror.com

Thursday, January 10, 2013

Three Tiered VIVO Build

An interesting aspect of the relationship between VIVO and Vitro (the non-ontology-specific engine behind VIVO and other projects) is the layered nature of the projects.  VIVO, as a source repository, is basically a shell that copies its changes over its Vitro base.

Cornell extends this further by adding their own changes to VIVO as a third level.  By modifying the build.xml file and deploy.properties to point at VIVO as a second layer, the build script can perform the same overlay onto VIVO that VIVO performs onto Vitro.
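To make the layering concrete, a third tier's properties file might contain something like the following.  This is only a sketch: the property names and paths are illustrative, not copied from Cornell's actual configuration.

```properties
# Hypothetical deploy.properties for a third-tier build.
# Property names and paths here are illustrative only.
vitro.core.dir = /usr/local/src/vitro   # tier 1: the Vitro engine
vivo.dir       = /usr/local/src/vivo    # tier 2: the stock VIVO release
# Tier 3 is this project itself; its build.xml overlays our changes
# onto VIVO just as VIVO's build.xml overlays VIVO onto Vitro.
```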

It seems a little complicated, but it keeps you out of the VIVO source, allowing you to replace your VIVO target with the latest version and adapt more rapidly to new releases of VIVO.  There hasn't been a study yet, but I would guess that the average time from the release of a new VIVO version to an institution upgrading to it is around 5 months.  I know at UF it took us about that long (often waiting for the .1 release), and at CU they were still on 1.2 when I arrived (1.5 was released just before the VIVO conference in August).

I've started a wiki article on the new VIVO Confluence wiki, https://wiki.duraspace.org/display/VIVO/Building+VIVO+in+3+tiers, that describes how to set up your local VIVO to use this three-tier system.

Tuesday, January 8, 2013

VIVO Ingest in 1.5.1

A little update: about 3 months ago we moved from Gainesville, Florida to Boulder, Colorado, as I accepted a position with the University of Colorado Boulder.  I'm back in the trenches, writing code every day and working on VIVO for Faculty Affairs at CU Boulder.  Hopefully this will restart my attempts to write more often, and focus that writing on more technical and less managerial matters.

Updating Data Through Data Ingest in VIVO 1.5.1

I've had a lot of tasks at CU since starting, and I'll go over some of the things I've learned and written soon enough.  For now I wanted to talk about updating your data in VIVO 1.5.1.  In the old days, when four programmers at UF embarked on data ingest, if you wanted to get data into VIVO you had to add it to the system's "Main Model", known as KB2.  This made data ingest difficult during the update phase because you either had to:
  1. Start over from scratch with a blank VIVO, or
  2. Remove the previously ingested data that contained the data you wanted to change.
Semantic triple stores didn't have a key we could use to link a row in KB2 to the data coming in from our source (in hindsight, I believe there are ways we probably should have done this with hash keys).  Because of this, we constructed a very complicated (and time-expensive) process that compares the data you are putting into VIVO against the last ingest from that source.  It creates an additions file and a subtractions file, which you then apply against the KB2 model.  Basically, it was like writing a delete and an insert to accomplish an update in SQL.
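The compare step can be sketched with plain set arithmetic over serialized triples.  This is a simplification (the real process operated on RDF models inside VIVO, and the function and file layout here are my own invention), but it captures the additions/subtractions idea:

```python
# Sketch: each file holds one serialized triple per line (N-Triples style).
def diff_ingest(old_path, new_path):
    with open(old_path) as f:
        old_triples = set(line.strip() for line in f if line.strip())
    with open(new_path) as f:
        new_triples = set(line.strip() for line in f if line.strip())

    additions = new_triples - old_triples     # triples to add to KB2
    subtractions = old_triples - new_triples  # triples to retract from KB2
    return additions, subtractions
```

Applying the additions and retracting the subtractions against KB2 then has the same net effect as the delete-plus-insert style of a SQL update.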

In VIVO 1.5.1 this process is mostly the same.  However, data no longer has to be in the main model to be indexed, so we can now separate our data by source, or in CU's case by the tables that generated the data.  This allows for a shorter ingest process: we only need to drop and reload the models that have changes.  I took CU's current process (which uses Selenium scripts against the UI), ripped out the portion that loaded data into KB2 (downloading the data, then using the add/remove screen to load the exported data into the main graph), and added a method to drop the import graph we used.  This cut an entire ingest from 3-4 hours down to 1 hour.

We were still rebuilding from scratch with each ingest, and now that we're heading to production I wanted to make the process a little faster.  So I wrote the first of a couple of scripts toward automating the entire process.  This first script reviews the .dat files for changes, allowing me to drop only the graphs that have changes and re-run their ingest scripts.

The process was fairly simple, and I've included a couple of the sites and blogs I used to figure out what to do.  By hand (this will become another script soon), I copied down the data from the previous run, ran a new export, and copied down the new data.  I then pass my new little script the two paths, for the old data and the new data.

The first step in the script is to review all of the file names to see what's missing and what's new.  Since the export process is an SQL script that always uses the same file name for each of its methods, we don't have to worry about misspellings, just new .dat files or the loss of a .dat file.
import os

def reviewFolderForChanges(oldFilePath, newFilePath):
    # list the contents of the old and new export folders
    oldFiles = os.listdir(oldFilePath)
    newFiles = os.listdir(newFilePath)

    # compare files present in both folders; report brand-new files
    for newFile in newFiles:
        if newFile in oldFiles:
            reviewFileForChanges(oldFilePath+newFile, newFilePath+newFile)
        else:
            print "New File Found: " + newFile

    # report files that disappeared since the last run
    for oldFile in oldFiles:
        if oldFile not in newFiles:
            print "File Missing: " + oldFile
The next and final step in the process is to review the files themselves.  I could have used Python's difflib, but I wanted more information from the files: what was new and what had been removed.  Plus, I found a nice little reference post by Frankie Bagnardi that I wanted to implement myself.  The result has greatly increased the ability of the ingest operator (me today, probably Alex and Vance at other times) to make sure that the changes coming in are reflected in VIVO.  For example:


Changes in file:/Users/stwi5210/Source/uccs-new-data/fis_faculty_member_positions.dat
- "http://vivo.uccs.edu/fisid_XXXXX","http://vivo.uccs.edu/deptid_XXXX","1435","Chair","http://vivoweb.org/ontology/core#FacultyAdministrativePosition","2","2"
- "http://vivo.uccs.edu/fisid_XXXXX","http://vivo.uccs.edu/deptid_XXXX","1419","Lecturer","http://vivoweb.org/ontology/core#FacultyPosition","5","4"


With information like this, I can go to the two individuals listed and make sure that they no longer hold the positions of Chair or Lecturer.  This lets me know that my ingest was successful and that it's ready to migrate to production.
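The post doesn't show reviewFileForChanges itself; following the set-difference approach from Bagnardi's post, a minimal version might look like the following (written in Python 3 here, while the original scripts were Python 2):

```python
def reviewFileForChanges(old_file, new_file):
    # Load each file as a set of lines; set differences then find removed
    # ("-") and added ("+") rows regardless of their order in the file.
    with open(old_file) as f:
        old_lines = set(line.rstrip("\n") for line in f if line.strip())
    with open(new_file) as f:
        new_lines = set(line.rstrip("\n") for line in f if line.strip())

    removed = sorted(old_lines - new_lines)
    added = sorted(new_lines - old_lines)

    if removed or added:
        print("Changes in file:" + new_file)
        for line in removed:
            print("- " + line)
        for line in added:
            print("+ " + line)
    return removed, added
```

Returning the two lists as well as printing them makes the function easy to reuse from other scripts.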

All in all, the script took about an hour to construct and run, and saved me about 40 minutes of ingesting.  Plus, I was able to review VIVO after the ingests finished for the data that should have changed, which is a big improvement over our previous methods of review.

Citation(s):

  • Compare Two Files with Python by Frankie Bagnardi - http://aboutscript.com/blog/posts/107
  • Python: iterate (and read) all files in a directory (folder) by Bogdan T - http://bogdan.org.ua/2007/08/12/python-iterate-and-read-all-files-in-a-directory-folder.html