While I was writing during the last NaNoWriMo, I really enjoyed watching my daily progress on the NaNoWriMo website word count and daily averages. After the month of November and thus NaNoWriMo ended, I wanted something to help track my continued writing progress – and I don’t have a lot of money so I can’t exactly use something like 750words.com. To feed my desire for a score sheet (although minus the visuals), I made a spreadsheet that can do the same thing.
It works the same way as the NaNoWriMo website: each day you enter the total number of words you’ve written so far. Since there’s no automatic entry, you need to fill down the dates and re-enter the same total for any days you’ve missed. When you reach the end of the part that’s already filled in, you also have to fill down the formulas in the other columns, which you normally don’t need to touch. Apart from that, it works the same as the NaNoWriMo website does during the contest in November.
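For anyone who would rather script it, the calculation behind a tracker like this can be sketched in a few lines of Python. This is only an illustration and not the spreadsheet’s actual formulas: the function name `daily_progress`, the start date, and the sample totals are all made up.

```python
from datetime import date, timedelta


def daily_progress(start, totals):
    """Turn running word-count totals (one entry per day) into per-day stats.

    `totals` holds the cumulative count entered each day; a missed day
    simply repeats the previous total, as described above.
    """
    rows = []
    previous_total = 0
    for day_index, total in enumerate(totals):
        rows.append({
            "date": start + timedelta(days=day_index),
            "total": total,
            # Words written on this day alone.
            "written_today": total - previous_total,
            # Average words per day since the start date.
            "daily_average": total / (day_index + 1),
        })
        previous_total = total
    return rows


# Made-up sample data: the third day was missed, so its total repeats.
rows = daily_progress(date(2015, 12, 1), [500, 1200, 1200, 2000])
for row in rows:
    print(row["date"], row["total"], row["written_today"], row["daily_average"])
```

Entering cumulative totals rather than daily counts means a missed day only needs the previous number copied down, which is the same trade-off the spreadsheet makes.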
You can find both the OpenOffice Calc and MS Excel versions of the spreadsheet over on the Projects page. Happy Writing!
After much work, I’m finally ready to start moving some parts of my master’s thesis, currently codenamed the Papers Project, to the web. Essentially, I’m taking a large group of published scientific papers and putting them into hierarchical clusters based on word usage. This is similar to what a search engine does to decide which websites match your search query, i.e. the word(s) or phrase(s) you type into the search field on Google, Yahoo, or similar.
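To give a feel for what “hierarchical clusters based on word usage” means, here is a minimal Python sketch. It is not the thesis code: it assumes raw word counts, cosine similarity, and average-linkage merging, and the three “papers” are invented one-line stand-ins.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count lowercase words in a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0 = no shared words)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_similarity(c1, c2, vectors):
    """Average pairwise similarity between the members of two clusters."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(cosine_similarity(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def agglomerate(docs):
    """Repeatedly merge the two most similar clusters; return the merge order."""
    vectors = {name: word_counts(text) for name, text in docs.items()}
    clusters = [(name,) for name in docs]
    merges = []
    while len(clusters) > 1:
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_similarity(clusters[ij[0]], clusters[ij[1]], vectors),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Invented stand-ins for paper text.
docs = {
    "paper_a": "exoplanet transit photometry of a hot jupiter",
    "paper_b": "transit photometry of an exoplanet atmosphere",
    "paper_c": "galaxy cluster dark matter halo simulation",
}
merges = agglomerate(docs)
for left, right in merges:
    print(left, "+", right)
```

The list of merges is the hierarchy: papers that share vocabulary join early, and unrelated ones only join near the top of the tree.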
While the automatic visualization part is still in progress, I have some rudimentary text visualization of my clusters, and I’m also going to start sharing the raw papers being used as data in the preliminary testing. You can find both the visualization and the papers over on the Projects page. As pieces of the project are finished, they’ll be posted there too.
All papers were retrieved from arXiv and ADS. Note: I do not claim copyright for any of these papers and all rights are retained by their respective copyright holders.
At the end of last year as part of my Applications of Data Science class I made a quick script in C++ to guide a robot around a simulated environment. We used the MobileRobots.com ARIA and MobileSim libraries, along with one of the pre-built environments. I used this project to teach myself C++ in a very limited amount of time, which sadly did not allow much/any time for code cleanup or behavior improvements so it’s still a bit messy – but it works!
As part of my investigation into clustering algorithms, I’ve been compiling a dictionary of terms to help me remember and to keep everything clear in my head. I’ll be adding to it as I go and will eventually turn it into a full data science dictionary, but for now it’s pretty focused on clustering specifically. So, without further ado, here it is (and the full version can be found on the Notes page):
Data Science Dictionary
This dictionary describes data science terms in plain English, or as plain as I can make it given what the term is. Some equations are given, but most entries only have descriptive sentences, since the equations are fairly easy to find online but the descriptions/explanations are not.
Clustering = a data science technique where unlabeled data (data that doesn’t have a label or any other way to indicate group membership) is grouped into clusters of similar data points. Because no labels are used, it is a form of unsupervised learning.
Cluster methods/algorithms
k-means = an iterative method that assigns each data point to the cluster with the nearest center (a centroid), recalculates each center as the mean of its members, and then recalculates cluster membership again. The iterations stop either after a specific count or when the centers no longer move between iterations.
k-means++ = an improvement on the original k-means that changes the start conditions by seeding the k centers one at a time, favoring data points that are far from the centers already chosen.
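The two entries above can be sketched together in plain Python. This is a small illustration under a few assumptions: points are tuples of numbers, distance is Euclidean, the function names are my own, and the six sample points are made up. It also assumes the data contains at least k distinct points.

```python
import math
import random


def nearest_sq_dist(point, centers):
    """Squared distance from a point to its closest existing center."""
    return min(math.dist(point, c) ** 2 for c in centers)


def kmeans_pp_seeds(points, k, rng):
    """k-means++ seeding: pick far-away points with higher probability."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [nearest_sq_dist(p, centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def kmeans(points, k, max_iter=100, seed=0):
    """Alternate assignment and center updates until the centers stop moving."""
    rng = random.Random(seed)
    centers = kmeans_pp_seeds(points, k, rng)
    groups = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Update step: each center moves to the mean of its members
        # (an empty cluster keeps its old center).
        new_centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break  # Centers did not move, so the clustering has converged.
        centers = new_centers
    return centers, groups


# Made-up data: two obvious blobs.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, groups = kmeans(points, k=2)
print(sorted(centers))
```

Both stopping rules from the definition show up here: `max_iter` is the fixed count, and the `new_centers == centers` check is the "centers do not move" test.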
I hadn’t finished setting up my Japanese Verb Conjugator app for the new server, but I finally completed the changes and now it should be working. The verb list is still a bit limited but all the different types (Godan and Ichidan) and endings (-u, -ku, -su, -tsu, etc.) should be represented as choices. If you see any errors or have any questions then please let me know.
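As a rough idea of the kind of rule involved, here is a short Python sketch that builds the polite (-masu) form from a romaji dictionary form. It is not the app’s code: the function name, the romaji input, and the choice of the -masu form are all assumptions for illustration, and irregular verbs such as suru and kuru are not handled.

```python
# Romaji ending of a Godan dictionary form -> the ending used before "masu".
# "tsu" is listed first so it is matched before the shorter "su" and "u".
GODAN_STEM_ENDINGS = [
    ("tsu", "chi"),
    ("ku", "ki"), ("gu", "gi"), ("su", "shi"), ("nu", "ni"),
    ("bu", "bi"), ("mu", "mi"), ("ru", "ri"),
    ("u", "i"),
]


def polite_form(verb, verb_type):
    """Return the -masu form of a regular verb given in romaji."""
    if verb_type == "ichidan":
        if not verb.endswith("ru"):
            raise ValueError(f"Ichidan verbs end in -ru: {verb!r}")
        # Ichidan: drop the final "ru" and add "masu".
        return verb[:-2] + "masu"
    if verb_type == "godan":
        # Godan: shift the final syllable to its i-row form, then add "masu".
        for ending, stem_ending in GODAN_STEM_ENDINGS:
            if verb.endswith(ending):
                return verb[: -len(ending)] + stem_ending + "masu"
        raise ValueError(f"Not a recognizable Godan ending: {verb!r}")
    raise ValueError(f"Unknown verb type: {verb_type!r}")


print(polite_form("kaku", "godan"))
print(polite_form("taberu", "ichidan"))
```

The verb type has to be passed in because the spelling alone is not enough: kaeru (to return) is Godan while taberu (to eat) is Ichidan, even though both end in -eru.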
Bayesian Networks are essentially flowcharts in which each node has several possible states, each with a probability that depends on the nodes feeding into it. This project, one of the personal projects I did for my master’s class in Bayes Nets, attempts to determine whether liquid water is possible on the surface of a specific planet or body (and potentially the habitability of that body).
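A toy version of the idea fits in a few lines of Python. This is not the network from the project: the two parent nodes (temperature and pressure), the structure, and every probability below are invented purely to show how the pieces combine.

```python
from itertools import product

# Hypothetical prior probabilities for the two parent nodes.
P_TEMP_OK = 0.3       # surface temperature allows liquid water
P_PRESSURE_OK = 0.5   # surface pressure allows liquid water

# Hypothetical conditional table: P(liquid water | temp_ok, pressure_ok).
P_WATER_GIVEN = {
    (True, True): 0.9,
    (True, False): 0.1,
    (False, True): 0.05,
    (False, False): 0.0,
}


def state_prob(is_true, p_true):
    """Probability of a two-state node being in the given state."""
    return p_true if is_true else 1.0 - p_true


def joint_with_water(temp_ok, pressure_ok):
    """P(temp state, pressure state, liquid water present)."""
    return (
        state_prob(temp_ok, P_TEMP_OK)
        * state_prob(pressure_ok, P_PRESSURE_OK)
        * P_WATER_GIVEN[(temp_ok, pressure_ok)]
    )


def p_liquid_water():
    """Marginal P(liquid water): sum over every combination of parent states."""
    return sum(joint_with_water(t, p) for t, p in product([True, False], repeat=2))


def p_temp_ok_given_water():
    """Reason backwards with Bayes' rule: P(temp_ok | liquid water)."""
    numerator = sum(joint_with_water(True, p) for p in (True, False))
    return numerator / p_liquid_water()


print(p_liquid_water())
print(p_temp_ok_given_water())
```

The second function is what makes this more than a flowchart: once the table is filled in, the same numbers let you work backwards from an observation to the likely state of its causes.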
This isn’t really a project in the sense of a coding/programming project, but it nonetheless is something that has been taking up a lot of my time. I’ve been writing down all the info that I find as I look into graduate programs in Planetary Science and Astronomy (which is where I’m headed after I finish my current master’s in Data Science), and I’ve decided to call it a project and post it up here for general public use. Hopefully it’ll help other people as they search for graduate schools/programs – and who knows, perhaps other people will update the list with their own investigations into the churning mass of deadlines and stress that is the graduate school application process.
Ok, I may be getting ahead of myself in that I don’t *actually* know if I’ll be able to make website updates from my new home location, but I’m going to try to do some reorganization of my website. I realize that I’ve got several interests all competing for my time, and that’s making this site a bit disorganized and, with some posts, a bit of an eyesore. I used my very beginner knowledge of PHP to get the site up and working, but I think that it’s outgrown its humble origins and needs some more PHP design love.
With a lighter grad school course load this year, I’m hoping that I can find time to do this. This also means that I’m reducing my own social media (Facebook and Twitter) time, except for basic communication. I don’t know yet if I can stick to that, but I’m going to give it a try!
Apparently, this has been an open secret at Berkeley and in astronomy for some time with people in the know warning incoming students through the “grapevine” about him. After it was found in this particular case that he was sexually harassing his students over a period of 10 years (TEN YEARS), apparently the “solution” was for Mr. Marcy to enter into an agreement with the school administrators to essentially not do it again or maybe he might be suspended or dismissed. So UC Berkeley, after 10 years of abuse, you’re going to wag your finger at him and expect things will change? This is not ok.
For starters, this is not ok because there is no real consequence of failure for you or your offender/employee. You lose nothing if you just pretend that all those victims are lying or make the issue fade away. You don’t have to deal with the aftermath every day.
So let’s fix that. We can do better than this. Here are a few ideas on what UC Berkeley can do better.
The site also includes information and knowledge I’ve learned in building these projects and in my experiences in computer science, astronomy/astrophysics, data science and statistics – mostly because if I don’t post it up here then I’ll probably forget about it.