<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0"><channel><atom:link rel="hub" href="http://tumblr.superfeedr.com/" xmlns:atom="http://www.w3.org/2005/Atom"/><description>Assorted notes on statistics, R, psychological research, LaTeX, computing, etc. See also my primary blog for more substantive posts: jeromyanglim.blogspot.com</description><title>Jeromy Anglim's Notes</title><generator>Tumblr (3.0; @jeromyanglim)</generator><link>http://jeromyanglim.tumblr.com/</link><item><title>Fixing Outlook on OSX from "Microsoft Outlook must be closed because an error occurred. Any unsaved work may be lost."</title><description>&lt;p&gt;My mac recently crashed and needed to restart.
After the restart, Outlook constantly crashed shortly after starting, or it crashed immediately after I clicked &amp;#8220;reply&amp;#8221; to a message.&lt;/p&gt;

&lt;p&gt;The error before crashing was:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Microsoft Outlook must be closed because an error occurred. Any unsaved work may be lost.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I tried the &lt;a href="http://answers.microsoft.com/en-us/mac/forum/macoffice2011-macoutlook/error-microsoft-outlook-must-be-closed-because-an/30022391-f578-4079-917c-e896ff533766"&gt;following strategy&lt;/a&gt; of Alt-clicking Outlook and clicking rebuild identity.
I then deleted the backup that was created (do this at your own risk!).&lt;/p&gt;

&lt;p&gt;This all fixed the problem.&lt;/p&gt;</description><link>http://jeromyanglim.tumblr.com/post/51034545009</link><guid>http://jeromyanglim.tumblr.com/post/51034545009</guid><pubDate>Wed, 22 May 2013 11:25:14 +1000</pubDate><category>@osx</category></item><item><title>PowerPoint Shortcut keys on OSX</title><description>&lt;p&gt;&lt;ul&gt;&lt;li&gt;Cmd+Shift+&amp;gt; increase font size&lt;/li&gt;
&lt;li&gt;Cmd+Shift+&amp;lt; decrease font size&lt;/li&gt;
&lt;/ul&gt;&lt;/p&gt;</description><link>http://jeromyanglim.tumblr.com/post/50404770287</link><guid>http://jeromyanglim.tumblr.com/post/50404770287</guid><pubDate>Tue, 14 May 2013 16:08:48 +1000</pubDate><category>@osx</category></item><item><title>Round numbers in data frame that contains non numeric data</title><description>&lt;p&gt;I sometimes want to quickly round numbers in a data.frame that contains some non numeric data (e.g., some labels or other text columns). Using the &lt;code&gt;round&lt;/code&gt; function returns an error when a data.frame with non-numeric data is used as an argument.
This function rounds the numeric variables in a data.frame to the specified number of digits and leaves non-numeric data untouched.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;round_df &amp;lt;- function(x, digits) {
    # round all numeric variables
    # x: data frame 
    # digits: number of digits to round
    # is.numeric (unlike mode) is FALSE for factors, so text stored as factors is skipped
    numeric_columns &amp;lt;- sapply(x, is.numeric)
    x[numeric_columns] &amp;lt;-  round(x[numeric_columns], digits)
    x
}

round_df(data, 3)
&lt;/code&gt;&lt;/pre&gt;</description><link>http://jeromyanglim.tumblr.com/post/50228877196</link><guid>http://jeromyanglim.tumblr.com/post/50228877196</guid><pubDate>Sun, 12 May 2013 15:22:14 +1000</pubDate><category>@rstats</category></item><item><title>Minimum correlation required for statistical significance using R</title><description>&lt;p&gt;It is traditional in psychology to indicate which correlations in a correlation matrix are statistically significant. One system involves stars on the correlation. This is distracting when most correlations are statistically significant and there are many variables. I prefer just to display a note at the bottom of the table indicating that absolute correlations above a certain amount are significant at .05.&lt;/p&gt;

&lt;p&gt;To obtain this value you can just enter your sample size in an &lt;a href="http://www.vassarstats.net/rsig.html"&gt;online calculator like this one&lt;/a&gt; and experiment with different correlations until you find the smallest one that crosses the .05 threshold (or whatever your alpha is).&lt;/p&gt;

&lt;p&gt;Alternatively, to obtain this value I wrote the following function in R.
It uses the &lt;a href="http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient#Testing_using_Student.27s_t-distribution"&gt;Student&amp;#8217;s t distribution method for calculating the statistical significance of a correlation&lt;/a&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;minimum_significant_r &amp;lt;- function(n, alpha=.05, twotail=TRUE, precision=.01) {
    # calculate minimum significant correlation
    # useful to report at bottom of correlation matrices instead of using stars
    # n: sample size
    # alpha: alpha level for determining statistical significance
    # twotail: TRUE means two tailed significance; 
    #          FALSE means one tailed significance
    # precision: precision of significant r (typically .01, .001, or .0001)

    r &amp;lt;- seq(0, 1, by=precision)
    tvalue &amp;lt;- r * sqrt((n-2)/(1-r^2))
    pvalue &amp;lt;-  1 - pt(tvalue,  df=n-2)
    if(twotail)  pvalue &amp;lt;- pvalue * 2 
    first &amp;lt;- min(which(pvalue &amp;lt; alpha))
    r[first]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So for example, if I had a sample size of 100 and I wanted to know which correlations were significant at the .05 and .01 levels, I could run the following code:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;gt; minimum_significant_r(n=100, alpha=.05)
[1] 0.2
&amp;gt; minimum_significant_r(n=100, alpha=.01)
[1] 0.26
&lt;/code&gt;&lt;/pre&gt;
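
&lt;p&gt;The grid search can be cross-checked against a closed form. The following is a minimal sketch rather than part of the function above: inverting $t = r\sqrt{(n-2)/(1-r^2)}$ gives $r = t/\sqrt{t^2 + n - 2}$, so the critical correlation follows directly from the critical t value. The name &lt;code&gt;critical_r&lt;/code&gt; is hypothetical and used only for illustration.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;critical_r &amp;lt;- function(n, alpha=.05, twotail=TRUE) {
    # exact critical correlation derived from the critical t value
    # n: sample size
    # alpha: alpha level for determining statistical significance
    # twotail: TRUE means two tailed significance;
    #          FALSE means one tailed significance
    df &amp;lt;- n - 2
    tail_prob &amp;lt;- if (twotail) alpha / 2 else alpha
    tcrit &amp;lt;- qt(1 - tail_prob, df=df)
    tcrit / sqrt(tcrit^2 + df)
}

# round up to two decimals to compare with the grid search at precision=.01
ceiling(critical_r(n=100, alpha=.05) * 100) / 100
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For $n=100$ and an alpha of .05 the exact value is a little under .20, so rounding up to two decimals agrees with the grid search.&lt;/p&gt;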

&lt;p&gt;Thus, I could add a note at the bottom of the table:&lt;/p&gt;

&lt;p&gt;$n=100$; for $|r| \geq .20$, $p &amp;lt; .05$.&lt;/p&gt;</description><link>http://jeromyanglim.tumblr.com/post/49842848390</link><guid>http://jeromyanglim.tumblr.com/post/49842848390</guid><pubDate>Tue, 07 May 2013 17:59:00 +1000</pubDate><category>@rstats</category></item><item><title>Initial experiences with JAGS, R and EC2</title><description>&lt;p&gt;This post discusses my experience getting started with EC2.&lt;/p&gt;

&lt;h3&gt;Motivations&lt;/h3&gt;

&lt;p&gt;I have some simulations combining R and JAGS which are going to take about five days to run. There&amp;#8217;s plenty of scope for parallelisation. I&amp;#8217;m trying to work out the best option for running these programs. A few options include:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;Taking a spare computer and setting it up in a spare room, and letting the program run for the five days. &lt;/li&gt;
&lt;li&gt;Using Amazon EC2 or some other cloud or grid computing option.&lt;/li&gt;
&lt;/ol&gt;&lt;h3&gt;Existing resources on R and EC2&lt;/h3&gt;

&lt;p&gt;There are several posts that describe getting started with R and EC2.&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;&lt;a href="http://sas-and-r.blogspot.com.au/2012/02/rstudio-in-cloud-for-dummies.html"&gt;This post clearly describes getting started with RStudio on EC2&lt;/a&gt;. This approach avoids ssh and the command line, and would be suitable for running code purely in R.&lt;/li&gt;
&lt;li&gt;&lt;a href="http://decisionstats.com/2010/09/25/running-r-on-amazon-ec2/"&gt;This post is an older one on R and EC2 using ssh&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://mobiarch.wordpress.com/2012/08/14/connecting-to-amazon-ec2-from-mac-osx/"&gt;This post describes getting started with ssh on OSX&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://andrewgelman.com/2011/07/09/r_on_the_cloud/"&gt;Here&amp;#8217;s an extended discussion of R in the cloud&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;h3&gt;JAGS and EC2&lt;/h3&gt;

&lt;p&gt;Ultimately I&amp;#8217;m interested in calling JAGS from R using rjags. 
I found slightly less on JAGS in particular:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;&lt;a href="http://empty-moon-9726.heroku.com//blog/2012/02/11/r-jags-rjags-on-an-ec2-instance/"&gt;This post focuses on R, JAGS, and rjags on EC2&lt;/a&gt;. It is particularly clear.&lt;/li&gt;
&lt;li&gt;I also asked about &lt;a href="http://stats.stackexchange.com/questions/57650/simple-cloud-computing-to-run-r-jags-simulations"&gt;JAGS on EC2 on Stats.SE&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;&lt;h3&gt;General thoughts&lt;/h3&gt;

&lt;ul&gt;&lt;li&gt;Getting an account on EC2 is fairly straightforward. You do need to provide credit card information just to get access to the free options. And it is a little bit scary to think that if something went wrong (e.g., you accidentally left a high-powered server on), you could rack up some serious expenses. EC2 provides some basic free options for the first year, which allow you to play with it.&lt;/li&gt;
&lt;li&gt;Using &lt;a href="http://sas-and-r.blogspot.com.au/2012/02/rstudio-in-cloud-for-dummies.html"&gt;RStudio in the browser&lt;/a&gt; was straightforward. That said, most of my interest at the moment involves JAGS, and the instance I looked at did not support JAGS.&lt;/li&gt;
&lt;li&gt;Getting started with JAGS was not too difficult. Having spent a year with Ubuntu as my primary operating system, I found using the terminal to log in to the server fairly straightforward, and the tutorials mentioned above explain fairly well some of the quirky things about creating your private key pair, getting the public DNS, and so on.&lt;/li&gt;
&lt;li&gt;I still have a fair bit to learn about how to optimise an EC2 job for running a JAGS simulation. There&amp;#8217;s the potential to make the job more parallel so that it runs quicker. There are also decisions about how to organise the simulation code to work best in the EC2 environment.&lt;/li&gt;
&lt;/ul&gt;</description><link>http://jeromyanglim.tumblr.com/post/49252610073</link><guid>http://jeromyanglim.tumblr.com/post/49252610073</guid><pubDate>Tue, 30 Apr 2013 21:33:22 +1000</pubDate><category>@rstats</category></item><item><title>HTML to Markdown and back for blogspot blogging on OSX</title><description>&lt;p&gt;This post discusses workflows for editing blogger blogposts using markdown.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context:&lt;/strong&gt; I&amp;#8217;ve noticed myself posting less and less these days on my primary blog &lt;a href="http://jeromyanglim.blogspot.com"&gt;jeromyanglim.blogspot.com&lt;/a&gt;. This is due to a range of factors. I have moved more content to stack exchange sites and I&amp;#8217;ve also been posting ad hoc notes on this tumblr. I also like to think that posts on my primary blog pass a higher level of quality and general interest than this tumblr blog.&lt;/p&gt;

&lt;p&gt;Nonetheless, I think part of what holds me back from my main blog is that it is awkward to write and edit the content using markdown.
I have experimented with (a) maintaining a set of posts in markdown on my computer, (b) converting the markdown to HTML using pandoc, and (c) copying and pasting the HTML into Blogger.
However, this is awkward for a few reasons. (1) It means that I can&amp;#8217;t use the features in blogspot to tag, sort, and search through drafts, and (2) if I want to edit an existing post, the content is now in HTML. Although there are converters from HTML to markdown, it all just feels a bit awkward compared to clicking a single button and tweaking the markdown, as is the experience on StackExchange and on tumblr.&lt;/p&gt;

&lt;p&gt;Thus, I&amp;#8217;m interested in what other workflows are possible on blogspot that would allow me to edit in Markdown, but retain blogspot as my blogging platform.&lt;/p&gt;

&lt;h3&gt;Mou&lt;/h3&gt;

&lt;p&gt;&lt;a href="http://mouapp.com/"&gt;Mou&lt;/a&gt; is a free Markdown editor for OSX, in beta at the time of writing.
Useful features include:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;Real time preview with various options (size of preview window; toggle display of preview)&lt;/li&gt;
&lt;li&gt;Copy post to HTML (which makes it easy to post to blogspot)&lt;/li&gt;
&lt;li&gt;Some support for mathematics and MathJax in markdown&lt;/li&gt;
&lt;li&gt;Post direct to tumblr&lt;/li&gt;
&lt;li&gt;Export to HTML or PDF&lt;/li&gt;
&lt;li&gt;Syntax highlighting where the theme is editable.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;The main limitation of Mou for my potential blogspot workflow is that existing posts are copied from blogspot in HTML format. It would be nice if Mou would convert standard HTML tags to markdown equivalents.&lt;/p&gt;

&lt;h3&gt;Markdown service tools&lt;/h3&gt;

&lt;p&gt;To overcome this problem, there is &lt;a href="http://brettterpstra.com/projects/markdown-service-tools/"&gt;Markdown service tools&lt;/a&gt;. These are a set of tools to make it easier to work with Markdown on OSX.&lt;/p&gt;

&lt;p&gt;In particular, one of the services converts HTML to Markdown. 
Thus, in Mou or TextEdit, you can highlight text and then run this service, and the HTML text will be converted to Markdown.&lt;/p&gt;

&lt;p&gt;However, there are &lt;strong&gt;some major issues&lt;/strong&gt; regarding how the conversion is done. In particular, instead of just passing through HTML tags that have no markdown equivalent, the processor strips these. So for example, embedded YouTube clips or HTML comments are stripped. This matters to me because I use the &lt;code&gt;more&lt;/code&gt; text in comments to split my posts into the part displayed on the main page and the part displayed on the post page.&lt;/p&gt;

&lt;p&gt;Thus, for most purposes a less destructive HTML to Markdown converter is required.&lt;/p&gt;

&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;So a potential Blogspot workflow for editing existing posts is as follows.&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;Click edit Post&lt;/li&gt;
&lt;li&gt;Highlight and copy HTML source&lt;/li&gt;
&lt;li&gt;Paste content into Mou&lt;/li&gt;
&lt;li&gt;Highlight and run HTML to Markdown service&lt;/li&gt;
&lt;li&gt;Edit post as desired&lt;/li&gt;
&lt;li&gt;Copy as HTML&lt;/li&gt;
&lt;li&gt;Paste back into blogspot and publish.&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;Admittedly, this is still not perfectly elegant, but it&amp;#8217;s not too bad.&lt;/p&gt;

&lt;h3&gt;StackEdit&lt;/h3&gt;

&lt;p&gt;Another option is &lt;a href="http://benweet.github.io/stackedit/"&gt;StackEdit&lt;/a&gt;. This is a web application for writing Markdown. It supports direct publishing to Blogger. You can also edit an existing post by supplying the post ID.
However, it still leaves the issue of converting an existing post back to Markdown.&lt;/p&gt;

&lt;p&gt;One StackEdit workflow for editing an existing post is:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;Click edit post in blogger and copy and paste the HTML source into StackEdit&lt;/li&gt;
&lt;li&gt;In StackEdit, change the name back to the original name; copy and paste the tags from the original post&lt;/li&gt;
&lt;li&gt;Edit the post&lt;/li&gt;
&lt;li&gt;Click publish to blogger in StackEdit; you&amp;#8217;ll need the postid which is in the URL of the previously edited post&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;This has several issues:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;It is silly that you have to recopy the title and the tags just to make an edit&lt;/li&gt;
&lt;li&gt;This additional work removes any benefits from the direct publishing of an existing post.&lt;/li&gt;
&lt;/ul&gt;&lt;h3&gt;General Issue of HTML to Markdown conversion&lt;/h3&gt;

&lt;p&gt;A major issue with all the workflows is the task of converting the HTML in a blogspot post into a Markdown version.&lt;/p&gt;

&lt;p&gt;An alternative is just to save all the markdown versions of posts. Thus if you want to edit a post, you go to the Markdown source, make the changes, convert to HTML, and paste into the existing post. The problem with this is that it forces you to have an external directory of posts. It&amp;#8217;s typically a hassle finding your file. And if you ever make any direct changes to the HTML post, then you will have issues of synchronisation between the post and the source.&lt;/p&gt;

&lt;p&gt;Alternatively, there are many options for converting HTML to markdown. The problem with this approach is that the tools differ in how they perform the conversion. Some of these differences are:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;How to represent links (inline or as references)&lt;/li&gt;
&lt;li&gt;How to format whitespace and text&lt;/li&gt;
&lt;li&gt;How to represent headings&lt;/li&gt;
&lt;li&gt;What if any HTML tags should be stripped&lt;/li&gt;
&lt;li&gt;and more&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;A useful pandoc command is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pandoc test.html -o test.md --parse-raw --atx-headers
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In particular:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;&lt;code&gt;--parse-raw&lt;/code&gt; ensures that non-markdown equivalent HTML is not stripped from the resulting markdown&lt;/li&gt;
&lt;li&gt;&lt;code&gt;--atx-headers&lt;/code&gt; uses the hash style of headings which I prefer (i.e., &lt;code&gt;#&lt;/code&gt; for heading 1)&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;I&amp;#8217;ve also created a workflow in Automator that allows this command to be run as a service on the selected text. See the screenshot below.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://media.tumblr.com/34a64800b82905a6621e7bbc7815895d/tumblr_inline_mlwkrygKYY1qz4rgp.png" alt="osx service pandoc"/&gt;&lt;/p&gt;</description><link>http://jeromyanglim.tumblr.com/post/48919034681</link><guid>http://jeromyanglim.tumblr.com/post/48919034681</guid><pubDate>Fri, 26 Apr 2013 18:53:00 +1000</pubDate></item><item><title>Whiteboard Workflow</title><description>&lt;p&gt;I&amp;#8217;ve been getting into using whiteboards more these days for teaching, supervision meetings, and for my own research work. The following are a few tricks that have evolved.&lt;/p&gt;

&lt;h3&gt;Type of whiteboard&lt;/h3&gt;

&lt;p&gt;I&amp;#8217;m no expert in whiteboard surfaces. The main thing I know is that cheap boards are cheap for a reason. On cheap boards, erasing the content takes effort and whiteboard cleaner, especially if the content has been left for a while. In contrast, I recently bought a vitreous enamel (porcelain/ceramic) whiteboard. On this board, content can be left for days and it still just wipes off.&lt;/p&gt;

&lt;h3&gt;Saving whiteboard contents&lt;/h3&gt;

&lt;p&gt;I have the following system in place. I take a picture with my phone, which is synchronised with dropbox which is synchronised with my laptop. These photos can then be moved to an appropriate location for storage.&lt;/p&gt;

&lt;p&gt;I have photos configured to about 3M resolution, which on my phone seems to produce files a little under half a megabyte. This seems to be a good size for producing readable text without using excessive amounts of 3G data transfer. It also means that the file is typically transferred to my computer in about 10 seconds.&lt;/p&gt;</description><link>http://jeromyanglim.tumblr.com/post/48596358182</link><guid>http://jeromyanglim.tumblr.com/post/48596358182</guid><pubDate>Mon, 22 Apr 2013 16:02:58 +1000</pubDate></item><item><title>Statistical Simulations with R</title><description>&lt;p&gt;Statistical simulation is a powerful tool. The following post links to a few simulation resources, particularly for the language R.&lt;/p&gt;

&lt;h3&gt;Examples&lt;/h3&gt;

&lt;p&gt;Here are some examples where I have used statistical simulation:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;See whether mean posterior densities in JAGS provide reasonable estimates of population parameters.&lt;/li&gt;
&lt;li&gt;Perform a power analysis of non-standard hypotheses. G*Power 3 is good for standard cases such as correlations and t-tests. &lt;/li&gt;
&lt;li&gt;Simulate a dataset to prove to myself that I understand a statistical model (e.g., a factor analysis, an IRT model, etc.)&lt;/li&gt;
&lt;li&gt;Simulate a dataset for a class exercise or for a supervised student so that they can practice particular statistical techniques.&lt;/li&gt;
&lt;li&gt;Check the effect of a particular violation of an assumption on properties of a statistical test (e.g., non-normality, outliers, etc.).&lt;/li&gt;
&lt;/ul&gt;&lt;h3&gt;Simulation Tutorials&lt;/h3&gt;

&lt;ul&gt;&lt;li&gt;&lt;a href="http://personality-project.org/r/simulating-personality.html"&gt;William Revelle&lt;/a&gt; has a tutorial on simulation in personality research.&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.surefoss.org/workflow/simulation-studies-in-r-using-all-cores-and-other-tips/"&gt;Sustainable Research Blog&lt;/a&gt; has an example script.&lt;/li&gt;
&lt;li&gt;Roger Peng has a &lt;a href="http://www.youtube.com/watch?v=tvv4IA8PEzw"&gt;YouTube video on simulation with R&lt;/a&gt;. He discusses (a) the various random variable generating functions (e.g., &lt;code&gt;rnorm&lt;/code&gt;, &lt;code&gt;rpois&lt;/code&gt;, &lt;code&gt;sample&lt;/code&gt;, etc.), (b) the use of &lt;code&gt;set.seed&lt;/code&gt; to make a simulation reproducible, and (c) extending these ideas to simulating a simple regression.&lt;/li&gt;
&lt;li&gt;John Myles White has an &lt;a href="http://www.johnmyleswhite.com/notebook/2013/01/24/writing-better-statistical-programs-in-r/"&gt;example of simulating data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www4.stat.ncsu.edu/~davidian/st810a/simulation_handout.pdf"&gt;M. Davidian&lt;/a&gt; has a handout on statistical simulations. The handout discusses (a) reasons for simulation studies; (b) definitions of various properties of estimators such as bias, consistency, coverage, etc.; (c) how to implement in R; and (d) issues around presentation of results.&lt;/li&gt;
&lt;/ul&gt;&lt;h3&gt;General Treatments&lt;/h3&gt;

&lt;ul&gt;&lt;li&gt;Burton et al. (2006) have a general tutorial on statistical simulation (with followup comments by Demirtas (2007)).&lt;/li&gt;
&lt;li&gt;Ben Bolker has a chapter on &lt;a href="http://www.math.mcmaster.ca/~bolker/emdbook/chap5A.pdf"&gt;Stochastic simulation and power analysis&lt;/a&gt; from his book &lt;a href="http://www.math.mcmaster.ca/~bolker/emdbook/"&gt;Ecological Models and Data in R&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;&lt;h3&gt;References&lt;/h3&gt;

&lt;ul&gt;&lt;li&gt;Burton, A., Altman, D. G., Royston, P., &amp;amp; Holder, R. L. (2006). The design of simulation studies in medical statistics. Statistics in medicine, 25(24), 4279-4292. &lt;a href="http://www.soph.uab.edu/Statgenetics/Club_ssg/MPadilla_07.pdf"&gt;PDF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Demirtas, H. (2007). The design of simulation studies in medical statistics by Andrea Burton, Douglas G. Altman, Patrick Royston and Roger L. Holder, Statistics in Medicine 2006; 25: 4279–4292. Statistics in medicine, 26(20), 3818-3821.&lt;/li&gt;
&lt;/ul&gt;</description><link>http://jeromyanglim.tumblr.com/post/48594813416</link><guid>http://jeromyanglim.tumblr.com/post/48594813416</guid><pubDate>Mon, 22 Apr 2013 15:30:44 +1000</pubDate><category>@rstats</category></item><item><title>Notes on Facets and Factors of Big 5</title><description>&lt;p&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://cogsci.stackexchange.com/questions/3394/facet-versus-scale-prediction-of-big-5-personality"&gt;A few notes on facet versus scale prediction of Big 5 personality&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://stats.stackexchange.com/q/55929/183"&gt;How to estimate population $R^2$?&lt;/a&gt;: This might be relevant to the facet versus factor comparison&lt;/li&gt;
&lt;/ul&gt;&lt;/p&gt;</description><link>http://jeromyanglim.tumblr.com/post/47781292223</link><guid>http://jeromyanglim.tumblr.com/post/47781292223</guid><pubDate>Sat, 13 Apr 2013 00:16:11 +1000</pubDate></item><item><title>Move one variable to the start of an R data.frame</title><description>&lt;p&gt;There are several ways to rearrange the variables in an R data.frame so that one variable goes to the start.&lt;/p&gt;

&lt;p&gt;One way is as follows:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;data &amp;lt;- data.frame(id=data$id,  subset(data, select=-id))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;where &lt;code&gt;id&lt;/code&gt; is the variable to be moved to the start and &lt;code&gt;data&lt;/code&gt; is the data.frame.&lt;/p&gt;</description><link>http://jeromyanglim.tumblr.com/post/47440074168</link><guid>http://jeromyanglim.tumblr.com/post/47440074168</guid><pubDate>Mon, 08 Apr 2013 16:12:00 +1000</pubDate><category>@rstats</category></item><item><title>How to calculate Cohen's d from an F test?</title><description>&lt;p&gt;I was recently asked about how to get Cohen&amp;#8217;s d from a study which only reports an F test and no sample size.&lt;/p&gt;

&lt;p&gt;If the study does report the group sample sizes, then you can derive d from F and those sample sizes.&lt;/p&gt;
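
&lt;p&gt;As a minimal sketch of the arithmetic, assuming a one-way design with exactly two groups (so that $F = t^2$) and a pooled standard deviation, the standard conversion is $d = \sqrt{F (n_1 + n_2) / (n_1 n_2)}$. This may be laid out differently from the formula in the document linked below, and the function name &lt;code&gt;d_from_f&lt;/code&gt; is hypothetical. The sign of d is lost because F is a squared quantity, so the direction has to come from the reported means.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;d_from_f &amp;lt;- function(f, n1, n2) {
    # absolute value of Cohen's d from a two-group F test
    # f: reported F statistic (1 numerator degree of freedom)
    # n1, n2: group sample sizes
    sqrt(f * (n1 + n2) / (n1 * n2))
}

# hypothetical values: F = 4 with 50 participants per group gives d = 0.4
d_from_f(f=4, n1=50, n2=50)
&lt;/code&gt;&lt;/pre&gt;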

&lt;p&gt;Formula 5 in &lt;a href="http://www.bwgriffin.com/gsu/courses/edur9131/content/Effect_Sizes_pdf5.pdf"&gt;this document&lt;/a&gt; shows how to do it.&lt;/p&gt;</description><link>http://jeromyanglim.tumblr.com/post/47259697111</link><guid>http://jeromyanglim.tumblr.com/post/47259697111</guid><pubDate>Sat, 06 Apr 2013 17:58:56 +1100</pubDate><category>@statistics</category></item><item><title>Should questions in psychology always be objective questions?</title><description>&lt;p&gt;A question was asked on &lt;a href="http://cogsci.stackexchange.com/questions/3370/should-questions-in-psychology-always-be-objective-questions"&gt;cogsci.stackexchange.com&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Is it correct that a subjective question is not a psychological question i.e. 
  for a question to be psychological it must be an objective question e.g. 
  about an objective probability or so instead of just what you like or&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It might get closed, so here&amp;#8217;s a copy of my answer:&lt;/p&gt;

&lt;p&gt;Your question potentially raises philosophical questions, but for it to have meaning, there are several definitional issues.&lt;/p&gt;

&lt;h3&gt;What is a psychological question?&lt;/h3&gt;

&lt;p&gt;Presumably a psychological question is any question that concerns the domain of psychology.&lt;/p&gt;

&lt;p&gt;There are many criteria that could be used to evaluate whether it is a scientifically interesting question in psychology. Failure to be interesting does not stop it from being psychological.
A few criteria might include:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;&lt;strong&gt;Generality:&lt;/strong&gt; Interesting psychological questions tend to be general in scope. Thus, they often apply to many situations and contexts.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Importance:&lt;/strong&gt; Importance can be pure or applied. Important pure questions tend to interlink with many other aspects of psychology (e.g., fundamental importance of learning, cognition, etc.). Applied questions tend to be important if the answer is likely to help people live better, healthier, more productive lives (e.g., research on work, education, depression, etc.).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Clarity:&lt;/strong&gt; An interesting question will be clear and answerable. Thus, the meaning of the question should be clear or at least operationalised to be clear.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Gap:&lt;/strong&gt; More broadly, if you are doing research yourself, then you want there to be a gap in the literature (i.e., the answer to the question is not yet known). I discuss this a little more &lt;a href="http://jeromyanglim.blogspot.com.au/2009/12/how-to-write-introduction-section-in.html"&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;You used the example of asking someone about their favourite colour.
Thus, asking about how people choose their favourite colour, how stable favourite colour is over time, or whether favourite colour is related to other psychologically relevant variables would be moderately interesting psychological questions (they are at least general, but perhaps not that important). Asking an individual person what their favourite colour is concerns the psychology of one individual, but it lacks generality and is of almost no importance to the general public.&lt;/p&gt;

&lt;h3&gt;What is a subjective question?&lt;/h3&gt;

&lt;p&gt;This could mean that the &lt;strong&gt;topic of the question concerns evaluations&lt;/strong&gt;. For example, plenty of researchers study aesthetics, attitudes, and values, which are in some sense subjective evaluations. Much research in psychology uses measures that involve an element of subjectivity (e.g., personality measures, many self-report measures, qualitative ratings of performance, and so on).&lt;/p&gt;

&lt;p&gt;This could mean that the &lt;strong&gt;topic of the question is concerned with theoretical states that are not readily observable&lt;/strong&gt;. As @Artem has mentioned, the distinction between cognitive psychology and behaviourism readily captures this distinction, whereby cognitive psychology acknowledges both the existence and the value of theorising about internal cognitive states such as goals, cognitions, mental representations, information processing systems, etc. Much of psychology is concerned with such phenomena. Often several steps are required to move from the empirical phenomena to the theoretical concepts. Theories develop over time based on whether the theory is supported by the empirical evidence.&lt;/p&gt;

&lt;p&gt;In both the above cases, asking such subjective questions seems productive, interesting, and legitimate.&lt;/p&gt;

&lt;p&gt;Another way that questions can be subjective is that the &lt;strong&gt;meaning of the question is not clear and thus requires subjective interpretation&lt;/strong&gt;. In psychological science attempts are made to link into existing literatures and use accepted terms. Clarity of the meaning of the question is important.&lt;/p&gt;

&lt;h3&gt;What is a subjective answer?&lt;/h3&gt;

&lt;p&gt;I think this may reflect what you are referring to. A subjective answer is presumably one that depends on the person answering the question. Much of the scientific method in psychology is designed to increase objectivity in the results obtained. E.g., using established measures of constructs, linking into existing terminology and theories, following good practice in study design, performing appropriate analytic techniques, drawing reasoned inferences in light of the data and the established literature.&lt;/p&gt;

&lt;p&gt;There is a lot to the art of psychological science. There are better methods and worse methods. There is certainly a degree of subjectivity in the scientific method, perhaps more so in psychology than in the hard sciences.&lt;/p&gt;

&lt;p&gt;However, there are many critiques that can be made of answers to psychological questions. In general &lt;strong&gt;the scientific method attempts to provide answers that are more rigorous and part of that rigour is achieved through the removal of subjectivity&lt;/strong&gt;.&lt;/p&gt;</description><link>http://jeromyanglim.tumblr.com/post/46919921405</link><guid>http://jeromyanglim.tumblr.com/post/46919921405</guid><pubDate>Tue, 02 Apr 2013 17:23:48 +1100</pubDate></item><item><title>Explaining maximum likelihood estimation to students with minimal background in mathematics or probability</title><description>&lt;p&gt;I was recently asked about how to intuitively explain maximum likelihood estimation to students, most of whom  would not have done a class on calculus or  probability. The context was explaining what maximum likelihood estimation meant in the context of factor analysis.&lt;/p&gt;

&lt;p&gt;This was my quick proposed approach:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;&lt;p&gt;Draw a normal density and take one value  and show the probability density of that point.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Explain how the probability density of a set of independent points is just the product of the density of the individual points (this is the likelihood assuming certain parameter values).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If we assume the parameters are unknown, but the data is fixed, then for different values of the parameters the likelihood would change. At this point you could draw a plot with an example parameter like the mean on the x axis and the likelihood on the y-axis (it might look like a normal distribution; in fact in many cases it does).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Explain how the likelihood varies with the value of this unknown parameter, but how there is a mode to the likelihood. And say how it seems reasonable that if we don&amp;#8217;t know what the parameter is, perhaps our best guess would be the value that maximises the likelihood of the data&amp;#8230; and that&amp;#8217;s what maximum likelihood estimation does. It defines a likelihood for the data given parameter values, and estimates the values of the parameters to be those which maximise that likelihood.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Then explain how with factor analysis there are more parameters than just a mean, but that the idea of maximising the likelihood extends to these more complex models.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;For a relatively accessible introduction to this topic, I quite liked Ben Bolker&amp;#8217;s book which is free online:  &lt;a href="http://www.math.mcmaster.ca/~bolker/emdbook/"&gt;http://www.math.mcmaster.ca/~bolker/emdbook/&lt;/a&gt;
Chapter 6 has a discussion of likelihood.&lt;/p&gt;</description><link>http://jeromyanglim.tumblr.com/post/46328239239</link><guid>http://jeromyanglim.tumblr.com/post/46328239239</guid><pubDate>Tue, 26 Mar 2013 21:19:00 +1100</pubDate><category>@statistics</category></item><item><title>How to get a count of univariate outliers in SPSS?</title><description>&lt;p&gt;In an SPSS lab class, students were asked to calculate the number of outliers for a given variable in a dataset. An outlier was defined as an observation with an absolute z-score greater than 2.58, i.e., $|z| &amp;gt; 2.58$.&lt;/p&gt;

&lt;p&gt;One strategy for doing this is to create a z-score for the variable using &amp;#8220;Save standardised variables&amp;#8221; in the &lt;code&gt;analyze - descriptives - descriptives&lt;/code&gt; dialog. This will add a z-score version of the variable supplied in the dialog box to the end of the data file. You can then sort this variable ascending (by right clicking the variable in the data view) and count the observations less than -2.58, and then sort descending and count the observations greater than 2.58. The total count can then be converted to a proportion by dividing by the sample size.&lt;/p&gt;
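
&lt;p&gt;As a rough sketch of what the saved variable contains (assuming SPSS standardises using the sample mean $\bar{x}$ and sample standard deviation $s$), each z-score is:&lt;/p&gt;

&lt;p&gt;$$z_i = \frac{x_i - \bar{x}}{s}$$&lt;/p&gt;

&lt;p&gt;If the variable were exactly normally distributed, the expected proportion of flagged cases would be $P(|Z| &amp;gt; 2.58) = 2\,[1 - \Phi(2.58)] \approx 0.01$, so roughly 1 in 100 observations would be counted as outliers by chance alone.&lt;/p&gt;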

&lt;p&gt;I was asked how this process can be completed in SPSS without the manual sorting and counting.&lt;/p&gt;

&lt;p&gt;One strategy involves creating an outlier indicator variable (1 if an outlier and 0 if not an outlier). You can then run &lt;code&gt;analyze - descriptives - frequencies&lt;/code&gt; on the indicator variable to get a count and percentage of the outliers for that variable.&lt;/p&gt;

&lt;p&gt;To create the indicator variable you can use the &lt;code&gt;transform - compute&lt;/code&gt; menu. Specifically, if you have a standardised variable called &lt;code&gt;x&lt;/code&gt;, then the expression is &lt;code&gt;x &amp;gt; 2.58 | x &amp;lt; -2.58&lt;/code&gt;. It could also be written with the keyword &lt;code&gt;or&lt;/code&gt; in place of &lt;code&gt;|&lt;/code&gt;: &lt;code&gt;x &amp;gt; 2.58 or x &amp;lt; -2.58&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Here is some SPSS syntax to perform the task (see &lt;a href="http://jeromyanglim.blogspot.com.au/2009/10/introduction-to-spss-syntax-advice-for.html"&gt;here&lt;/a&gt; for tips on SPSS syntax).&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;* create a variable to demonstrate the example.
COMPUTE x=RV.NORMAL(0,1).
EXECUTE.

* standardise the variable and thereby create a variable called zx.
* could use analyze - descriptives - descriptives.
DESCRIPTIVES VARIABLES=x / SAVE.

* compute whether each case is an outlier (1 = outlier, 0 = not an outlier).
* could use transform - compute.
COMPUTE outlierx = zx &amp;gt; 2.58 or zx &amp;lt; -2.58.
EXECUTE.

* run frequencies on the outlier indicator variable.
* could use analyze - descriptives - frequencies.
FREQUENCIES outlierx.
&lt;/code&gt;&lt;/pre&gt;</description><link>http://jeromyanglim.tumblr.com/post/45901789416</link><guid>http://jeromyanglim.tumblr.com/post/45901789416</guid><pubDate>Thu, 21 Mar 2013 18:52:00 +1100</pubDate><category>@statistics</category></item><item><title>Are homogeneity of variance and homoscedasticity the same assumptions?</title><description>&lt;p&gt;I was recently asked in a course on SPSS:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;Are homoscedasticity and homogeneity of variance the same assumptions?&lt;/li&gt;
&lt;li&gt;Is there a statistical test for assessing homoscedasticity in a multiple regression context?&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;I answered:
Yes, you are right: homogeneity of variance and homoscedasticity are at a deeper level the same assumption, i.e., that the error variance around predicted scores is the same for all predicted values. In the case of ANOVA or t-tests, the prediction is generally the group mean. In the case of multiple regression, the prediction is that which is yielded by applying the regression equation to any given set of predictor values. Some authors even use the terms &amp;#8220;homogeneity of variance&amp;#8221; or &amp;#8220;homoscedasticity&amp;#8221; in both ANOVA and multiple regression contexts.&lt;/p&gt;
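
&lt;p&gt;In symbols (the notation here is my own shorthand, with $\varepsilon_i$ for the error and $\hat{y}_i$ for the predicted value of case $i$), both assumptions say that the error variance does not depend on the prediction:&lt;/p&gt;

&lt;p&gt;$$\operatorname{Var}(\varepsilon_i \mid \hat{y}_i) = \sigma^2 \quad \text{for all } i$$&lt;/p&gt;

&lt;p&gt;In a one-way ANOVA with groups $j = 1, \dots, k$, where the prediction is the group mean, this reduces to $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$.&lt;/p&gt;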

&lt;p&gt;This consistency also links closely to the fact that, at a deeper level, ANOVA and multiple regression are both instances of the linear model. Andy Field talks about this when he introduces ANOVA.&lt;/p&gt;

&lt;p&gt;Nonetheless, the homogeneity of error variance is often assessed in slightly different ways in an ANOVA context than in a multiple regression context. This is because in the ANOVA context you typically have a small number of distinct predictions (one per group), whereas in the multiple regression context you have many distinct predictions. Thus, in the ANOVA context, you can directly compare estimates of the variance in each group. In contrast, in multiple regression, you have to infer patterns in the error variance. This is often done by inspecting plots of residuals against predicted values.&lt;/p&gt;

&lt;p&gt;As mentioned above, it is common to assess homoscedasticity in the regression context by looking at a plot of residuals against predicted values. That said, there is a wide range of statistical tests for assessing various forms of heteroscedasticity.&lt;/p&gt;

&lt;p&gt;There is a list of quite a few such tests here: &lt;a href="http://en.wikipedia.org/wiki/Heteroscedasticity#Detection"&gt;http://en.wikipedia.org/wiki/Heteroscedasticity#Detection&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As far as I can tell, none of these tests are integrated into SPSS&amp;#8217;s multiple regression procedure. If you were really keen, you might be able to hunt down an external script or you could try other statistical software. That said, I generally find that the plots provide sufficient diagnostic information.&lt;/p&gt;</description><link>http://jeromyanglim.tumblr.com/post/45895285880</link><guid>http://jeromyanglim.tumblr.com/post/45895285880</guid><pubDate>Thu, 21 Mar 2013 15:58:10 +1100</pubDate></item><item><title>Clear explanation of q q plots</title><description>&lt;p&gt;&lt;iframe width="560" height="315" src="http://www.youtube.com/embed/X9_ISJ0YpGw" frameborder="0" allowfullscreen&gt;&lt;/iframe&gt;&lt;/p&gt;</description><link>http://jeromyanglim.tumblr.com/post/45828390892</link><guid>http://jeromyanglim.tumblr.com/post/45828390892</guid><pubDate>Wed, 20 Mar 2013 21:59:23 +1100</pubDate></item><item><title>Effect sizes from the literature for a power analysis</title><description>&lt;p&gt;I was recently discussing with a student the task of reading the literature to extract effect sizes to perform a power analysis in order to guide a sample size decision. The student seemed overwhelmed by the sheer number of analyses that they were reading.&lt;/p&gt;

&lt;p&gt;UPDATE: I ended up asking the &lt;a href="http://cogsci.stackexchange.com/questions/3384/how-to-estimate-an-expected-effect-size-for-a-planned-psychology-study"&gt;question here&lt;/a&gt; and refining the answer provided below.&lt;/p&gt;

&lt;p&gt;These were a few points I made:&lt;/p&gt;

&lt;p&gt;Any given study (such as in a thesis) typically has multiple hypotheses. A given hypothesis can often be distilled to a particular parameter or effect size (e.g., correlation between two variables; Cohen&amp;#8217;s d for difference between group means; r-square difference when comparing two multiple regression models; etc.). These effect sizes will generally differ from each other. Statistical power is a property of a particular significance test, which will likely correspond to a particular effect size. Thus, power and effect size are not properties of a study as a whole. Rather, you could have different power estimates for different hypotheses.&lt;/p&gt;

&lt;p&gt;Thus, if you are encountering multiple effect sizes in the one paper, you need to work out whether they pertain to the same hypothesis or to different hypotheses. If they pertain to the same hypothesis, then textbooks and tutorials on meta-analysis can guide you through the process of creating effect size measures, converting effect size measures to a common metric, and combining effect size measures. If they pertain to different hypotheses, then you should be combining them separately for each hypothesis.&lt;/p&gt;

&lt;p&gt;A next logical question, if you have different effect sizes for different hypotheses, is how to determine sample size requirements. For example, if you are expecting some large effects and some small effects, a sample size of 100 may have plenty of power for the large effects but not enough for the small effects. In this case, you could try to obtain the sample size required to have adequate power for all relevant hypotheses. Of course, sample size is often constrained by resources. Thus, you may need to be aware that you only have adequate power to test certain hypotheses.&lt;/p&gt;
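
&lt;p&gt;As a rough illustration, assume a two-group comparison with $n = 50$ participants per group (100 in total), a two-sided test at $\alpha = .05$, and a normal approximation to the t-test. These choices are assumptions for the example only. Power for a standardised mean difference $d$ is then approximately:&lt;/p&gt;

&lt;p&gt;$$\text{power} \approx \Phi\left(d\sqrt{n/2} - z_{1-\alpha/2}\right)$$&lt;/p&gt;

&lt;p&gt;With $n = 50$ we have $\sqrt{n/2} = 5$ and $z_{.975} = 1.96$. For a large effect ($d = 0.8$) this gives $\Phi(4 - 1.96) = \Phi(2.04) \approx 0.98$. For a small effect ($d = 0.2$) it gives $\Phi(1 - 1.96) = \Phi(-0.96) \approx 0.17$. So the same sample is more than adequate for one hypothesis and badly underpowered for the other.&lt;/p&gt;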

&lt;p&gt;A final point is that ultimately you don&amp;#8217;t know what the population effect size is for your study. In some senses that is why you are doing the study. Meta-analytic methods provide tools for quantifying uncertainty in effect sizes, but this does not remove the uncertainty. This often requires you to acknowledge the uncertainty and make sample size claims along the lines of &amp;#8220;if the effect size is x, then we will achieve y% power with a sample size of n&amp;#8221;.&lt;/p&gt;</description><link>http://jeromyanglim.tumblr.com/post/45749810952</link><guid>http://jeromyanglim.tumblr.com/post/45749810952</guid><pubDate>Tue, 19 Mar 2013 22:44:00 +1100</pubDate></item><item><title>P-values for significance testing on normality.</title><description>&lt;p&gt;I was recently asked about what p-value threshold to use and which normality test to use.&lt;/p&gt;

&lt;p&gt;The Lilliefors test ( &lt;a href="http://en.wikipedia.org/wiki/Lilliefors_test"&gt;http://en.wikipedia.org/wiki/Lilliefors_test&lt;/a&gt; ) is an adaptation of the K-S test and is designed for when the population mean and SD are unknown. In data exploration contexts, when doing a general exploration of normality of the data, this would almost always be the case. The standard K-S test assumes that the mean and SD are known. Thus, I would trust the Lilliefors p-value to be accurate, and the K-S test p-value would be a little inaccurate when used in this context.&lt;/p&gt;
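
&lt;p&gt;To make the difference concrete (the notation is mine), both tests are based on the largest gap between the empirical distribution function $F_n$ of the data and the normal distribution function:&lt;/p&gt;

&lt;p&gt;$$D = \sup_x \left| F_n(x) - \Phi\left(\frac{x - \mu}{\sigma}\right) \right|$$&lt;/p&gt;

&lt;p&gt;The standard K-S test treats $\mu$ and $\sigma$ as specified in advance. The Lilliefors test substitutes the sample estimates $\bar{x}$ and $s$ and compares $D$ to a null distribution that allows for this estimation. Because estimating the parameters from the same data pulls the fitted curve towards the data, the standard K-S p-value tends to be too large when it is used with estimated values.&lt;/p&gt;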

&lt;p&gt;You then come to the issue of the p-value threshold for concluding that normality is violated. The choice of threshold is a matter of convention. Different authors recommend different conventions. The consequence of different conventions is just to make it easier or more difficult to conclude that the data is not normally distributed.&lt;/p&gt;

&lt;p&gt;More generally, an assessment of normality is typically used to feed into later decisions about how to proceed with analysis. E.g., should I transform my variables; should I use a non-parametric test instead.&lt;/p&gt;

&lt;p&gt;One reason why the K-S test and Lilliefors test are not very useful in practice is that they often answer the wrong question. We are typically interested in whether the data is approximately normally distributed. If the data deviates just a little from normality, this probably won&amp;#8217;t affect the accuracy of the p-values in our subsequent analyses (e.g., t-tests, ANOVAs, etc.).&lt;/p&gt;

&lt;p&gt;Thus, with small samples, the K-S test is underpowered and can fail to detect true violations of normality. And with large samples, the K-S test detects violations of normality which are not important for practical purposes. For this reason, in a way analogous to why we focus on effect sizes rather than just significance tests, it is important when assessing normality to look at indicators of the degree of departure from normality (e.g., histograms, density plots, q-q plots, etc.).&lt;/p&gt;

&lt;p&gt;Particularly with small samples it may also be useful to apply your prior knowledge of the variables.&lt;/p&gt;

&lt;p&gt;For further discussion:
&lt;a href="http://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-useless"&gt;http://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-useless&lt;/a&gt;&lt;/p&gt;</description><link>http://jeromyanglim.tumblr.com/post/45638395489</link><guid>http://jeromyanglim.tumblr.com/post/45638395489</guid><pubDate>Mon, 18 Mar 2013 12:33:13 +1100</pubDate></item><item><title>Disabling red box Google Plus notifier</title><description>&lt;p&gt;I find the Red Box Google Plus notifier distracting. I only want to see it when I&amp;#8217;m in Google Plus. This question on Stack Exchange provides some solutions for removing it.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://webapps.stackexchange.com/questions/17591/disable-the-google-plus-red-notification-number-on-gmail-google-search-etc"&gt;http://webapps.stackexchange.com/questions/17591/disable-the-google-plus-red-notification-number-on-gmail-google-search-etc&lt;/a&gt;&lt;/p&gt;</description><link>http://jeromyanglim.tumblr.com/post/43673197997</link><guid>http://jeromyanglim.tumblr.com/post/43673197997</guid><pubDate>Fri, 22 Feb 2013 09:36:33 +1100</pubDate></item><item><title>Magic mouse cursor moves randomly after actual mouse cursor move</title><description>&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: For a while I had been having an intermittent problem with my Magic Mouse. When I moved the cursor with the magic mouse, the cursor would then move to a random location a second or so after the actual move. The problem seemed to occur when I returned to either working at home or working at my office. The problem often seemed to correct itself after a few hours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; The second solution presented &lt;a href="http://macs.about.com/od/tipstricks/qt/Magic-Mouse-Tracking-Error-An-Easy-Fix-For-A-Magic-Mouse-Tracking-Problem.htm"&gt;here by Tom Nelson&lt;/a&gt; seems to have fixed the problem.  The problem appears to be caused by dust on the surface or on the sensor. Thus, giving the mouse pad surface a wipe and cleaning the optical sensor with a blast of air seems to fix the problem.&lt;/p&gt;

&lt;p&gt;I assume that I was noticing the problem after returning to my home or office because dust and grit builds up over time.&lt;/p&gt;</description><link>http://jeromyanglim.tumblr.com/post/42232421019</link><guid>http://jeromyanglim.tumblr.com/post/42232421019</guid><pubDate>Mon, 04 Feb 2013 11:19:52 +1100</pubDate><category>@osx</category></item></channel></rss>
