time for self

from the NY Times a couple days ago: “Taking Time for the Self on the Path to Becoming a Doctor”. good advice that applies even if you’re training to be the non-medical kind:

“It’s partly a coping mechanism,” Dr. Ratanawongsa said. “We tell ourselves that we can do everything but not at the same time, so we are going to put off the thing that defines us as a person — time with children, running a marathon, painting, playing music — in order to get trained because being a doctor is also rewarding.”

That delayed gratification works well initially because residents believe it is only temporary. “A lot of what matters to residents at this time is the sense that they are learning to care for patients well and growing as doctors. They feel that what they are doing is going to be worth it.”

But when the imbalance persists for longer than initially expected, professional growth is not enough to sustain most young doctors. “The ones who are happier,” Dr. Ratanawongsa observed, “are the ones who have held on to one or two things and have said, ‘I’m not just another resident. I play the guitar, I run races, or I go home to family.’ They don’t do these things to the same extent as they did before residency, but they do them enough to maintain a sense of self.”

Residents who don’t find this balance are at risk of burnout, clinical depression or, more commonly, subtle forms of stress. “These residents may feel that even if they can give excellent care most of the time, there are times when they snap at a patient or don’t order a test fast enough because they are so burnt out.”

here’s a link to the paper referred to in the article (i think)

live blogging == journalism?

I came across an interesting blog post on scienceblogs.com, about the difference (or lack thereof) between live blogging a conference, and reporting on a conference. I’ve been privy to conversations happening at conferences and workshops I didn’t actually attend, via Twitter and Facebook friends, and I have been present at events that were being “liveblogged”. But it never occurred to me that journalists might object:

At a recent meeting at the Cold Spring Harbor Laboratory (CSHL) in New York state, Daniel MacArthur from the Wellcome Trust Sanger Institute in Cambridge, United Kingdom, brought into focus how fuzzy the line between journalist and scientist is becoming. In addition to reporting on genetic variation in a gene that is active in fast muscle fibers at The Biology of Genomes meeting, MacArthur wrote several on the spot blog posts covering advances discussed by the participants. Francis Collins also mentioned results on his new Web site.

A specialized Web-based news service, Genomeweb, complained. To attend CSHL meetings, reporters agree to obtain permission from a speaker before writing up any results. But MacArthur didn’t have to click that box when he registered and was free to report without getting any go-ahead. Several other participants were twittering, says CSHL meetings organizer David Stewart. “They weren’t held to the same standards” as the media, says Stewart.

The liveblogger in question, Daniel MacArthur, had this to say about it:

 However, I do want to emphasise the importance in general of conference organisers encouraging direct, crowd-sourced reporting of scientific data through online media. Science benefits from the open communication of data to the broadest possible audience (not only scientists, but also the wider community). Some conferences do benefit from sealing themselves off from the outside world, allowing freer exchange of ideas between participants - but meetings that are interested in increasing the impact of their presentations on the community as a whole would be well-served by actively embracing audience blogging.

It’s worth mentioning here that most of the dangers of live-blogging are (in my mind at least) generally over-stated. For instance, the risk of being scooped due to data posted on the web seems rather far-fetched given that most of the potential scoopers are already sitting in the audience watching the presentation. There is a fear that live-blogging distracts people from watching the seminar; I would argue in response that - given the number of people I see programming or working on their grant submission in genomics meetings - we should be grateful that live-bloggers are actually engaging directly with the material being presented.

I guess I just always figured presentations at conferences were public events, and therefore fair game for whoever wanted to post, tweet, or otherwise write about any data, results, conversations, experiences, etc. they found interesting. I’m not sure if the conferences I attend have a policy similar to that of CSHL, about asking for permission from the presenter before writing about their work. But I kind of doubt it, because the proceedings of these conferences (CHI, CSCW, GROUP, etc.) are considered to be archival publications, rather than preliminary results or works-in-progress. However, the conversations happening at the CSST workshop last week were… well, what? Public? Private? Blog-worthy? Newsworthy? Nobody asked us if we minded our comments being shared widely and “persistently”. I wouldn’t have expected them to.

productivity

well, the paper is submitted. but man, i NEVER want to do that again. and by “that” i mean write a single-author paper in about a week. i’d been working on analysis (along with all my other dissertation- and work-related stuff) for months, but when we returned from the holiday weekend — where i tried and failed to write — all i had done was a bunch of statistics, graphs, and notes.

i’ve been using this service called RescueTime for the past several weeks as a way to track my hours for different projects i am working on, and as an indicator of my productivity in general. basically, you install a little app on your computer, and it sends data about what applications are active to the RescueTime server. you can log in and see reports of how much time you are spending looking at which apps and web pages (for $8/mo. you can get reports broken down by window title, not just application).

i have been happy to learn that i don’t “waste” as much time as i might have thought. but this past week isn’t a very accurate indication of my normal work habits. i went from notes and graphs to a 10-page ACM-format paper in a week:


(click for animated gif showing May 26 - June 1)

it’s nice to see that i’ve still “got it”, i guess. but that was not a fun week.

i highly recommend RescueTime, if like me you want to be more meta about how you spend your time, and like looking at data.

hierarchies and semantics

In my dissertation experiment, I asked ~60 people from two different graduate schools (or “communities”) on campus to label and organize a set of short documents into a hierarchy (tree structure). They used a web-based interface created specifically for the experiment, that closely resembled the file-and-folder metaphor everybody is used to in Microsoft Windows and MacOS.

Each person was instructed to organize the documents with a different “target audience” in mind: for themselves, for somebody in the same graduate program, and for somebody in the other graduate program. Twenty people were randomly assigned to each “target audience” group, ~10 from each community, and each person organized the documents once, for a single target audience. This resulted in the creation of 6 different “types” of file-and-folder hierarchies, by PRODUCER and AUDIENCE; the N in the chart below represents both the number of participants and the number of hierarchies created, by type:

I have been exploring different ways to analyze the hierarchies participants created, and I am starting to think there are three types of measures:

  1. vocabulary — word-level measures, like label agreement, average word rank, number of unique words, length of labels
  2. “topology” — structural measures, like number and size of folders, average path length, etc.
  3. semantics — this one is a little harder to measure than the others. i wanted to know whether the conceptual groupings of files might look different based on the community of the hierarchy creator, and the target audience
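To make the first two kinds of measures concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the nested-dictionary representation of a hierarchy, the folder labels, and the definition of "average path length" as the mean folder depth of the documents. It is not the analysis code from the experiment.

```python
# Hypothetical hierarchy: each folder has a label, subfolders, and documents.
hierarchy = {
    "label": "root",
    "folders": [
        {"label": "computer security", "folders": [], "docs": ["d1", "d2"]},
        {
            "label": "social computing",
            "folders": [
                {"label": "online friends", "folders": [], "docs": ["d3"]},
            ],
            "docs": ["d4"],
        },
    ],
    "docs": [],
}


def walk(folder, depth=0):
    """Yield (folder, depth) for this folder and everything beneath it."""
    yield folder, depth
    for child in folder["folders"]:
        yield from walk(child, depth + 1)


def measures(root):
    """Compute a few vocabulary and topology measures for one hierarchy."""
    folders = [f for f, d in walk(root) if d > 0]  # skip the root itself
    words = [w for f in folders for w in f["label"].lower().split()]
    doc_depths = [d for f, d in walk(root) for _ in f["docs"]]
    return {
        "folders": len(folders),                      # topology
        "unique_words": len(set(words)),              # vocabulary
        "avg_label_length": len(words) / len(folders),
        "avg_path_length": sum(doc_depths) / len(doc_depths),
    }


result = measures(hierarchy)
print(result)
```

Measures like label agreement or average word rank need data from more than one hierarchy, so they are left out of this single-hierarchy sketch.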

I used multidimensional scaling (MDS), which I wrote about a few days ago, and it seemed to show that there were indeed meaningful patterns in the way documents were grouped together. But I lost too much information with this technique: the MDS showed three distinct conceptual groups, but it was hard to determine whether structure existed within those groups.

Based on previous categorization research (Rosch et al. 1976), I expected that students from CS would create more nuanced conceptual structures for the CS-related documents, and MSI students would do the same for the information-science-related documents. But MDS was not the right technique to use for this, so I used hierarchical cluster analysis instead.
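For readers who have not met hierarchical cluster analysis, here is a rough Python sketch of the idea. It assumes average linkage and a made-up distance matrix over four hypothetical documents. The linkage method is my choice for the illustration, and in R this kind of analysis would more likely go through a function such as `hclust`. Each merge it records corresponds to one join in a dendrogram.

```python
def average_linkage(dist, labels):
    """Agglomerative clustering with average linkage.

    Returns the merges in order, each as (members_a, members_b, distance).
    """
    clusters = [[i] for i in range(len(labels))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest average distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                d = sum(dist[i][j] for i, j in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((
            [labels[i] for i in clusters[a]],
            [labels[i] for i in clusters[b]],
            d,
        ))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Hypothetical distances: two CS documents and two info-sci documents.
labels = ["cs 1", "cs 2", "is 1", "is 2"]
dist = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]

merges = average_linkage(dist, labels)
for left, right, d in merges:
    print(left, "+", right, "at", round(d, 2))
```

Unlike an MDS plot, the merge order keeps the nested structure inside each group, which is the information that was hard to read off the MDS.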

Below are two dendrograms, one that represents the clustering based on data from all of the CS students, and the other that represents data from all of the MSI students. The same three groups from the MDS are also represented here: CS, Information Science, and Security.

In the aggregate MSI student dendrogram, the Information Science cluster is broken into two parts:

In the aggregate CS student dendrogram, the same documents that make up Info Sci 1 and 2 above are merged into one cluster, while the same documents that make up the CS cluster above are broken into two groups:

My next analysis steps will be to figure out how to use this information to systematically examine all of the hierarchies for evidence of these clusters. Ideally, I would like some kind of quantitative measure that indicates to what extent individual participants created structures with these same kinds of patterns, but I’m not sure how to do that yet. My ultimate goal is to be able to compare hierarchies along all three dimensions mentioned in this post: vocabulary, topology, and semantics, and find out whether differences exist according to common ground and audience design factors.

results!

after about a month of on-and-off panic about my dissertation experiment analysis (and too many marathon sessions with crappy R documentation), i now feel confident in saying that YES, i do have some pretty interesting results! i’ll be working on writing everything up for a rapidly approaching paper deadline; i’m guessing i’ll be posting bits as i work on figuring out how to say what needs to be said in the results section, and how best to describe what i think it all means.

i also have to say, what the heck did people do before the internet? i mean, here i am using open-source statistical software, doing fairly nonstandard analyses for my field. and yet, it seems like no matter what problem i encountered either with the tool or in trying to specify the model correctly, somebody else had already figured it out or written a paper on it. for example, i will definitely be citing this really fabulous journal article on generalized linear mixed models (Bolker et al., 2009) with not one, but TWO online supplements that are equally fabulous. who knows what i would have done without having all this information at my fingertips!!

more fun with MDS (and R)

i’ve spent the past few hours futzing (yes, futzing) with some more MDS plots of the document (file) grouping data from my dissertation experiment, visualized this time by document rather than by person. each participant in the experiment organized the same 33 documents into a file-and-folder hierarchy. i chose the documents (document excerpts, really) from trade publications related to the two intellectual communities from which i recruited participants. roughly 11 documents were selected to be more familiar to one community, 11 to the other community, and 11 that could go either way in my estimation.

the document groupings from each participant’s hierarchy can be represented by a 33×33 matrix having one row and one column for each document. in each cell of the matrix there is a 1 if the two documents (row and column) that line up in that cell were grouped into the same folder; if not, there is a 0. there is one matrix for each participant, so, ~100 of them. these matrices look something like this:

        doc 1   doc 2   doc 3   doc 4
doc 1     1       1       0       0
doc 2     1       1       0       0
doc 3     0       0       1       1
doc 4     0       0       1       1
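here is a small python sketch of how one participant's matrix could be built. the document names and folder assignments are made up for illustration, and this is not the code i actually used (my analysis was done in R).

```python
def cooccurrence_matrix(folder_of, docs):
    """Return a len(docs) x len(docs) matrix of 0/1 values.

    folder_of maps each document name to the folder it was placed in.
    Cell [i][j] is 1 when docs[i] and docs[j] share a folder, else 0.
    """
    return [
        [1 if folder_of[a] == folder_of[b] else 0 for b in docs]
        for a in docs
    ]


# Hypothetical participant: four documents sorted into two folders.
docs = ["doc 1", "doc 2", "doc 3", "doc 4"]
folder_of = {"doc 1": "A", "doc 2": "A", "doc 3": "B", "doc 4": "B"}

matrix = cooccurrence_matrix(folder_of, docs)
for name, row in zip(docs, matrix):
    print(name, *row)
```

the diagonal is always 1, since every document shares a folder with itself, and the matrix is symmetric.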

i needed a way to summarize across all of the matrices in each of the 10 conditions of the experiment, to produce one MDS plot for each condition. these MDS plots represent the pattern of similarities in how the ~10 participants in each condition grouped the documents together. so, i “stacked up” the matrices from each condition, and for each pair of documents i averaged across the 1’s and 0’s in the corresponding cells from each matrix. this produced a single matrix representing, on average, how often any two files were grouped together in the same folder.

my initial thought was that i would try to identify any differences between conditions in the MDS plots. take a look at the plots below and see what you think (PNG this time; click to view them bigger):

what is most striking to me is there are definitely three clusters in both plots, and the clusters are along the same topical divisions that i used to choose the files!
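for anyone who wants to try the “stacking” step, here is a rough python sketch with two made-up participants and three documents. turning the averaged similarities into distances as 1 minus the proportion is an assumption on my part for this example. an MDS routine (for instance R's `cmdscale`) takes a distance matrix like the one produced at the end.

```python
def average_matrices(matrices):
    """Average a list of equally sized 0/1 matrices cell by cell."""
    n = len(matrices[0])
    count = len(matrices)
    return [
        [sum(m[i][j] for m in matrices) / count for j in range(n)]
        for i in range(n)
    ]


def to_dissimilarity(similarity):
    """Assumed conversion: distance = 1 - proportion grouped together."""
    return [[1 - value for value in row] for row in similarity]


# Two hypothetical participants who grouped three documents differently.
p1 = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
p2 = [[1, 0, 0], [0, 1, 1], [0, 1, 1]]

avg = average_matrices([p1, p2])
dist = to_dissimilarity(avg)

for row in dist:
    print(row)
```

documents that every participant put in the same folder end up at distance 0, and documents nobody put together end up at distance 1.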

even more interesting are the documents about which the participants and i disagreed on the topical grouping. in all of the 10 plots, there are several documents that consistently appear in a different topical cluster from the one i chose:

security complexity threats, CS –> Both
computer attitudes usage, Both –> Info Sci
tv friends communication, Both –> Info Sci
ai intelligence artificial, Both –> CS
biology design changes, Info Sci –> CS
randomness design interaction, Info Sci –> CS

take a look at all the plots and see for yourself: [ 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 ]

i am intrigued to find that the clusters all look as similar as they do. however, it is important to remember that the measure upon which these MDS plots are based represents information about which documents were grouped together in the same folder, but does NOT include information about the structural complexity of the hierarchies. these plots are NOT meant to indicate that in general, participants grouped files into three folders corresponding to these topics. the overall perception of the topical groupings seems fairly similar across the conditions; but the folder structure and labels can still vary. i am currently using a mixed model regression to analyze data from the finding part of the experiment, where participants returned and searched for documents in the hierarchies other participants had created. stay tuned for more updates!