open.℘enciL: Spike Histograms made Simple (and some graphing tenets)

From Frank Harrell's graph course notes

Arguments for tweaking histSpike



histSpike(x, side=1, nint=100, frac=.05, minf=NULL, mult.width=1,
          type=c('proportion','count','density'),
          xlim=range(x), ylim=c(0,max(f)), xlab=deparse(substitute(x)),
          ylab=switch(type, proportion='Proportion',
                            count     ='Frequency',
                            density   ='Density'),
          y=NULL, curve=NULL, add=FALSE,
          bottom.align=type=='density', col=par('col'), lwd=par('lwd'),
          grid=FALSE, ...)

# frac: relative length of the plot that is used to represent the maximum frequency.
# side: axis side to use (1=bottom (default for histSpike), 2=left, 3=top (default for scat1d), 4=right)
# add: set to TRUE to add the spike histogram to an existing plot rather than starting a new one
# tfrac: fraction of tick mark to actually draw. If tfrac<1, will draw a random fraction tfrac of the line segment at each point. This is useful for very large samples or ones with some very dense points. The default value is 1 if the number of non-missing observations n is less than 125, and max(.1, 125/n) otherwise. # Hmm... second part is somewhat dense.
# eps: fraction of axis for determining overlapping points in x. For preserve=TRUE the default is 0 and original unique values are retained; bigger values of eps tend to bias observations from dense to sparse regions, but ranks are still preserved.
# lwd: line width for tick marks, passed to segments
# col: color for tick marks, passed to segments
# y: specify a vector the same length as x to draw tick marks along a curve instead of by one of the axes. The y values are often predicted values from a model. The side argument is ignored when y is given. If the curve is already represented as a table look-up, you may specify it using the curve argument instead. y may be a scalar to use a constant vertical placement. #huh?
# curve: a list containing elements x and y for which linear interpolation is used to derive y values corresponding to values of x. This results in tick marks being drawn along the curve. For histSpike, interpolated y values are derived for bin midpoints.
# bottom.align: set to TRUE to have the bottoms of tick marks (for side=1 or side=3) aligned at the y-coordinate. The default behavior is to center the tick marks. For datadensity.data.frame, bottom.align defaults to TRUE if nint>1. In other words, if you are only labeling the first and last axis tick mark, the scat1d tick marks are centered on the variable's axis.
# type: used by or passed to histSpike. Set to "count" to display frequency counts rather than relative frequencies, or "density" to display a kernel density estimate computed using the density function.
# grid: set to TRUE if the R grid package is in effect for the current plot
# nint: number of intervals to divide each continuous variable's axis for datadensity. For histSpike, is the number of equal-width intervals for which to bin x, and if instead nint is a character string (e.g., nint="all"), the frequency tabulation is done with no binning. In other words, frequencies for all unique values of x are derived and plotted.
# presorted: set to TRUE to prevent from sorting for determining the order l
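
The default for tfrac described above is easier to read as arithmetic. A minimal sketch, assuming only the rule quoted in the notes; default_tfrac is a hypothetical helper name, not an Hmisc function:

```r
# Default tfrac as described above: 1 for n < 125, otherwise max(.1, 125/n).
# default_tfrac is a hypothetical helper, written only to illustrate the rule.
default_tfrac <- function(n) if (n < 125) 1 else max(.1, 125 / n)

tfrac_small  <- default_tfrac(100)    # fewer than 125 points: full ticks
tfrac_medium <- default_tfrac(500)    # 125/500 = 0.25 of each tick is drawn
tfrac_large  <- default_tfrac(10000)  # 125/10000 is below the floor, so 0.1
```

So ticks are shortened only once there are 125 or more observations, and never to less than a tenth of their length.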

General form of the first argument for creating graphs:

 vertical.variable ~ horizontal.variable | row.conditioner * column.conditioner * page.conditioner, groups=superposition.variable

# groups makes separate lines or symbols within a panel.
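
A small sketch of that formula form, assuming the lattice package (shipped with standard R installations) and made-up data. The data frame d and its columns sbp, age, site and sex are hypothetical names chosen for illustration:

```r
library(lattice)  # recommended package shipped with standard R installations

# Made-up illustration data; all column names are hypothetical
set.seed(1)
d <- data.frame(
  age  = runif(120, 20, 80),
  sex  = rep(c("Female", "Male"), each = 60),
  site = rep(c("A", "B", "C"), times = 40)
)
d$sbp <- 100 + 0.5 * d$age + rnorm(120, sd = 10)

# vertical ~ horizontal | conditioner, groups = superposition variable
p <- xyplot(sbp ~ age | site, groups = sex, data = d, auto.key = TRUE)
print(p)  # lattice objects are drawn when printed
```

Each level of site gets its own panel, and the two sexes are superposed with different symbols inside every panel.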

Plotting commands in R

contour # contour plot
coplot # separate plots of different ranges
ecdf # empirical distribution function plot (Hmisc)
faces # Chernoff faces for multivariate data # What is this?!
nomogram # nomograms (Design)
persp # 3-D perspective plots of grids
pie # pie charts
plclust # plots of cluster trees from hclust
plot.Design # family of functions for fitted objects
plsmo # plot smoothed nonparametric estimates (Hmisc)
scat1d # add data density (rug plot) to plot (Hmisc enhancement of rug)
survplot # survival plots (Design)
tsplot # time series plots

Interesting ones for the dissertation:

usa # map of the US; could show the location of other studies of Black/White populations
# doesn't really work if typed on the console
symbol.freq # diagram of frequency table (Hmisc)
qqnorm # normal probability plot
qqplot # quantile-quantile plot
plot.summary.Design # plots effect ratios and CIs (Design)
plot.summary.formula # plotting functions for summary.formula function (Hmisc)
plot # scatterplot or line plot
plot.anova.Design # Dot chart of anova table (Design)
pairs # all possible pairs of scatterplots
hist # histogram
hist.data.frame # histogram of all variables in a data frame (Hmisc)
histSpike # high–resolution “spike” histograms and density plots
labcurve # draw and label curves or label existing curves (Hmisc)
datadensity # multivariable version of Hmisc’s scat1d
# displays data density for all variables in a data frame
dotchart # displays values based on position of dots
barplot # vertical or horizontal bar graph
bpplot # box–percentile plots (Hmisc)
boxplot # side-by-side boxplots

From Frank Harrell's Hmisc package for R.
histSpike: Add high-resolution spike histograms or density estimates to an existing plot



plot(density(x), type='l')

densityplot(~ x) # Trellis/Lattice version

hist(x, probability=TRUE, nclass=20)

lines(density(x)) # overlay the density estimate on the histogram

# probability=TRUE scales the y-axis so that the total area of the bars is 1.
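
To see what nint and frac control, here is a rough base-R sketch of a spike histogram added to a density plot. It is an illustration of the idea under simple assumptions (equal-width bins, spikes rising from the bottom of the plot), not Hmisc's implementation, and the data are made up:

```r
set.seed(2)
x <- rnorm(2000)   # made-up sample

nint <- 100        # number of equal-width bins, playing the role of nint
frac <- 0.05       # tallest spike uses this fraction of the y-range

# Bin x and compute the proportion of observations in each bin
breaks <- seq(min(x), max(x), length.out = nint + 1)
bin    <- findInterval(x, breaks, all.inside = TRUE)
prop   <- tabulate(bin, nbins = nint) / length(x)
mids   <- (breaks[-1] + breaks[-(nint + 1)]) / 2

# Draw the density curve, then add spikes rising from the bottom of the plot
plot(density(x), type = "l", main = "Density with spike histogram")
usr    <- par("usr")                 # c(x1, x2, y1, y2) of the plot region
height <- frac * (usr[4] - usr[3]) * prop / max(prop)
keep   <- prop > 0                   # skip empty bins
segments(mids[keep], usr[3], mids[keep], usr[3] + height[keep])
```

With Hmisc loaded, the equivalent step after drawing a plot would presumably be a single call such as histSpike(x, add=TRUE).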

Adding titles



plot(x, y, main="Main \ntitle", sub='Subtitle', adj=0)

# \n starts a new line in the title, as in Perl.

# adj=0 Left justification

# adj=0.5 center justification

# adj=1 right justification

par(mfrow=c(2,2), oma=c(0,0,2,0))

# A 2x2 matrix of plots

# Leave 2 lines for the overall top title (oma sets the outer margins), so the title sits in the two lines at the top edge of the figure.

mtitle('Overall title')

# A title for several graphs together

pstamp()

# date and time stamp on the lower right
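
A self-contained sketch of the multi-panel layout with an overall title, using base R only. Here mtext(..., outer=TRUE) stands in for Hmisc's mtitle, and the data are made up:

```r
# Made-up data for four small panels
set.seed(3)
x <- rnorm(50)
y <- x + rnorm(50)

# 2x2 grid of plots with 2 lines of outer margin reserved on top
old <- par(mfrow = c(2, 2), oma = c(0, 0, 2, 0))
plot(x, y, main = "Scatterplot")
hist(x, main = "Histogram of x")
plot(density(y), main = "Density of y")
boxplot(x, y, names = c("x", "y"), main = "Boxplots")

# Base-R stand-in for Hmisc's mtitle: text in the outer top margin
mtext("Overall title", side = 3, outer = TRUE, line = 0.5, font = 2)
layout_used <- par("mfrow")
par(old)  # restore the previous settings
```

Saving the old settings and restoring them keeps later plots from inheriting the 2x2 layout.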

Lines and Symbols



plot(x, y)

axis(3)

# add an axis (ticks & labels) on top

axis(4, labels=FALSE)

# axis on the right, ticks only

lines(1:3, c(2,4,-1))

# add a line through x=1:3, y=2, 4, -1. Could be useful for drawing a line at OR=1 to mark the null in the OR graphs.

points(locator())

# add clicked points

text(.2, 1.3, 'Text')

# add text

text (locator(1), "Mytext")

# add text at click

Reference Lines



abline (a=0, b=1)

# line of identity (a, b=intercept, slope)

abline (a=0, b=1, lty=2)

# dashed line (lty=2); line types are specified with the lty option. Could maybe use this for the 95% CI lines

abline (h=c(1, 3))

# horizontal lines at y=1 and y=3

abline(v=0)

# vertical line at x=0
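
As a sketch of the reference-line idea for odds ratio graphs: the odds ratios, confidence bounds and exposure labels below are made-up numbers for illustration only, not results from any study.

```r
# Made-up odds ratios and 95% CI bounds, for illustration only
or    <- c(0.8, 1.4, 2.1)
lower <- c(0.5, 0.9, 1.2)
upper <- c(1.3, 2.2, 3.7)
pos   <- seq_along(or)

plot(pos, or, ylim = range(lower, upper), log = "y", pch = 16,
     xaxt = "n", xlab = "Exposure", ylab = "Odds ratio (log scale)")
axis(1, at = pos, labels = c("A", "B", "C"))  # hypothetical exposure labels
segments(pos, lower, pos, upper)              # confidence intervals
abline(h = 1, lty = 2)                        # null value OR = 1, dashed

# Intervals that exclude 1 do not cross the reference line
excludes_null <- lower > 1 | upper < 1
```

A log scale on the y-axis makes ratios such as 0.5 and 2 sit at equal distances from the reference line.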

Interaction:
Could be shown as scatterplots or dotplots for different groups.

How are one or more categorical variables related to a single continuous numeric response variable?
Use the Dotplot function or the dotchart2 function in Hmisc.
summary.formula creates it.

Scatterplot matrices
Show all pairwise relationships from among 3 or more continuous variables.



pairs(dataframe[, exposures])
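
A runnable version of the same call, assuming the iris data set that ships with R in place of the dissertation data. Here exposures is simply a character vector of column names:

```r
# iris ships with R; its first four columns are continuous measurements
exposures <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
pairs(iris[, exposures], pch = 16, cex = 0.6)

# Each ordered pair of distinct variables gets its own panel
n_panels <- length(exposures) * (length(exposures) - 1)
```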

----------------

Multiple graphs on a common scale:
Group all variables with age on the x-axis together
Group all variables with span in months together
Sort categories by the values attached to them (this improves accuracy of perception), unless the natural order of the categories is important.
Grouping (binning) continuous values is necessary for some tables but not for graphs; a kernel density estimate is a better representation of the distribution.
Minimize the use of remote legends. Curves can be labeled at points of maximum separation (see the Hmisc labcurve function).
Notations and Symbols: As consistent as possible with the other parts of the document.
Effective coding scheme for two lines: a thin black line and a thick gray-scale line (possibly for the OR and CI bounds).
Single categorical Variable: Use a dot plot or horizontal bar chart to show the proportion corresponding to each category. Second choices for values are percentages and frequencies. The total sample size and number of missing values should be displayed somewhere on the page. If there are many categories and they are not naturally ordered, you may want to order them by the relative frequency to help the reader estimate values.
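
The advice for a single categorical variable can be sketched in base R as follows. The variable region and its values are made up; the sample size and number of missing values go in the title, and categories are ordered by relative frequency:

```r
# Made-up categorical variable with some missing values
set.seed(4)
region <- sample(c("North", "South", "East", "West", NA), 200,
                 replace = TRUE, prob = c(0.35, 0.25, 0.2, 0.15, 0.05))

n_total   <- length(region)
n_missing <- sum(is.na(region))
prop      <- sort(prop.table(table(region)))  # proportions among non-missing

# dotchart draws the first value at the bottom, so the largest ends up on top
dotchart(as.numeric(prop), labels = names(prop), xlim = c(0, max(prop)),
         xlab = "Proportion", pch = 16,
         main = sprintf("n = %d, missing = %d", n_total, n_missing))
```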

For Specific Aims.

Page 40: Odds ratio graphs

Page 41: Trend of Odds (Also do this for categorical.)

Page 42: Arrange according to contribution of variables.

open.℘enciL

Dec 27, 2011
