
RStudio

Advanced R Programming

Data processing, cleanup, visualization and manipulation


My Thesis

This page presents visualizations and calculations from my thesis research, carried out in RStudio. The aim of the research was to remove as much cloud cover as possible from visible remote sensing imagery using three same-day satellite images (Terra, Aqua and VIIRS) and a series of multi-step methods: adjacent temporal deduction, spatial filtering and the regional snow line method. With the cloud-reduced imagery, I wanted to characterize the snow melt-out period and identify any significant patterns. Large-scale teleconnection patterns can significantly influence snow cover, so another aim was to relate several of these teleconnection patterns to snow cover melt-out dates.

Check out the link to my github below.


Thesis Methodology Visualization

These images showcase the core methodology of my thesis for the 4th of April, 2015. Purple pixels are cloud, blue are snow and green are land. Cloud cover drops significantly with each step of the methodology. Note that the pixels were reclassified to binary before the spatial filter, which is why water bodies suddenly appear as "cloud" or "NA" pixels. By the end of the process, there are significantly more snow and land pixels and much less cloud obstruction. This was an extremely successful date for cloud removal via the multi-step methodology.
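As a rough illustration of the same-day combination and the binary reclassification described above, here is a minimal Python sketch (the thesis itself was done in R). The class codes, the Terra, Aqua, VIIRS priority order and the function names are all assumptions made for the example, not the thesis code.

```python
# Hypothetical class codes; the actual encoding used in the thesis is not given.
CLOUD, SNOW, LAND, WATER = 0, 1, 2, 3


def combine_same_day(terra, aqua, viirs):
    """Per pixel, keep the first non-cloud class among the three sensors.

    The Terra -> Aqua -> VIIRS priority order is an assumption.
    """
    combined = []
    for pixels in zip(terra, aqua, viirs):
        clear = [p for p in pixels if p != CLOUD]
        combined.append(clear[0] if clear else CLOUD)
    return combined


def to_binary(pixels):
    """Reclassify to snow (1) or land (0); everything else becomes None (NA)."""
    mapping = {SNOW: 1, LAND: 0}
    return [mapping.get(p) for p in pixels]


# Four example pixels from each sensor for the same day.
terra = [CLOUD, SNOW, CLOUD, WATER]
aqua = [LAND, CLOUD, CLOUD, WATER]
viirs = [CLOUD, CLOUD, CLOUD, CLOUD]

merged = combine_same_day(terra, aqua, viirs)
print(merged)
print(to_binary(merged))
```

In this sketch a water pixel ends up as None after the binary step, which mirrors why water bodies show up as "NA" pixels once the data are reclassified.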


Visualizing cloud reduction from April 1st to June 30th, 2015

This graphic was created using ggplot. It shows the daily cloud proportion after each step in the methodology. The process was highly successful for 2015: the multi-step methodology replaced or interpolated the majority of cloud pixels. In some cases, such as the beginning of June, cloud declined by almost 95%.

2015.png
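The daily cloud proportion and its decline between two steps could be computed along these lines. This is a Python sketch with made-up pixel labels; it treats the decline as relative to the starting cloud proportion, which is an assumption about how the percentage above is defined.

```python
def cloud_proportion(pixels, cloud="cloud"):
    """Fraction of pixels labeled as cloud."""
    return sum(p == cloud for p in pixels) / len(pixels)


def percent_decline(before, after):
    """Relative decline in cloud proportion, in percent."""
    if before == 0:
        return 0.0
    return 100.0 * (before - after) / before


# Hypothetical pixel labels for one day, before and after the methodology.
original = ["cloud"] * 8 + ["snow", "land"]
final = ["cloud"] * 2 + ["snow"] * 5 + ["land"] * 3

before = cloud_proportion(original)
after = cloud_proportion(final)
print(before, after, percent_decline(before, after))
```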

VIIRS Snow Product

Terra and Aqua imagery has been combined in multiple projects in the literature, mostly for snow mapping. However, I decided to add the VIIRS snow product as well. The algorithm VIIRS uses is essentially identical to that of MODIS, with the exception of the cloud mask. MODIS cloud masks are more conservative than those of VIIRS. This raises the question: how much cloud reduction was accomplished by adding VIIRS to the methodology?

This visualization was also done using ggplot. As we can see, adding VIIRS to the methodology accomplished up to 25% cloud reduction.

VIIRS.png

Cloud Duration Maps

Cloud duration maps were created to show the spatial distribution of cloud-covered pixels. They were created both for the methodology up to the implementation of SNOWL (the regional snow line step) and for the methodology after it, in order to visualize the concentration of cloud pixels in both instances.
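One way to build such a map is to count, for every pixel, the number of days it was classified as cloud. The Python sketch below assumes small daily grids stored as nested lists with a hypothetical "C" label for cloud; it is an illustration, not the thesis code.

```python
def cloud_duration(daily_grids, cloud="C"):
    """Count, per pixel, how many daily grids classify it as cloud."""
    rows = len(daily_grids[0])
    cols = len(daily_grids[0][0])
    duration = [[0] * cols for _ in range(rows)]
    for grid in daily_grids:
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == cloud:
                    duration[r][c] += 1
    return duration


# Three hypothetical days on a 2x2 grid: C = cloud, S = snow, L = land.
days = [
    [["C", "S"], ["C", "L"]],
    [["C", "C"], ["S", "L"]],
    [["C", "S"], ["S", "L"]],
]
print(cloud_duration(days))
```

Running the same count on the grids before and after the snow line step would give the two maps being compared.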


Accuracy

In remote sensing, it is critical to evaluate products with some form of accuracy assessment. High resolution imagery can be used as a source of "ground truth" data when in situ measurement is not available. In southern BC, this was the case. I used Landsat imagery and mimicked the MODIS snow map algorithm to create confusion matrices comparing the two datasets. However, I added an additional threshold to the algorithm, the Normalized Difference Forest Snow Index (NDFSI). Snow in forested areas is difficult to recognize using the MODIS algorithm, and the NDFSI has been shown to have an accuracy of 94% in mountainous areas.

The visualization below shows the number of additional pixels captured using the NDFSI compared to using the NDVI (Normalized Difference Vegetation Index) and NDSI (Normalized Difference Snow Index) alone.

NDFSI.png

NDSI vs. NDFSI

These visualizations plot NDFSI and NDSI values against NDVI values for open snow, forest snow and forests on two dates, April 20th and June 7th. These dates were chosen to visualize how the pixel value distributions differ throughout the off season. The NDSI cutoff for classifying a pixel as snow is usually 0.4. For snow-covered forests, NDSI will sometimes be lower than that, and they will be misclassified as no snow. With NDFSI, the threshold is 0.4 as well, but snow-covered forests will have higher NDFSI than NDSI and thus will be classified as snow.
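To make the threshold comparison concrete, here is a small Python sketch. It assumes the commonly published band definitions, NDSI = (green - SWIR) / (green + SWIR) and NDFSI = (NIR - SWIR) / (NIR + SWIR), and uses invented reflectance values for a single snow-covered forest pixel. The exact bands, the NDVI condition and the treatment of values equal to the threshold in the thesis may differ.

```python
def normalized_difference(a, b):
    """Generic normalized difference (a - b) / (a + b), guarding against 0/0."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndsi(green, swir):
    """Normalized Difference Snow Index (assumed green/SWIR definition)."""
    return normalized_difference(green, swir)


def ndfsi(nir, swir):
    """Normalized Difference Forest Snow Index (assumed NIR/SWIR definition)."""
    return normalized_difference(nir, swir)


def is_snow(index_value, threshold=0.4):
    """Apply the 0.4 cutoff; counting equality as snow is an assumption."""
    return index_value >= threshold


# Invented reflectances for one snow-covered forest pixel.
green, nir, swir = 0.15, 0.30, 0.10

print(ndsi(green, swir), is_snow(ndsi(green, swir)))
print(ndfsi(nir, swir), is_snow(ndfsi(nir, swir)))
```

With these invented values the pixel falls below the cutoff on NDSI but passes it on NDFSI, which is the behaviour described for snow-covered forests.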


Accuracy Results

How accurate was the methodology?

The accuracy reached 98% as shown in the table.

accuracy.PNG

Where were the inaccuracies concentrated?

Understanding where the inaccuracies are concentrated tells us a lot about where the methodology is lacking. The images below clearly indicate that the majority of misclassifications are false negatives in forested regions.

AC1.png

AC2.png

AC3.png
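Overall accuracy and the false negative count discussed above both come out of a confusion matrix. The Python sketch below assumes binary snow maps (1 = snow, 0 = no snow) with a Landsat-derived map as the reference; the pixel values are invented and do not reproduce the thesis results.

```python
def confusion_matrix(reference, predicted):
    """Count true/false positives and negatives for binary snow maps."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for ref, pred in zip(reference, predicted):
        if ref and pred:
            counts["tp"] += 1
        elif not ref and not pred:
            counts["tn"] += 1
        elif pred:
            counts["fp"] += 1  # mapped as snow, reference says no snow
        else:
            counts["fn"] += 1  # reference says snow, map missed it
    return counts


def overall_accuracy(counts):
    """Share of pixels where the two maps agree."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total


# Invented example: reference (e.g. Landsat) vs. the cloud-reduced product.
reference = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 0, 1, 1]

matrix = confusion_matrix(reference, predicted)
print(matrix, overall_accuracy(matrix))
```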


Snow Patterns

In order to uncover any significant snow cover patterns throughout the off season in this region, I created a last day of snow (LDS) dataset. This dataset consists of the LDS for various pixels in each year from 2003 to 2019. The pixels were randomly sampled using QGIS at various elevations. The boxplots below show the distribution of LDS throughout the study period at different elevations.
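A last day of snow value could be derived from a pixel's daily classification roughly as follows. This Python sketch assumes LDS is the latest day of year on which the pixel is classified as snow, and the elevation bin labels are made up; the exact definition and bins used in the thesis are not stated here.

```python
from collections import defaultdict
from statistics import median


def last_day_of_snow(series):
    """Return the latest day of year flagged as snow, or None if never snow.

    `series` is a list of (day_of_year, is_snow) pairs for one pixel and year.
    """
    snow_days = [day for day, snow in series if snow]
    return max(snow_days) if snow_days else None


def median_lds_by_bin(points):
    """Median LDS per elevation bin from (bin_label, lds) pairs."""
    grouped = defaultdict(list)
    for bin_label, lds in points:
        grouped[bin_label].append(lds)
    return {label: median(values) for label, values in grouped.items()}


# Invented daily record for one pixel in one year.
series = [(91, True), (120, True), (121, False), (135, True), (136, False)]
print(last_day_of_snow(series))

# Invented LDS values for sampled pixels in two hypothetical elevation bins.
points = [("1000-1500 m", 120), ("1000-1500 m", 130), ("1500-2000 m", 150)]
print(median_lds_by_bin(points))
```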


In Situ Data

Since there is a wave-like pattern throughout the LDS datasets at each elevation bin, I decided to take a look at in situ snow depth measurements to check my findings. The graph below shows the snow depth patterns throughout the study period (where available) and the median LDS in the same elevation bin. They follow almost exactly the same pattern, which corroborates my findings.

in_situ.PNG

Regression Models

Many studies have connected large-scale atmospheric circulation patterns to snow cover duration and snow-off dates. I fitted a series of simple linear regression models relating the Oceanic Niño Index (ONI) to the LDS datasets, as well as multiple linear regression (MLR) models using the Pacific North American pattern (PNA) and the Pacific Decadal Oscillation (PDO), in various elevation bins. The results are shown below.

The PNA is not statistically significant in explaining the variability in the LDS dataset, but the PDO is significant at all elevations. The R values in the MLR do not show a clear increase or decrease with elevation. However, the simple linear regression models for the 3-month ONI indices were always statistically significant, and the R values decrease as elevation increases. This has been shown in other studies as well, as high elevation regions are resilient to large-scale atmospheric circulation patterns. The R values also increase slightly throughout the year. This shows that although the ONI values impact the LDS dataset from December onwards, the variability is better explained by ONI values from February to June than by those from earlier months.
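For readers unfamiliar with the R value referred to above, the following Python sketch fits a simple least-squares line and computes the correlation coefficient by hand. The ONI and LDS numbers are invented purely to exercise the code and say nothing about the thesis results; the significance tests and the multiple regression are not covered.

```python
from math import sqrt


def simple_linear_regression(x, y):
    """Ordinary least squares fit of y on x; returns (slope, intercept, r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


# Invented values: one 3-month ONI value and one median LDS (day of year) per year.
oni = [-1.0, 0.0, 1.0, 2.0]
lds = [160.0, 150.0, 140.0, 130.0]

slope, intercept, r = simple_linear_regression(oni, lds)
print(slope, intercept, r)
```

Fitting the same model per elevation bin and per 3-month ONI window would give one R value for each combination, which is the kind of comparison described above.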
