Vernon's Blog

Scottish life stories of an autistic man

Intro to R reflective essay


Introduction


This essay is firstly about my learning journey with the programming language R and secondly about how it applies to future work or study.


Course Background


In September 2024 I began an MSc in Data Science with the UHI. In the first semester I took the module ‘Intro to R and Data Visualisation’ alongside ‘Fundamental Statistics’ and ‘Data Analytics on the Web’. The course was entirely online, and all three modules made heavy use of the programming language R. For most of the semester I had live lectures (also available as recordings), delivered in the evenings for an hour each session, followed by interactive tutorials built into the software package RStudio. There was a module skills test at around the halfway point of the semester for ‘Intro to R’ and ‘Fundamental Statistics’.

In terms of specific content, ‘Intro to R’ covered how to display datasets with multiple parameters using ggplot, clean data, change data from ‘wide’ format to ‘tall’ format, work with strings versus numeric values, write functions, join datasets, create basic Shiny apps, and use for loops and their equivalent, the map functions in the purrr package. For the end of semester assignment I had to create a Shiny app in which the user selects from multiple datasets and the app then produces a report on the chosen dataset using R Markdown. ‘Data Analytics on the Web’ consisted of two assignments, worked on with the support of our lecturer, using Google’s BigQuery and Google Colab. ‘Fundamental Statistics’ covered topics such as linear regression, sampling, normal distribution criteria, simulation and testing, and parametric versus non-parametric testing.
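To give a flavour of two of the topics mentioned above, here is a minimal sketch of changing data from ‘wide’ to ‘tall’ format and of swapping a for loop for a purrr map function. The exam scores, the column names and the use of tidyr’s `pivot_longer()` are my own assumptions for illustration and are not taken from the course materials.

```r
library(tidyr)  # assumed here for pivot_longer()
library(purrr)  # provides the map family of functions

# Hypothetical exam scores in 'wide' format: one column per subject
scores_wide <- data.frame(
  student = c("A", "B"),
  maths   = c(70, 55),
  english = c(62, 81)
)

# 'Tall' format: one row per student-subject pair
scores_tall <- pivot_longer(
  scores_wide,
  cols      = c(maths, english),
  names_to  = "subject",
  values_to = "score"
)

# Mean score per subject using an explicit for loop
subject_means_loop <- numeric(0)
for (subj in c("maths", "english")) {
  subject_means_loop[subj] <- mean(scores_wide[[subj]])
}

# The same calculation using purrr's map_dbl(), which applies
# mean() to each selected column and returns a named numeric vector
subject_means_map <- map_dbl(scores_wide[c("maths", "english")], mean)
```

Both approaches should give the same named vector of means; the map version simply says what is being done to each column without the bookkeeping of the loop.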


Learning with R

Although my earlier study of Matlab and Python had already given me many of the tools of programming, R made working with statistical data significantly easier. One thing that surprised me was the ease of manipulating datasets with sometimes thousands of rows and often dozens of columns. It was also surprising that R and RStudio are free, given how polished and smooth they are. Matlab stores data in matrices, which can also hold statistical data, and Python is similar, but I found R easier and more intuitive for this kind of work. That said, I believe most of what we learned with R this semester could also have been done with Matlab or Python. An academic report comparing Matlab, R and Python notes that R is free and rapidly growing: “While Python maintains a strong hold on the market (at roughly 15,000 job postings in 2017), it should be noted that R is rapidly decreasing the margin of preference within the data analytics field (with 9,000 in the same time frame).” [Ozgur, Colliau, Rogers, Hughes, Myer-Tyson 2017]

One moment that felt particularly like a breakthrough was when I successfully got my first Shiny app to display the dataset I wanted from a selection of data. I managed to get my Shiny app to display my academic exam scores in Maths, English, History, French and Science. It was great to see my data displayed at the click of a button and to be able to analyse the dataset so quickly and easily. Ultimately this data was passed on to a report that the user could download. I couldn’t help but go out for a walk of a mile or so with all my triumphant energy.


Mechanical Engineering vs R differences

A noticeable difference between R and mechanical engineering is that you can much more readily gauge how realistic your answer is from your script output than from the output of an engineering calculation, such as analysing the stress at a point in a structure. As an engineering student in a classroom you are removed from the actual applications until you spend time in industry, so you can feel detached and unsure of your answer. For example, you might calculate the stress at the centre point of a bridge. Whether the answer is 100 megapascals or 1 megapascal, you don’t have much idea whether you are right. You can imagine and draw such a bridge, but in practice you cannot build one just for a theoretical problem. In R the application is in front of you; you are not removed or detached. When I did the ‘Intro to R’ skills test I could view the dataset in the ‘environment’ tab and check my operations. If I wanted to order the skills test dataset by height from lowest to highest, I could go into the environment and check the output. If the code threw an error when running, it would give a reason. Progress was very visible in R and the reward of correctness was immediate.
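As a small illustration of that immediate feedback, the sketch below orders a dataset by height and then checks the result. The data frame, its names and its values are hypothetical stand-ins, since the actual skills test dataset is not reproduced here.

```r
# Hypothetical stand-in for the skills test dataset
people <- data.frame(
  name   = c("Ann", "Bob", "Cara"),
  height = c(172, 160, 181)
)

# Order the rows by height, from lowest to highest
people_sorted <- people[order(people$height), ]

# Inspect the result, as you would in the RStudio environment tab
print(people_sorted)

# A programmatic check: stops with an error if heights are not ascending
stopifnot(!is.unsorted(people_sorted$height))
```

Printing the sorted data frame or opening it from the environment tab shows straight away whether the ordering did what was intended.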

Another point is that mechanical engineering software packages like ANSYS often had scarce support online in terms of discussion forums, YouTube tutorial videos and even people who knew the software to ask. The user interface of programs like ANSYS and Pro/ENGINEER was often clumsy, glitchy and generally not a smooth experience. Programming languages such as R generally have much more online support, with sites such as Stack Overflow providing a huge number of answers to various R obstacles. YouTube has various tutorials that you can follow step by step. GitHub hosts a huge amount of people’s work that is available to the public, and allows you to publish your own work on your profile and readily show it off to potential employers.


Online Aspect of Learning R

One major drawback of the R learning experience (and the wider MSc) was the lack of classmate interaction. In my previous degree at Strathclyde and Nanyang Technological University I made numerous acquaintanceships and friendships (in a class of about 120 students at Strathclyde) that enriched my learning experience, my life at the time and my life afterwards. When another student encountered a problem I would help them arrive at the solution, and when I ran into a problem there was always someone who had found the answer somewhere. Living in Halls of Residence meant we were often a few doors away from each other when studying. In an online environment where few people turn on their webcam or microphone or ask questions in class, it can be demoralizing when you hit a theoretical obstacle, and it can be harder to ask for help. When learning in person you got to know people through small talk while waiting for a lecture to start. There were team building activities such as football and pub crawls where students got to know their classmates. The whole experience was more social, and human beings are social animals. Just the sight of a hall full of other students struggling with the same engineering theory was motivating. “Learning from peers is quite difficult in online classroom settings. In the traditional classrooms, students can directly discuss with fellow classmates to obtain insights, ideas, and suggestions, but not in online settings.” [Tai-ming Wut & Jing Xu, 2021]. “The results showed that students preferred to complete activities face-to-face rather than online, but there was no significant difference in their test performance in the two modalities” [Grieve & Kemp 2014].

Conversely, one major positive of the program was that it allowed me to study with a chronic mental health problem. The MSc program was extremely convenient. With my social network largely based here in Inverness, it would have been difficult and impractical to relocate. Economists expect workers to move like droplets of water to where they are needed, but human beings in reality like community and stability. Maslow’s hierarchy of human needs lists ‘love and belonging’ in the middle of the pyramid [Maslow 1943]. Layard talks about ‘community and friends’ as one of his big seven factors of happiness [Layard 2011].


Future Work/Study


Software and data science are growing fields, both globally and in Britain and Europe. “The demand for qualified software engineers far exceeds the supply, creating a gap that cannot be filled solely by the domestic workforce. According to the German Economic Institute (IW), the country could face a shortage of up to 100,000 IT professionals by 2025” [The Munich Eye July 2024]. Moreover, generative AI has blown up in the last few years, with many people now using ChatGPT. “According to industry estimates, ChatGPT reached 100 million monthly users in the first two months after launch, which makes it the fastest-growing technology application in history” [Huh, Nelson & Russell 2023]. On some measures ChatGPT is approaching human-level performance. In his popular book The Coming Wave, Mustafa Suleyman writes that “it can ace standardized tests from the Bar to the Graduate Records Examination.” He also notes that “a recent publication from my old colleague at Google showed that an adapted version of their PaLM system was able to achieve remarkable performance on questions from the U.S. Medical Licensing Examination.” Learning data science will make me a part of that, which is why R is so relevant today. It gives me the skills to work within this industry.


Conclusion


In conclusion, learning R means I am able to visualise and analyse datasets. R lets me see results before my eyes, as opposed to mechanical engineering, where you need to go out into the field. Learning R has been difficult at times, with the online study being isolating. However, I believe it has been worthwhile, as it lets me be a part of this upcoming industry.


References

  1. [Ozgur, Colliau, Rogers, Hughes, Myer-Tyson 2017]
  2. [Tai-ming Wut & Jing Xu, 2021]
  3. [Grieve & Kemp 2014]
  4. [Maslow 1943]
  5. [Layard 2011]
  6. [The Munich Eye July 2024]
  7. [Huh, Nelson & Russell 2023]
  8. ‘The Coming Wave’ by Mustafa Suleyman
