JOHOR BAHRU, 14th Feb – An online structured course, Python for Data Analysis, was organized to share how to use the Python programming language for analysing data. The course ran from 9.00 am to 12.00 pm on the Cisco WebEx online conferencing platform. It was organized by the Postgraduate Student Society School of Computing (PGSS-SC), with Asraful Syifaa’ Ahmad as the moderator. The session was held successfully with the help of PGSS-SC committee members Muhammad Zafran Muhammad Zaly Shah, Muhamad Farhin Harun and Muhammad Anwar Ahmad.

This course received 683 online registrations, and 324 of those registered attended the event. Of the participants, 61.7% were at Ph.D. level, 29.6% at Master’s level and 8.6% at Bachelor’s level. UTM staff also took part. Additionally, 75.6% of the participants were from UTM, and the rest were non-UTM participants and alumni.

The honorable speaker for the program was Dr. Chan Weng Howe. Dr. Chan joined the Faculty of Computing, Universiti Teknologi Malaysia as a senior lecturer in early 2017. He is a UTM alumnus, having received his degrees there, including a Bachelor of Computer Science (Bioinformatics). He is also an active member of the Artificial Intelligence and Bioinformatics Group (AIBIG). His main research interests are computational intelligence, bioinformatics and IoT for healthcare. He is a member of the International Society for Computational Biology and Bioinformatics (ISCB), the Institute of Electrical and Electronics Engineers (IEEE) and the Asia Pacific Bioinformatics Network (APBioNet). He is an active researcher with several grants, publications and copyrights.

The course started at 9.00 am with the moderator briefly introducing the speaker’s background and achievements. The floor was then given to Dr. Chan, who started with a brief introduction to performing an analytical process with Python. He showed slides on the ecosystem and the libraries that have been developed for this purpose, as well as the analytic steps followed when using the Python environment. Next, he showed a slide on how to get started with the language and mentioned that it can be set up in a local or a cloud environment. For the course, a cloud-based environment was used via Google Colab. Dr. Chan then described the outline of the course, which included data acquisition, data pre-processing, exploratory data analysis and visualization. Before moving on to the main part of the course, he asked participants to download the hands-on resources so that they could follow along.

The course began with the first section, data acquisition. Dr. Chan started with a brief tutorial on setting up and using Google Colab. He then showed the code for importing and reading .CSV dataset files from a public repository using the pandas library. Next, he went through the code sections, which included viewing the data, adding a header, getting basic information about the dataset and exporting the dataset to a new file. He then answered questions from the participants before moving on to the next section.

The second section covered data pre-processing, the process of cleaning the data and transforming it into a format that is ready for analysis. It began with identifying missing values and replacing them with the Not a Number (NaN) notation for computational speed and convenience. Next, he showed how to deal with missing data by dropping or replacing it accordingly. The format of the data then needs to be checked to ensure that each column has the correct type (for example, float or integer). Afterwards, he showed how to standardize and normalize the data. Before continuing, he answered some participant questions, and the course took a break of around 7 minutes until 10.40 am. After the break, Dr. Chan continued with the rest of the data pre-processing section. The next step was binning, which groups the values of a column into ranges to simplify analysis. He then moved on to indicator variables for labelling categories. Finally, he showed again how to export the data and how to manage the files generated in Google Colab.
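The acquisition and pre-processing steps described above can be sketched in pandas as follows. This is a minimal illustration and not the course's actual notebook: the inline sample data, the column names and the use of "?" as the missing-value marker are all assumptions made for the example, and the data is read from a string so that the sketch runs without downloading anything.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample standing in for a CSV file from a public repository.
# It has no header row and marks missing values with "?".
RAW_CSV = """\
gas,111,21,13495
gas,154,19,16500
diesel,?,26,13950
gas,102,24,?
diesel,115,18,17450
gas,110,19,15250
"""

# Data acquisition: read the CSV, then add column headers (names are assumed).
df = pd.read_csv(io.StringIO(RAW_CSV), header=None)
df.columns = ["fuel_type", "horsepower", "city_mpg", "price"]

# Identify missing values: replace the "?" marker with NaN.
df = df.replace("?", np.nan)

# Deal with missing data: drop rows without a price ...
df = df.dropna(subset=["price"]).reset_index(drop=True)

# ... and correct the data format so numeric columns are floats.
df["horsepower"] = pd.to_numeric(df["horsepower"]).astype(float)
df["price"] = pd.to_numeric(df["price"]).astype(float)

# Replace the remaining missing horsepower values with the column mean.
df["horsepower"] = df["horsepower"].fillna(df["horsepower"].mean())

# Normalize: min-max scaling of city_mpg into the range [0, 1].
mpg = df["city_mpg"]
df["city_mpg_norm"] = (mpg - mpg.min()) / (mpg.max() - mpg.min())

# Binning: group horsepower values into three equal-width ranges.
df["horsepower_bin"] = pd.cut(
    df["horsepower"], bins=3, labels=["low", "medium", "high"]
)

# Indicator variables: one 0/1 (or True/False) column per fuel type.
dummies = pd.get_dummies(df["fuel_type"], prefix="fuel")
df = pd.concat([df, dummies], axis=1)

# Export: with no path, to_csv returns the CSV text;
# pass a file name instead to write a new file.
csv_text = df.to_csv(index=False)
print(df.head())
```

Calling `df.info()` and `df.describe()` after loading gives the basic information about the dataset mentioned in the first section.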

Afterwards, Dr. Chan moved on to the next section of the course, exploratory data analysis, which uses the Seaborn and Matplotlib libraries. He mentioned that this section did not cover the full features of the libraries, as different areas of expertise use different functions and methods. He then went through the basics of the analytical process, including continuous numerical variables, categorical variables, descriptive statistical analysis and the basics of grouping. Next, he moved on to the visualization section and showed some examples of the libraries’ visualization capabilities, including a heat map and a box plot. After that, there was an extra section in which he touched on correlation, causation and analysis of variance (ANOVA). Finally, he showed the important variables identified through the analytical process, which he suggested can be used for machine learning (ML) modelling. He then went back to the slides and showed a summary of the Python ML deployment steps.
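The descriptive statistics, grouping, correlation and ANOVA steps mentioned above can be sketched as follows. This is an illustrative example only: the small dataset and its column names are hypothetical, and SciPy is used here for the statistical tests as an assumption, since the report does not say which library the course used for them.

```python
import pandas as pd
from scipy import stats

# Hypothetical cleaned dataset: one categorical and two numerical variables.
df = pd.DataFrame({
    "drive_wheels": ["fwd", "fwd", "fwd", "rwd", "rwd", "rwd",
                     "4wd", "4wd", "4wd"],
    "engine_size": [90, 98, 109, 130, 152, 164, 97, 108, 110],
    "price": [6500, 7300, 8900, 16500, 18900, 21500, 7600, 9200, 11700],
})

# Descriptive statistical analysis of the numerical columns.
summary = df.describe()

# Basics of grouping: average price for each category.
group_means = df.groupby("drive_wheels")["price"].mean()

# Correlation between two continuous numerical variables (Pearson).
corr = df[["engine_size", "price"]].corr()
r, p_corr = stats.pearsonr(df["engine_size"], df["price"])

# One-way ANOVA: do the mean prices differ between the categories?
groups = [g["price"].to_numpy() for _, g in df.groupby("drive_wheels")]
f_stat, p_anova = stats.f_oneway(*groups)

print(summary)
print(group_means)
print(f"Pearson r = {r:.3f} (p = {p_corr:.4f})")
print(f"ANOVA F = {f_stat:.2f} (p = {p_anova:.4f})")
```

For visualization, the `corr` matrix could be passed to `seaborn.heatmap`, and `seaborn.boxplot(x="drive_wheels", y="price", data=df)` would draw the price distribution per category. A strong correlation or a small ANOVA p-value indicates association only, not causation.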

The speaker’s session ended at around 11.30 am. Afterwards, a Q&A session was held until the end of the structured course at 12.00 pm. There were many excellent questions, and the speaker managed to answer all of them. To wrap up the course, a photography session was held in which the moderator took screenshots of all the participants. Overall, the structured course was held without any major issues. There were some slight interruptions from participants’ unmuted microphones, but these were swiftly dealt with by the host. To conclude, the participants appreciated all the efforts of SPS UTM and PGSS-SC as the organizers. Additionally, 311 of the 324 participants gave this structured course an overall rating of 4 or 5 stars. According to the feedback, the participants were very satisfied with Dr. Chan’s explanation and his interaction with them. Some would also like a follow-up course for more in-depth learning of the topic. This outcome will motivate PGSS-SC to organize more workshops like this.