
Pawa-IT-Solutions/data-engineer-test


The Problem Statement

Analyzing Student Performance

  • You are given a dataset containing information about students' performance in exams. Your task is to perform basic data analysis and answer some questions about the dataset.

The Dataset Description

The dataset contains the following columns:

  • gender: Gender of the student (male/female)
  • race/ethnicity: Race/ethnicity of the student (['luhya', 'kikuyu', 'maasai', 'luo', 'kamba', 'kisii', 'meru', 'taita'])
  • parental level of education: Parental level of education (['kcpe', 'kcse', 'polytechnic', 'university'])
  • lunch: Type of lunch received by the student (standard, free/reduced)
  • test preparation course: Whether the student completed a test preparation course (none, completed)
  • math score: Score obtained by the student in the math exam (out of 100)
  • reading score: Score obtained by the student in the reading exam (out of 100)
  • writing score: Score obtained by the student in the writing exam (out of 100)
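Before any analysis it can help to confirm that the loaded file actually matches the description above. The following is a minimal sketch, assuming the CSV headers are spelled exactly as listed; the names `EXPECTED_CATEGORIES`, `SCORE_COLUMNS` and `validate_schema` are hypothetical helpers, not part of the task.

```python
from __future__ import annotations

import pandas as pd

# Allowed values per categorical column, copied from the dataset description.
# Assumption: the CSV uses exactly these (lower-case) spellings.
EXPECTED_CATEGORIES = {
    "gender": {"male", "female"},
    "race/ethnicity": {"luhya", "kikuyu", "maasai", "luo", "kamba", "kisii", "meru", "taita"},
    "parental level of education": {"kcpe", "kcse", "polytechnic", "university"},
    "lunch": {"standard", "free/reduced"},
    "test preparation course": {"none", "completed"},
}
SCORE_COLUMNS = ["math score", "reading score", "writing score"]


def validate_schema(df: pd.DataFrame) -> list[str]:
    """Return human-readable problems; an empty list means the frame matches the description."""
    problems = []
    expected = list(EXPECTED_CATEGORIES) + SCORE_COLUMNS
    missing = [c for c in expected if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    # Flag category values that the description does not mention.
    for col, allowed in EXPECTED_CATEGORIES.items():
        if col in df.columns:
            unexpected = set(df[col].dropna().unique()) - allowed
            if unexpected:
                problems.append(f"{col}: unexpected values {sorted(map(str, unexpected))}")
    # Scores are described as "out of 100", so anything outside 0-100 is suspicious.
    for col in SCORE_COLUMNS:
        if col in df.columns:
            out_of_range = df[(df[col] < 0) | (df[col] > 100)]
            if not out_of_range.empty:
                problems.append(f"{col}: {len(out_of_range)} value(s) outside 0-100")
    return problems
```

Calling `validate_schema(pd.read_csv("student_performance.csv"))` right after loading surfaces typos in category labels or corrupted scores before they skew any averages.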

The Tasks

  • Load the dataset (student_performance.csv) into a pandas DataFrame.
  • Display basic statistics about the dataset, such as mean, median, and standard deviation of exam scores.
  • Visualize the distribution of exam scores using histograms.
  • Calculate the overall average score (average of math, reading, and writing scores) for each student.
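The four tasks above map onto a handful of pandas calls. This is one possible sketch, not a reference solution; the function names and the new `overall average` column name are hypothetical choices, and matplotlib is assumed to be installed only for the plotting helper.

```python
import pandas as pd

SCORE_COLUMNS = ["math score", "reading score", "writing score"]


def add_overall_average(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with an 'overall average' column (mean of the three scores per student)."""
    out = df.copy()
    # axis=1 averages across the three columns, giving one value per row (student).
    out["overall average"] = out[SCORE_COLUMNS].mean(axis=1)
    return out


def load_scores(path: str = "student_performance.csv") -> pd.DataFrame:
    """Load the CSV into a DataFrame and add the per-student overall average."""
    return add_overall_average(pd.read_csv(path))


def score_statistics(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, median and (sample) standard deviation for each exam column."""
    return df[SCORE_COLUMNS].agg(["mean", "median", "std"])


def plot_score_histograms(df: pd.DataFrame, bins: int = 20) -> None:
    """One histogram per exam, side by side."""
    # Imported here so the other helpers work even without matplotlib installed.
    import matplotlib.pyplot as plt

    df[SCORE_COLUMNS].hist(bins=bins, figsize=(12, 4), layout=(1, 3))
    plt.tight_layout()
    plt.show()
```

In a notebook, `df = load_scores()` followed by `score_statistics(df)` and `plot_score_histograms(df)` covers the tasks in order; `df.describe()` is a quicker alternative that reports the mean and standard deviation, with the median appearing as the `50%` row.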

Answer the following questions about the data:

  1. What is the average math score for male students?
  2. What percentage of students completed the test preparation course?
  3. Is there any correlation between parental level of education and student performance? Visualize if necessary.
  4. Are students from group C more likely to receive standard lunch compared to other groups?

  • Optionally, explore additional insights or visualizations based on your analysis.
  • Answering further questions beyond those listed here will improve your odds of acing this test.
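The four questions can each be answered with a short groupby or boolean mask. The sketch below is illustrative only and rests on stated assumptions: the helper names are hypothetical; the education order `kcpe < kcse < polytechnic < university` is assumed (the task does not define one); and because the listed race/ethnicity values are ethnic groups rather than lettered groups, "group C" has to be mapped to one of those values by whoever runs the analysis.

```python
import pandas as pd

SCORE_COLUMNS = ["math score", "reading score", "writing score"]
# Assumed ordering, lowest to highest; not specified in the task statement.
EDUCATION_ORDER = ["kcpe", "kcse", "polytechnic", "university"]


def avg_math_for_gender(df: pd.DataFrame, gender: str = "male") -> float:
    """Q1: mean math score for one gender."""
    return df.loc[df["gender"] == gender, "math score"].mean()


def pct_completed_prep(df: pd.DataFrame) -> float:
    """Q2: percentage of students whose test preparation course is 'completed'."""
    # The mean of a boolean Series is the fraction of True values.
    return (df["test preparation course"] == "completed").mean() * 100


def education_vs_performance(df: pd.DataFrame):
    """Q3: mean overall score per education level and Spearman's rho (level vs. score)."""
    overall = df[SCORE_COLUMNS].mean(axis=1)
    levels = pd.Categorical(df["parental level of education"],
                            categories=EDUCATION_ORDER, ordered=True)
    by_level = overall.groupby(levels, observed=True).mean()
    # Category codes 0..3 encode the assumed order; -1 (unknown level) becomes NaN.
    codes = pd.Series(levels.codes, index=df.index).where(lambda s: s >= 0)
    pairs = pd.DataFrame({"level": codes, "overall": overall}).dropna()
    ranks = pairs.rank()  # average ranks, so ties are handled
    rho = ranks["level"].corr(ranks["overall"])  # Pearson on ranks = Spearman's rho
    return by_level, rho


def standard_lunch_rate_by_group(df: pd.DataFrame, group: str):
    """Q4: share of 'standard' lunch inside `group` vs. all other students."""
    is_standard = df["lunch"] == "standard"
    in_group = df["race/ethnicity"] == group
    return is_standard[in_group].mean(), is_standard[~in_group].mean()
```

`by_level.plot(kind="bar")` gives a quick visual for Q3. For Q4, a higher rate inside the group only suggests a difference; a significance test (for example a chi-square test on the group-by-lunch contingency table) would be needed before calling it "more likely".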

Submission:

  • Submit your code (Python script or Jupyter Notebook) along with any supporting files you created (excluding the dataset CSV file), and include a brief summary of your findings.
  • This assessment evaluates candidates' ability to perform basic data analysis tasks, manipulate data using pandas, visualize data, and draw insights from the dataset.

Test Requirements:

  • Python 3 installed
  • Jupyter Notebook or Google Colab
  • If you have trouble setting up your own environment, a test environment will be provided for you

Good Luck!
