Friday, February 24, 2023

The importance of data cleaning in machine learning: Best practices for preparing data for model training

Data is the lifeblood of machine learning algorithms. Without high-quality data, machine learning models are unlikely to be accurate, effective, or useful. However, the data that we work with in the real world is often far from perfect. Data can be messy, incomplete, and inconsistent, with errors, outliers, and missing values that can cause problems for machine learning algorithms.

That's where data cleaning comes in. Data cleaning, also known as data preprocessing or data wrangling, is the process of identifying and correcting errors or inconsistencies in the data before using it to train a model. Proper data cleaning is critical for ensuring the accuracy and effectiveness of the resulting model.

Why is data cleaning important?

There are several reasons why data cleaning is important for machine learning:

1- Improved accuracy:

Data cleaning helps to remove errors and inconsistencies in the data that can lead to inaccurate predictions and decisions. By ensuring that the data is accurate and consistent, the resulting model will be more reliable and effective.

For example, let's say you are building a machine learning model to predict customer churn for a telecommunications company. If the data contains errors or inconsistencies, such as incorrect or missing values for key features like customer tenure, monthly charges, or service type, the resulting model is likely to be inaccurate and unreliable. By cleaning the data and ensuring that all values are accurate and consistent, you can improve the accuracy and effectiveness of the model.

2- Better insights:

Data cleaning can help to identify patterns and trends in the data that might not be immediately apparent. By cleaning the data and exploring it in detail, you can gain a deeper understanding of the underlying relationships and make more informed decisions.

For example, let's say you are analyzing a dataset of customer reviews for a hotel chain. By cleaning the data and identifying common themes and sentiments in the reviews, you can gain insights into what customers like and dislike about the hotel chain, which can inform decisions about marketing, service, and design.

3- Reduced bias:

Data cleaning can help to reduce bias in the data that can lead to unfair or discriminatory outcomes. By removing irrelevant or redundant features and balancing the data, you can ensure that the resulting model is fair and unbiased.

For example, let's say you are building a machine learning model to predict loan approval for a bank. If the data contains biased features, such as race or gender, the resulting model is likely to be biased as well. By removing these features and ensuring that the data is balanced and representative, you can reduce the risk of bias and ensure that the model is fair and unbiased.

Best practices for data cleaning in machine learning:

Now that we've established why data cleaning is important, let's take a look at some best practices for preparing data for model training.

1- Remove duplicates:

Duplicate data can skew the results of a model, so it's important to remove any duplicate entries before training the model. For example, if you are analyzing customer purchase data, you might find that some customers have multiple entries in the dataset due to errors or inconsistencies. By removing these duplicates, you can ensure that the resulting model is based on accurate and representative data.
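
As a minimal sketch, assuming hypothetically that the purchase data lives in a pandas DataFrame (the column names here are invented for illustration), duplicates can be removed in a line or two:

import pandas as pd

# Hypothetical purchase data; the column names are illustrative only
df = pd.DataFrame({
    "customer_id": [101, 102, 102, 103],
    "amount": [25.0, 40.0, 40.0, 15.5],
})

# Drop rows that are exact duplicates across all columns
df = df.drop_duplicates()

# Or treat repeated customer IDs as duplicates and keep only the first entry
df = df.drop_duplicates(subset="customer_id", keep="first")

Which variant is right depends on what counts as "the same record" in your data.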

2- Handle missing values:

Missing values can cause errors in the model and reduce its effectiveness. You can handle missing values by either removing the affected rows or columns, or by imputing the missing values with appropriate estimates. For example, if you are analyzing customer survey data and some customers have not answered certain questions, you might choose to impute the missing values with the average or median value for that question.
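
As a minimal pandas sketch of both strategies (the survey columns q1 and q2 are made up for illustration):

import pandas as pd

# Hypothetical survey answers; NaN marks an unanswered question
df = pd.DataFrame({"q1": [4.0, None, 5.0, 3.0], "q2": [2.0, 3.0, None, None]})

dropped = df.dropna()             # option 1: remove rows with any missing value
imputed = df.fillna(df.median())  # option 2: fill each question with its median answer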

3- Remove irrelevant or redundant features:

Features that are not relevant to the problem or that are highly correlated with other features can lead to overfitting or reduce the accuracy of the model. It's important to remove these features before training the model. For example, if you are analyzing customer purchase data and some features, such as the customer's name or address, are not relevant to the analysis, you might choose to remove those features.
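
A short pandas sketch of both checks (the column names and the 0.9 correlation threshold are assumptions for illustration, not fixed rules):

import pandas as pd

# Hypothetical purchase data; "name" is assumed irrelevant to the prediction task
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid"],
    "price": [10.0, 20.0, 30.0],
    "price_with_tax": [11.0, 22.0, 33.0],  # redundant: perfectly correlated with price
    "quantity": [1, 5, 2],
})

df = df.drop(columns=["name"])  # remove the irrelevant feature

# Flag highly correlated numeric pairs; the 0.9 threshold is a judgment call
corr = df.corr(numeric_only=True).abs()
pairs = [(a, b) for a in corr.columns for b in corr.columns
         if a < b and corr.loc[a, b] > 0.9]
print(pairs)  # [('price', 'price_with_tax')] -> drop one of the two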

4- Handle outliers:

Outliers are data points that are significantly different from other data points in the dataset. Outliers can skew the results of the model and reduce its effectiveness. There are several ways to handle outliers, including removing them, transforming them, or treating them as separate classes. For example, if you are analyzing sales data and there are some extreme values for a particular product, you might choose to transform those values to make them more representative of the overall distribution.
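
One simple, widely used rule is the interquartile-range (IQR) criterion; here is a sketch on invented sales figures showing both the removal and the transformation options:

import pandas as pd

# Hypothetical sales figures with one extreme value
sales = pd.Series([120, 135, 110, 128, 9500, 131])

# IQR rule: flag points far outside the middle 50% of the data
q1, q3 = sales.quantile(0.25), sales.quantile(0.75)
iqr = q3 - q1
within = sales.between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

trimmed = sales[within]                     # option 1: remove the outlier
capped = sales.clip(upper=q3 + 1.5 * iqr)   # option 2: cap (winsorize) it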

5- Normalize or scale the data:

Data normalization or scaling is the process of transforming the data so that it has a standard scale or distribution. This can improve the performance of the model, especially for algorithms that are sensitive to the scale of the features. For example, if you are analyzing customer purchase data and some features have very different scales, such as price and quantity, you might choose to scale those features to make them more comparable.
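
A brief scikit-learn sketch of the two most common options (the feature values are invented; note that in practice the scaler is fitted on the training split only and then reused on the test split):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical features on very different scales: price in dollars, quantity in units
X = np.array([[1200.0, 1], [85.0, 4], [640.0, 2]])

X_std = StandardScaler().fit_transform(X)  # zero mean, unit variance per column
X_01 = MinMaxScaler().fit_transform(X)     # rescaled to the [0, 1] range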

6- Balance the data:

Imbalanced data, where one class is significantly more represented than the other, can lead to biased models that are less effective. It's important to balance the data by either oversampling the minority class, downsampling the majority class, or using synthetic data generation techniques. For example, if you are analyzing medical data to predict disease outcomes and the number of positive cases is much lower than the number of negative cases, you might choose to oversample the positive cases to balance the data.
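
Here is a minimal sketch of oversampling with replacement using scikit-learn's resample utility (the class sizes are invented; a synthetic-data method such as SMOTE would replace the resample step):

import pandas as pd
from sklearn.utils import resample

# Hypothetical medical dataset: far fewer positive cases than negative ones
df = pd.DataFrame({"label": [0] * 95 + [1] * 5, "feature": range(100)})
pos, neg = df[df.label == 1], df[df.label == 0]

# Duplicate minority-class rows at random until the classes match
pos_up = resample(pos, replace=True, n_samples=len(neg), random_state=42)
balanced = pd.concat([neg, pos_up])
print(balanced.label.value_counts())  # 95 rows of each class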

Conclusion:

Data cleaning is a critical step in preparing data for machine learning model training. By identifying and correcting errors, inconsistencies, and biases in the data, data cleaning can improve the accuracy, effectiveness, and fairness of the resulting model. Some best practices for data cleaning include removing duplicates, handling missing values, removing irrelevant or redundant features, handling outliers, normalizing or scaling the data, and balancing the data. By following these best practices, you can ensure that your machine learning models are based on accurate and representative data, and are more likely to produce reliable and useful results.

Basics of Machine Learning

by Saeeda Yasmeen


Machine learning has become one of the most exciting and promising fields in modern technology. From self-driving cars to voice assistants, machine learning is behind many of the advances we see today. But what exactly is machine learning, and how does it work? In this blog post, we'll answer those questions and more.

What is machine learning?

Machine learning is a type of artificial intelligence (AI) that enables systems to learn and improve from experience without being explicitly programmed. In other words, machine learning algorithms can learn from data and improve their performance over time, without needing to be explicitly told how to do so. This is different from traditional programming, where developers write explicit rules for the computer to follow.

Types of Machine Learning:

There are three main types of machine learning: supervised learning, unsupervised learning, and reinforcement learning.

  • Supervised learning:

    Supervised learning involves training a model to make predictions based on labeled data. The model is presented with examples of inputs and their corresponding outputs, and it learns to make predictions based on those examples. For example, a supervised learning algorithm could be trained to recognize handwritten digits by being shown many examples of handwritten digits and their corresponding labels (e.g. “this is the digit ‘3’”). Once trained, the algorithm can then make predictions on new, unlabeled data (e.g. a new handwritten digit).

  • Unsupervised learning:

    Unsupervised learning involves training a model to find patterns in unlabeled data. The model is presented with data without any labels or categories, and it learns to identify similarities and differences between different data points. For example, an unsupervised learning algorithm could be trained to identify groups of customers who share similar purchasing habits, without being given any information about what those groups are.

  • Reinforcement learning:

    Reinforcement learning involves training a model to make decisions based on feedback from its environment. The model learns by interacting with its environment and receiving rewards or punishments based on its actions. For example, a reinforcement learning algorithm could be trained to play a game by receiving points for making good moves and losing points for making bad moves.
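
To make the supervised/unsupervised distinction concrete, here is a small scikit-learn sketch on its built-in handwritten-digit data (the model choices are illustrative only; reinforcement learning needs an interactive environment, so it is omitted here):

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_digits(return_X_y=True)  # 8x8 digit images and their labels

# Supervised: learn from labeled examples, then predict labels for new digits
clf = LogisticRegression(max_iter=5000).fit(X, y)
print(clf.predict(X[:3]), "vs actual", y[:3])

# Unsupervised: find 10 groups in the same images without ever seeing the labels
clusters = KMeans(n_clusters=10, n_init=10).fit_predict(X)
print(clusters[:3])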

How does machine learning work?

Machine learning algorithms work by finding patterns in data and using those patterns to make predictions or decisions. To do this, the algorithm goes through several steps:

  1. Data collection:

    The first step in any machine learning project is to collect data. This can be done manually, by gathering data from various sources and organizing it into a format that can be used by the algorithm, or it can be done automatically, by using sensors or other devices to collect data in real time.

  2. Data preprocessing:

    Once the data has been collected, it needs to be preprocessed. This involves cleaning the data (removing any errors or inconsistencies), transforming the data (converting it into a format that can be used by the algorithm), and splitting the data into training and testing sets.

  3. Model training:

    After the data has been preprocessed, it's time to train the model. This involves feeding the algorithm the training data and allowing it to find patterns in the data. The algorithm adjusts its parameters to minimize the difference between its predictions and the actual labels (in the case of supervised learning) or to identify patterns in the data (in the case of unsupervised learning).

  4. Model evaluation:

    Once the model has been trained, it needs to be evaluated to see how well it performs on new, unseen data. This involves testing the model on the testing data and comparing its predictions to the actual labels. The performance of the model is measured using various metrics, depending on the task.

  5. Model deployment:

    Finally, once the model has been evaluated and deemed to perform well, it can be deployed in the real world. This involves integrating the model into a larger system or application that can make use of its predictions or decisions.
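
As a compact, hypothetical illustration of steps 2 through 4 (the dataset and model here are stand-ins, not recommendations):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Step 2: preprocess -- here, simply split into training and testing sets
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Step 3: train the model on the training data
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Step 4: evaluate on data the model has never seen
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))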



It's worth noting that machine learning is not a one-time process. Models can become outdated over time as new data becomes available, so it's important to continuously train and update models to keep them accurate and effective.

Why is Machine Learning Important?

Machine learning is important for a variety of reasons, ranging from its ability to automate repetitive tasks to its potential to revolutionize entire industries. Here are some of the key reasons why machine learning is so important:

  1. Automation:

    Machine learning algorithms can automate repetitive and time-consuming tasks, freeing up humans to focus on more complex and creative work. This can lead to increased productivity, improved efficiency, and lower costs.

  2. Personalization:

    Machine learning algorithms can personalize experiences for individual users, tailoring recommendations and content to their specific preferences and needs. This can lead to increased customer satisfaction and loyalty.

  3. Predictive analytics:

    Machine learning algorithms can analyze vast amounts of data and identify patterns that humans might miss. This can lead to more accurate predictions and better decision-making in fields ranging from healthcare to finance to transportation.

  4. Improved safety and security:

    Machine learning algorithms can be used to detect and prevent fraud, identify potential safety hazards, and improve security in a variety of settings. This can lead to a safer and more secure world for everyone.

  5. Innovation:

    Machine learning has the potential to revolutionize entire industries, from self-driving cars to personalized medicine to renewable energy. By enabling systems to learn and improve from experience, machine learning can drive innovation and lead to new and exciting discoveries.

Overall, machine learning is important because it has the potential to improve our lives in countless ways. From automating repetitive tasks to revolutionizing entire industries, machine learning is a powerful tool that can help us achieve our goals and make the world a better place.

Conclusion:

In conclusion, machine learning is a powerful tool that allows systems to learn and improve from experience without being explicitly programmed. By finding patterns in data and using those patterns to make predictions or decisions, machine learning has the potential to revolutionize many fields, from healthcare to finance to transportation. Understanding the basics of machine learning is the first step to unlocking its full potential.

Monday, February 20, 2023

Understanding Word Embeddings: Mathematical Representations of Meaningful Words




Introduction

In the field of natural language processing, understanding the meaning and context of words is crucial for tasks such as sentiment analysis, language translation, and text generation. One powerful technique for representing words in a way that captures their meaning is through word embeddings.

What are Word Embeddings?

Word embeddings are mathematical representations of words in a high-dimensional space. These embeddings are learned from large amounts of text data and can be used to perform various NLP tasks with great accuracy. The most popular method for learning word embeddings is through the use of neural network models like Word2Vec and GloVe.

The Benefits of Word Embeddings

One of the key benefits of word embeddings is that they allow us to perform mathematical operations on words. For example, we can find the cosine similarity between two words, which tells us how similar the meanings of those words are. This can be incredibly useful for tasks like text classification, where we want to determine the topic of a given piece of text.
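
Concretely, cosine similarity is just a normalized dot product. The toy 4-dimensional vectors below are invented for illustration; real embeddings have hundreds of dimensions:

import numpy as np

def cosine_similarity(a, b):
    # 1.0 means the vectors point the same way (very similar meaning)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up embeddings for three words
cat = np.array([0.8, 0.1, 0.5, 0.2])
dog = np.array([0.7, 0.2, 0.6, 0.1])
car = np.array([0.1, 0.9, 0.0, 0.8])

print(cosine_similarity(cat, dog))  # high: related meanings
print(cosine_similarity(cat, car))  # low: unrelated meanings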

Beyond similarity search, there are several benefits of using word embeddings in natural language processing and machine learning applications:

  1. Improved accuracy: Word embeddings capture the meaning and context of words, which can improve the accuracy of language processing tasks such as sentiment analysis, named entity recognition, and machine translation.

  2. Reduced dimensionality: Traditional language processing techniques require large amounts of memory and processing power to represent and manipulate language data. Word embeddings reduce the dimensionality of language data by representing words as dense vectors, which can lead to more efficient and faster processing.

  3. Transfer learning: Word embeddings can be pre-trained on large datasets and then used as input to other language processing tasks. This allows for transfer learning, where models can learn from pre-existing knowledge and then apply that knowledge to new tasks.

  4. Semantic relationships: Word embeddings capture the semantic relationships between words, such as synonyms, antonyms, and analogies. This can be useful for tasks such as word sense disambiguation, where the meaning of a word must be determined based on context.

  5. Multilingual support: Word embeddings can be trained on multilingual data, allowing for language processing tasks across multiple languages. This can be useful for applications such as machine translation or sentiment analysis on social media data from multiple countries.

Example of Word Embedding Applications

Another important aspect of word embeddings is that they can be used to understand the relationships between words. For example, using embeddings, we can find the words that are most similar to a given word, or we can complete analogies between words. For instance, if we know that “king” is to “queen” as “man” is to “woman”, then the vector for “king” minus the vector for “man” plus the vector for “woman” will be close to the vector for “queen”.
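
For example, with the gensim library and one of its stock pre-trained GloVe models (downloaded on first use), this analogy can be checked directly; treat this as a sketch rather than the only way to do it:

import gensim.downloader as api

# Loads 50-dimensional GloVe vectors; the first call downloads them
model = api.load("glove-wiki-gigaword-50")

# king - man + woman: the nearest remaining vector should be "queen"
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=1))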

There are many applications of word embeddings in natural language processing and machine learning. Here are some examples:

  1. Sentiment Analysis: Word embeddings can be used to analyze the sentiment of text data, such as product reviews or social media posts. By representing words as vectors, machine learning models can identify words with positive or negative connotations and use that information to predict the overall sentiment of the text.
  2. Named Entity Recognition: Word embeddings can be used to identify named entities, such as people, places, and organizations, in text data. By training a machine learning model on annotated data, the model can learn to recognize patterns in the text and identify named entities more accurately.
  3. Machine Translation: Word embeddings can be used to improve the accuracy of machine translation systems. By representing words as vectors, the model can better capture the meaning and context of words in the source language and use that information to generate more accurate translations in the target language.
  4. Information Retrieval: Word embeddings can be used to improve the performance of search engines and information retrieval systems. By representing queries and documents as vectors, the model can more accurately match queries with relevant documents, improving the relevance of search results.
  5. Chatbots: Word embeddings can be used to improve the performance of chatbots by enabling them to understand the meaning and context of user input. By representing user input and the chatbot's responses as vectors, the model can learn to generate more accurate and relevant responses to user queries.

Conclusion

In conclusion, word embeddings are a powerful tool in the field of natural language processing, and they have been used to achieve state-of-the-art results in various NLP tasks. They allow us to understand the meaning and context of words in a mathematical way, and they can be used to perform various operations on words and understand the relationships between them. As the field of NLP continues to evolve, we can expect to see even more exciting applications of word embeddings in the future.

Friday, February 17, 2023

Dynamic drop down list with jQuery Ajax PHP MySQL

In this story, we are going to build dynamic drop-downs for Country, State, and City selection using jQuery, Ajax, PHP, and MySQL.

Figure: Dynamic Drop-down

STEP 01: Set up Database Tables

First of all, we create three tables in a database named `Dropdown`. The first table holds countries. The second table holds states, referencing the country table through a country code foreign key. The third table holds the cities of each state, with the state ID as a foreign key. The SQL queries for creating the tables are as follows:

-- ---------------------------------------------------------
-- Table structure for table `country`
CREATE TABLE `country` (
`country_id` int(11) NOT NULL,
`country_code` char(3) NOT NULL DEFAULT '',
`country` varchar(20) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- Dumping data for table `country`

INSERT INTO `country` (`country_id`, `country_code`, `country`) VALUES
(1, 'AUS', 'AUSTRALIA'), (2, 'CAN', 'CANADA'), (3, 'GBR', 'GREAT BRITAIN'), (4, 'IND', 'INDIA'), (5, 'PAK', 'PAKISTAN'), (6, 'USA', 'USA');

-- ---------------------------------------------------------
-- Table structure for table `state`
CREATE TABLE `state` (
`state_id` int(3) NOT NULL,
`state` char(20) NOT NULL DEFAULT '',
`country_code` char(3) NOT NULL DEFAULT ''
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- Dumping data for table `state`
INSERT INTO `state` (`state_id`, `state`, `country_code`) VALUES
(1, 'Maharashtra', 'IND'), (2, 'Delhi', 'IND'), (3, 'West Bengal', 'IND'), (4, 'Tamil Nadu', 'IND'), (5, 'Andhra Pradesh', 'IND'), (6, 'Gujarat', 'IND'), (7, 'Karnataka', 'IND'), (8, 'Uttar Pradesh', 'IND'), (9, 'Maharashtra', 'IND'), (11, 'Sindh', 'PAK'), (12, 'Punjab', 'PAK'), (13, 'KPK', 'PAK'), (14, 'Balochistan', 'PAK'), (15, 'New York', 'USA'), (16, 'California', 'USA'), (17, 'Illinois', 'USA'), (18, 'Texas', 'USA'), (19, 'Pennsylvania', 'USA'), (20, 'Arizona', 'USA'), (21, 'California', 'USA'), (24, 'Michigan', 'USA'), (25, 'Québec', 'CAN'), (26, 'Alberta', 'CAN'), (27, 'Ontario', 'CAN'), (29, 'Manitoba', 'CAN'), (33, 'British Columbia', 'CAN'), (35, 'England', 'GBR'), (37, 'Scotland', 'GBR'), (44, 'Wales', 'GBR'), (46, 'Victoria', 'AUS'), (48, 'West Australia', 'AUS'), (49, 'South Australia', 'AUS'), (50, 'Capital Region', 'AUS'), (51, 'Queensland', 'AUS');

-- ---------------------------------------------------------
-- Table structure for table `city`
CREATE TABLE `city` (
`city_id` int(4) NOT NULL,
`state_id` int(3) NOT NULL DEFAULT 0,
`city` char(35) NOT NULL DEFAULT ''
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- Dumping data for table `city`
INSERT INTO `city` (`city_id`, `state_id`, `city`) VALUES
(1, 1, 'Mumbai (Bombay)'), (2, 1, 'Nagpur'), (3, 2, 'Delhi'), (4, 3, 'Calcutta [Kolkata]'), (5, 4, 'Chennai (Madras)'), (6, 5, 'Hyderabad'), (7, 6, 'Ahmedabad'), (8, 7, 'Bangalore'), (9, 8, 'Kanpur'), (10, 8, 'Lucknow'), (11, 9, 'Mumbai (Bombay)'), (12, 9, 'Nagpur'), (15, 11, 'Karachi'), (16, 12, 'Lahore'), (17, 13, 'Peshawar'), (18, 14, 'Quetta'), (19, 15, 'New York'), (20, 16, 'Los Angeles'), (21, 16, 'San Diego'), (22, 17, 'Chicago'), (23, 18, 'Houston'), (24, 18, 'Dallas'), (25, 18, 'San Antonio'), (26, 19, 'Philadelphia'), (27, 20, 'Phoenix'), (28, 21, 'Los Angeles'), (29, 21, 'San Diego'), (36, 24, 'Detroit'), (37, 25, 'Montréal'), (38, 26, 'Calgary'), (39, 26, 'Edmonton'), (40, 27, 'Toronto'), (41, 27, 'North York'), (42, 27, 'Mississauga'), (43, 27, 'Scarborough'), (44, 27, 'Etobicoke'), (50, 29, 'Winnipeg'), (63, 33, 'Vancouver'), (69, 35, 'London'), (70, 35, 'Birmingham'), (71, 35, 'Liverpool'), (72, 35, 'Sheffield'), (73, 35, 'Manchester'), (74, 35, 'Leeds'), (75, 35, 'Bristol'), (83, 37, 'Glasgow'), (84, 37, 'Edinburgh'), (122, 44, 'Cardiff'), (127, 46, 'Melbourne'), (130, 48, 'Perth'), (131, 49, 'Adelaide'), (132, 50, 'Canberra'), (133, 51, 'Brisbane'), (134, 51, 'Gold Coast');

-- ---------------------------------------------------------
-- Indexes for table `city`
ALTER TABLE `city` ADD PRIMARY KEY (`city_id`);

-- Indexes for table `country`
ALTER TABLE `country` ADD PRIMARY KEY (`country_id`), ADD UNIQUE KEY `country_code` (`country_code`);

-- Indexes for table `state`
ALTER TABLE `state` ADD PRIMARY KEY (`state_id`);

-- AUTO_INCREMENT for table `city`
ALTER TABLE `city` MODIFY `city_id` int(4) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=147;

-- AUTO_INCREMENT for table `country`
ALTER TABLE `country` MODIFY `country_id` int(11) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=7;

-- AUTO_INCREMENT for table `state`
ALTER TABLE `state` MODIFY `state_id` int(3) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=55;
COMMIT;

STEP 02: Set up Database Connection

Create a 'config.php' file in your project directory and paste in the following code to create the database connection for your drop-down tables. Don't forget to edit the database name, host, username, and password to match your setup.

<?php
$host_name = "localhost";
$database = "dropdown"; // Change to your database name
$username = "root"; // Your database user
$password = ""; // Your database password
try {
    $dbo = new PDO('mysql:host='.$host_name.';dbname='.$database, $username, $password);
} catch (PDOException $e) {
    print "Error!: " . $e->getMessage() . "<br/>";
    die();
}
?>

STEP 03: Create View / Index File

Now, let's create the basic front-end of our drop-down list. Create an 'index.php' file in your project directory containing the following code.

<!DOCTYPE html>
<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body>

<select name="Country" id="Country">
<?php
include "config.php";
// Load all countries into the first drop-down on page load
$query = "SELECT * FROM country";
$stmt = $dbo->prepare($query);
$stmt->execute();
$rows = $stmt->fetchAll();
foreach ($rows as $country) {
    echo "<option value='".$country['country_id']."'>".$country['country']."</option>";
}
?>
</select>

<select name="State" id="state"></select>

<select name="City" id="city"></select>

<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
<script type="text/javascript" src="ajax.js"></script>
</body>
</html>

STEP 04: Create 'ajax.js' File (jQuery Functions)

Create an 'ajax.js' file in your directory and paste in the following code. The first function fetches the states whenever the country selection changes, while the second function fetches the cities whenever the state changes.

// # selects elements by their id attribute
$("#Country").change(function(){
    var country = $("#Country").find(":selected").val();
    $.ajax({
        url: "backend-script.php",
        type: "POST", // POST method
        data: {id: country}, // country_id for the back-end query
        success: function(html){
            $("#state").html(html); // fill the state drop-down
        }
    });
});

$("#state").change(function(){
    var state = $("#state").find(":selected").val();
    $.ajax({
        url: "backend-script.php",
        type: "POST",
        data: {id1: state}, // state_id for the back-end query
        success: function(html){
            $("#city").html(html); // fill the city drop-down
        }
    });
});

STEP 05: Create Back-end Script to Fetch States or Cities

Create a file 'backend-script.php' in your project directory and paste in the following code. It fetches the state or city rows that match the ID posted by the jQuery functions.

<?php
include("config.php");
if (isset($_POST["id"])) {
    // Fetch the states of the selected country (the state table references country_code)
    $query1 = "SELECT * FROM state, country WHERE state.country_code = country.country_code AND country.country_id = ?";
    $stmt1 = $dbo->prepare($query1);
    $stmt1->execute(array($_POST["id"])); // bound parameter avoids SQL injection
    $rows = $stmt1->fetchAll();
    foreach ($rows as $state) {
        echo "<option value='".$state['state_id']."'>".$state['state']."</option>";
    }
}
if (isset($_POST["id1"])) {
    // Fetch the cities of the selected state
    $query2 = "SELECT * FROM city WHERE city.state_id = ?";
    $stmt2 = $dbo->prepare($query2);
    $stmt2->execute(array($_POST["id1"]));
    $rows = $stmt2->fetchAll();
    foreach ($rows as $city) {
        echo "<option value='".$city['city_id']."'>".$city['city']."</option>";
    }
}
?>

“You can, you should, and if you’re brave enough to start, you will.”

Impact of Social Media on Postgraduate Students and Young Working Professionals

This survey was conducted on students and professionals from Pakistan only.



INTRODUCTION

This study aimed at finding how much screen time users spend on social media on average and how much it affects their overall well-being and time-management capability. For this purpose, a sample of postgraduate students and young working professionals aged between 20 and 37 was taken. A questionnaire survey, containing both open- and close-ended questions, was administered to collect their responses. The responses show the statistics of people's screen time.

Screen time usually refers to the time a person spends using digital screens such as monitors, laptops, mobile phones, and other multimedia devices. Social media, meanwhile, is a term for applications and sites that allow users to share thoughts, ideas, and information through virtual networks and communities. Social media is internet-based and gives users quick electronic communication of content such as personal information, documents, videos, and photos.

BACKGROUND

In the age of technology, everybody is accustomed to using digital devices to get their usual work done. Covid-19 has increased the use of digital devices and the average screen time of users, especially young students and working professionals.

A reidhealth.org blog post about the healthy amount of screen time for adults recommends that adults limit screen time outside of work to less than two hours per day; any time beyond that which would typically be spent on screens should instead be spent on physical activity.

Those spending six hours or more per day watching screens had a higher risk of depression. Screen time of more than 8.5–10 hours causes insomnia and poor sleep; eye strain and headaches; neck, shoulder, and back pain; changes in cognition; and more.

https://www.reidhealth.org/blog/screen-time-for-adults

OBJECTIVES OF THE STUDY

Keeping in view the gap in assessing the use of social media among young students and professionals, this study aims:

  • To find out the screen time and social media usage of the target audience.
  • To explore their views about the impact that social media usage and screen time have on their well-being and everyday time management.

MATERIALS

  1. Google Forms: The study is based on a questionnaire survey. We used Google Forms to create questionnaires containing both open- and close-ended questions and distributed them among the students.
  2. MS Excel: Used to clean the data and perform the multiple regression test.
  3. Minitab: The gathered responses were analyzed with the help of Minitab software, which was used to perform normality, proportion, and mean tests on the data where required.

QUESTIONNAIRE

The following is the set of questions we asked our target audience in order to conduct our study.

1 — Name

2 — Age

3 — Gender

4 — What is your average screen time per day (including study and work screen time)?

5 — How much free time do you get each day?

6 — How much time do you spend on social media per day?

7 — Average time on social media

8 — Which device do you usually use to access social media sites?

9 — Which social media app consumes most of your time?

10 — Do you think social media sites bring a bad impact on your overall well-being and everyday time management?

11 — How much do you think social media invades your privacy? Rate 1–5.

DATASET

The survey was distributed to 1,000 people, and 600 responses were received from users aged between 21 and 37. Images of the response dataset are attached below.


The responses show that 300 persons, making up 50% of the dataset, are not enrolled in a master's program, whereas the remaining respondents are enrolled in various postgraduate disciplines.


The dataset also shows that 50% of the responses were from female postgraduate students and working professionals, and 50% from male postgraduate students and working professionals.


AVERAGE DAILY SCREEN TIME OF USERS

The dataset shows that 46.6% of users have an average screen time of more than 10 hours per day, whereas 31.7% have a daily screen time of 5–10 hours and only 21% have less than 5 hours of average daily screen time.


To check the significance of our null hypothesis 'screen time > 10 hours for 46% of users', we performed a one-proportion test. The test shows a p-value > 0.05, which means we cannot reject the hypothesis that roughly 46% of users have more than 10 hours of average daily screen time; the hypothesis that fewer than 46% of users have a screen time of more than 10 hours is rejected.
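
For readers who want to reproduce this kind of test outside Minitab, here is a sketch in Python using statsmodels, with the counts implied by the dataset description above (46.6% of 600 responses):

from statsmodels.stats.proportion import proportions_ztest

count, nobs = round(0.466 * 600), 600  # respondents reporting > 10 hours per day

# Test H0: p = 0.46 against the two-sided alternative
stat, pvalue = proportions_ztest(count, nobs, value=0.46)
print(pvalue)  # > 0.05, so the hypothesis is not rejected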


AVERAGE DAILY FREE TIME OF USERS

The dataset shows that 72% of users have an average free time of less than 5 hours per day, whereas 25% have a daily free time of 5–10 hours and only 3% have more than 10 hours of average free time per day.


To check the significance of our null hypothesis 'free time < 5 hours for 70% of users', we performed a one-proportion test. The test shows a p-value > 0.05, which means we cannot reject the hypothesis that roughly 70% of users have less than 5 hours of average daily free time.

AVERAGE TIME USERS SPEND ON SOCIAL MEDIA PER DAY

The responses show that 56.7% of users spend less than 2 hours on social media per day, whereas 26.7% spend 2–5 hours daily and only 6.7% spend more than 5 hours on social media per day.


Normality Test

Since the p-value in the normality test is much less than 0.05, the responses for average time spent on social media are not normally distributed.

To check the significance of our null hypothesis 'time spent on social media > 2 hours for 35% of users', we performed a one-proportion test. The test shows a p-value > 0.05, which means we cannot reject the hypothesis that roughly 35% of users spend more than 2 hours per day on social media.

MOST-USED DEVICE TO ACCESS SOCIAL MEDIA

From the received responses, we conclude that more than 85% of users mostly use mobile phones to access social media.



MOST-USED SOCIAL MEDIA APPS

The study shows that the most-used social media apps are Facebook, WhatsApp, and Instagram. More than 90% of users responded that one of these three apps consumes most of the time they spend on social media.

IMPACT OF SOCIAL MEDIA ON WELL BEING

The received responses show that 56.7% of users have experienced a bad impact of social media on their overall well-being, whereas 13.3% disagree with the statement that 'social media has a bad impact on users' overall well-being' and 30% gave a neutral response.

Normality test: since the p-value is much less than 0.05, the responses for the effect of social media on users' overall well-being are not normally distributed.

Mean Test

To check the significance of our null hypothesis 'social media has no effect on users' overall well-being' against the alternative 'social media has a bad impact on users' overall well-being', we performed a mean test. The test shows a p-value of 0.00, so the null hypothesis is rejected; since we cannot reject that the mean rated effect is greater than 3 on the 1–5 scale, we can conclude that social media has a bad impact on users' overall well-being.

ANOVA Table

Using Excel, we applied regression analysis to our data, estimating how users' overall well-being depends on their total screen time, the time they spend on social media, and their age.

The fitted model has an R-squared of 0.898, a standard error of 0.7333, and a significance F of 1.6e-25, while the individual parameters all have p-values greater than 0.05.

CONCLUSION

The study concluded that more than 46% of users have a screen time of more than 10 hours per day and that 35% of users spend more than 2 hours per day on social media. The tests show that users who spend more time on social media face a higher risk of a bad impact on their well-being and time-management capabilities.