Commit 98b3a7e0 authored by 지수

using DM_data.csv

parent 7d44809d
%% Cell type:markdown id: tags:
<a href="https://colab.research.google.com/github/lani009/IDS-DataMining/blob/main/%5BDM%5D_Naive_Bayes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
%% Cell type:code id: tags:
```
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import time
```
%% Cell type:code id: tags:
```
data = pd.read_csv('DM_data.csv')
data.info()
```
%% Output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25192 entries, 0 to 25191
Data columns (total 40 columns):
 #   Column                       Non-Null Count  Dtype
---  ------                       --------------  -----
 0   duration                     25192 non-null  int64
 1   protocol_type                25192 non-null  int64
 2   service                      25192 non-null  int64
 3   flag                         25192 non-null  int64
 4   src_bytes                    25192 non-null  int64
 5   dst_bytes                    25192 non-null  int64
 6   land                         25192 non-null  int64
 7   wrong_fragment               25192 non-null  int64
 8   hot                          25192 non-null  int64
 9   num_failed_logins            25192 non-null  int64
 10  logged_in                    25192 non-null  int64
 11  num_compromised              25192 non-null  int64
 12  root_shell                   25192 non-null  int64
 13  su_attempted                 25192 non-null  int64
 14  num_root                     25192 non-null  int64
 15  num_file_creations           25192 non-null  int64
 16  num_shells                   25192 non-null  int64
 17  num_access_files             25192 non-null  int64
 18  is_guest_login               25192 non-null  int64
 19  count                        25192 non-null  int64
 20  srv_count                    25192 non-null  int64
 21  serror_rate                  25192 non-null  float64
 22  srv_serror_rate              25192 non-null  float64
 23  rerror_rate                  25192 non-null  float64
 24  srv_rerror_rate              25192 non-null  float64
 25  same_srv_rate                25192 non-null  float64
 26  diff_srv_rate                25192 non-null  float64
 27  srv_diff_host_rate           25192 non-null  float64
 28  dst_host_count               25192 non-null  int64
 29  dst_host_srv_count           25192 non-null  int64
 30  dst_host_same_srv_rate       25192 non-null  float64
 31  dst_host_diff_srv_rate       25192 non-null  float64
 32  dst_host_same_src_port_rate  25192 non-null  float64
 33  dst_host_srv_diff_host_rate  25192 non-null  float64
 34  dst_host_serror_rate         25192 non-null  float64
 35  dst_host_srv_serror_rate     25192 non-null  float64
 36  dst_host_rerror_rate         25192 non-null  float64
 37  dst_host_srv_rerror_rate     25192 non-null  float64
 38  class                        25192 non-null  int64
 39  index_num                    25192 non-null  int64
dtypes: float64(15), int64(25)
memory usage: 7.7 MB
%% Cell type:code id: tags:
```
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
```
%% Cell type:code id: tags:
```
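# Target labels.
data_y = data["class"]
# Features: drop the label and index_num (presumably a row index, not a predictor).
data_X = data.drop(columns=["class", "index_num"])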
```
%% Cell type:code id: tags:
```
sc = MinMaxScaler()
_X = sc.fit_transform(data_X)
```
%% Cell type:code id: tags:
```
X_train, X_test, Y_train, Y_test = train_test_split(_X, data_y, test_size=0.33, random_state=42)
print(X_train.shape, X_test.shape)
print(Y_train.shape, Y_test.shape)
```
%% Output
(16878, 38) (8314, 38)
(16878,) (8314,)
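%% Cell type:markdown id: tags:
In the cells above, `MinMaxScaler` is fit on all 25192 rows before the split, so the minima and maxima of the test rows influence the scaled training features. The effect on this dataset may well be small, but a common alternative is to split first and fit the scaler on the training rows only. The sketch below illustrates that order on synthetic data; `X_demo`, `y_demo`, and the other variable names are assumptions for illustration, not part of the notebook.
%% Cell type:code id: tags:
```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for data_X / data_y (assumption: not the real DM_data.csv).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 5))
y_demo = rng.integers(0, 2, size=200)

# Split first, so the held-out rows never influence the scaler.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=42
)

scaler = MinMaxScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # learn min/max from training rows only
X_te_scaled = scaler.transform(X_te)      # reuse the training min/max

print(X_tr_scaled.shape, X_te_scaled.shape)
```
%% Cell type:markdown id: tags:
With this order, scaled test values can fall slightly outside [0, 1] when a test row exceeds the training range. That is expected and mirrors what would happen with unseen data.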
%% Cell type:markdown id: tags:
## **Naive Bayes**
%% Cell type:code id: tags:
```
from sklearn.naive_bayes import GaussianNB
```
%% Cell type:code id: tags:
```
nb = GaussianNB()
```
%% Cell type:code id: tags:
```
start_time = time.time()
nb.fit(X_train, Y_train.values.ravel())
end_time = time.time()
print("Training time: ",end_time-start_time)
```
%% Output
Training time: 0.012809514999389648
%% Cell type:code id: tags:
```
start_time = time.time()
Y_test_pred = nb.predict(X_test)
end_time = time.time()
print("Testing time: ",end_time-start_time)
```
%% Output
Testing time: 0.012314796447753906
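%% Cell type:markdown id: tags:
Both timings above are single measurements of about 12 ms taken with `time.time()`, so they are sensitive to clock resolution and background load. A slightly more robust approach is to use `time.perf_counter()` and keep the best of several runs. The helper `time_call` below is hypothetical (it is not part of the notebook or of any library), and the workload is a stand-in.
%% Cell type:code id: tags:
```python
import time


def time_call(fn, *args, repeats=5):
    """Hypothetical helper: return (best elapsed seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        result = fn(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


# Stand-in workload; in the notebook this could be nb.predict with X_test.
elapsed, total = time_call(sum, range(100_000))
print(f"Best of 5 runs: {elapsed:.6f} s")
```
%% Cell type:markdown id: tags:
In the notebook, the same helper could be called as `time_call(nb.predict, X_test)`. Repeating `nb.fit` is also safe here, since each call refits the model from scratch.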
%% Cell type:code id: tags:
```
print("Train score is:", nb.score(X_train, Y_train))
print("Test score is:",nb.score(X_test,Y_test))
```
%% Output
Train score is: 0.8958407394241024
Test score is: 0.9030550878037046
%% Cell type:markdown id: tags:
Gaussian Naive Bayes test accuracy: 90.31% (train accuracy: 89.58%).
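
Accuracy alone does not show which class the errors fall on, and the class balance of `DM_data.csv` is not shown in this notebook. A confusion matrix and per-class report can fill that gap. The sketch below runs on synthetic, deliberately imbalanced data; `X_demo`, `y_demo`, and `model` are illustrative names, not the notebook's variables.
%% Cell type:code id: tags:
```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic two-class data (assumption: a stand-in, not DM_data.csv).
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(0.0, 1.0, size=(150, 4)),  # majority class 0
    rng.normal(2.0, 1.0, size=(50, 4)),   # minority class 1
])
y_demo = np.array([0] * 150 + [1] * 50)

# Stratify so both splits keep the same class proportions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=42, stratify=y_demo
)

model = GaussianNB().fit(X_tr, y_tr)
y_pred = model.predict(X_te)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_te, y_pred)
print(cm)
print(classification_report(y_te, y_pred))
```
%% Cell type:markdown id: tags:
For the notebook's model, the equivalent call would be `confusion_matrix(Y_test, Y_test_pred)`, reusing the predictions already computed above.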