Model Prediction

  • predict_label(X)

Predicts the class labels for X. The input is an array (or list of arrays) and the return value is an array (or list of arrays) of labels.

  • Do we need a new prediction function?

An RNN is a deep learning model well suited to deciding whether a sample is an XSS attack. Checking samples through predict_label, one empty sample takes 5.5 µs on average, yet one valid sample takes a full 1.56 seconds. This may depend on the specific sample, but the numbers observed so far are very poor, so samples must be processed in batches or performance suffers badly. For a single HTTP message, the request URL and request-body data should be assembled into one sample array to speed up model validation (a batching sketch follows below). When time permits, non-attack samples should also be checked to compare the validation speed of attack samples against that of normal samples.
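As a rough illustration of that batching idea, here is a minimal sketch. The function name predict_request and the url/body parameters are my assumptions, not from the original code; it simply folds the pieces of one HTTP message into a single predict_label call:

from tflearn.data_utils import pad_sequences

def predict_request(model, url, body):
    # Encode each non-empty piece as character codes, mirroring elt() in the
    # complete code below, then pad to the fixed input length (100).
    samples = [[ord(c) for c in s.lower()] for s in (url, body) if s]
    batch = pad_sequences(samples, maxlen=100, value=0.)
    # One batched call instead of one call per piece.
    return model.predict_label(batch)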

Validating samples starting at index 1000:

1 sample: 1.56 s total, 1.56 s per sample

10 samples: 0.089 s total, 8.9 ms per sample
100 samples: 0.247 s total, 2.5 ms per sample
1000 samples: 2.08 s total, 2.1 ms per sample

Validating samples starting at index 8000:

1 sample: 1.57 s total, 1.57 s per sample

10 samples: 0.0897 s total, 9.0 ms per sample
100 samples: 0.249 s total, 2.5 ms per sample
1000 samples: 2.053 s total, 2.1 ms per sample
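For reference, a small harness in this spirit (my own sketch, not the author's original benchmark code; the function name and batch counts are assumptions) reproduces the measurement pattern above, given a trained TFLearn model and a padded sample array:

import time

def benchmark(model, samples, start, counts=(1, 10, 100, 1000)):
    # Time predict_label over increasingly large batches taken from `start`.
    for n in counts:
        batch = samples[start:start + n]
        t0 = time.time()
        model.predict_label(batch)
        elapsed = time.time() - t0
        print("%d samples: %.3f s total, %.2f ms per sample"
              % (n, elapsed, elapsed * 1000.0 / n))

# benchmark(model, x_test, 1000) and benchmark(model, x_test, 8000)
# correspond to the two measurement runs above.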

Model Saving and Loading

Training on 150,000 samples genuinely takes time, and retraining from scratch every time new samples arrive is a waste. What to do? My old approach was to train once, return the model, and then sit in a while(1) loop feeding it samples: update the samples and the program hands back their classifications. That is essentially a hot model; once the program exits, everything must be retrained. If the model could instead be saved, cold-started, and persisted, life would be good. How do we do that? TFLearn actually ships the APIs for this, save and load, and they are simple to use.

The approach:

Step 1: train the model and save it

with tf.Graph().as_default():
    # ... train the model here ...
    model.save("./gut.tfl")

Step 2: rebuild the network architecture from step 1, then load the saved model (load restores only the weights, so the graph must be reconstructed identically first)

with tf.Graph().as_default():
    # ... rebuild the same network architecture here ...
    model.load("./gut.tfl")

Complete code:

import sys
import urllib
import numpy as np
import tensorflow as tf
import tflearn
from tflearn.data_utils import to_categorical, pad_sequences
from sklearn.model_selection import train_test_split
import time
def elt(line):
    # Encode a line as a list of lower-cased character codes.
    x = []
    for c in line:
        x.append(ord(c.lower()))
    return x

def load_file(filename, label, ms=[], ns=[]):
    # Read samples from filename, URL-decode them, keep those of at most
    # 100 characters, and append each encoded sample with its label.
    # Note: urllib.unquote is Python 2; on Python 3 use urllib.parse.unquote.
    with open(filename) as f:
        for line in f:
            line = line.strip()
            line = urllib.unquote(line)
            if len(line) <= 100:
                m = elt(line)
                if label:
                    n = 1
                else:
                    n = 0
                ms.append(m)
                ns.append(n)

def load_files(file1, file2):
    # file1 holds attack samples (label 1), file2 normal samples (label 0).
    xs = []
    ys = []
    load_file(file1, 1, xs, ys)
    load_file(file2, 0, xs, ys)
    return xs, ys

def train(x, y):
    graph1 = tf.Graph()
    with graph1.as_default():
        x_train, x_test, y_train, y_test = train_test_split(
            x, y, test_size=0.4, random_state=0)

        # Pad every sample to length 100 and one-hot encode the labels.
        x_train = pad_sequences(x_train, maxlen=100, value=0.)
        x_test = pad_sequences(x_test, maxlen=100, value=0.)
        y_train = to_categorical(y_train, nb_classes=2)
        y_test = to_categorical(y_test, nb_classes=2)

        # Character embedding -> LSTM -> 2-class softmax.
        net = tflearn.input_data([None, 100])
        net = tflearn.embedding(net, input_dim=256, output_dim=128)
        net = tflearn.lstm(net, 128, dropout=0.8)
        net = tflearn.fully_connected(net, 2, activation='softmax')
        net = tflearn.regression(net, optimizer='adam', learning_rate=0.1,
                                 loss='categorical_crossentropy')
        model = tflearn.DNN(net, tensorboard_verbose=3)
        model.fit(x_train, y_train, n_epoch=1, validation_set=(x_test, y_test),
                  show_metric=True, batch_size=200, run_id="gut")
        print("---------before-------")
        print(model.predict_label(x_test[8:9]))
        model.save("./gut.tfl")
    return x_test

def gut(x_test):
    graph2 = tf.Graph()
    with graph2.as_default():
        # Rebuild exactly the same architecture, then load the saved weights.
        net = tflearn.input_data([None, 100])
        net = tflearn.embedding(net, input_dim=256, output_dim=128)
        net = tflearn.lstm(net, 128, dropout=0.8)
        net = tflearn.fully_connected(net, 2, activation='softmax')
        net = tflearn.regression(net, optimizer='adam', learning_rate=0.1,
                                 loss='categorical_crossentropy')
        model = tflearn.DNN(net, tensorboard_verbose=3)
        print("...........after.....")
        model.load("./gut.tfl")
        print(model.predict_label(x_test[8:9]))

if __name__ == "__main__":
    xs, ys = load_files(sys.argv[1], sys.argv[2])
    x_test = train(xs, ys)
    gut(x_test)
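To try this out, run the script with two sample files, for example python gut.py xss.txt normal.txt (the script and file names here are placeholders): the first argument is the file of XSS payloads (labeled 1) and the second the file of normal samples (labeled 0). The script trains, saves the model to ./gut.tfl, rebuilds the graph, reloads the weights, and the two predict_label printouts before saving and after loading should match.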
