Model Prediction

  • predict_label(X)

Predicts the class labels for X. The input is an array or a list of arrays, and the return value is an array or a list of arrays.
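As I understand TFLearn's implementation, predict_label returns, for each input row, the class indices sorted by descending predicted probability, so the first index is the predicted class. The idea can be sketched in plain numpy (the probabilities here are made-up stand-ins for a 2-class softmax output):

```python
import numpy as np

# Stand-in for a 2-class model's softmax output, one row per sample
probs = np.array([[0.9, 0.1],    # looks benign  -> label order [0, 1]
                  [0.2, 0.8]])   # looks like XSS -> label order [1, 0]

# Rank the class indices of each row by descending probability
labels = np.argsort(-probs, axis=1)
print(labels.tolist())  # [[0, 1], [1, 0]]
```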

  • A new prediction function?

An RNN is a deep learning model well suited to deciding whether a request is an XSS attack. Validating samples through predict_label, an empty sample takes about 5.5 µs on average, but a single valid sample takes as long as 1.56 seconds. This may depend on the specific sample, but the numbers observed so far are very poor, so samples must be processed in batches or performance suffers badly. For a single HTTP message, the request URL and the request-body fields should be assembled into one sample array to speed up model inference. When time allows, it would be worth validating non-attack samples to compare the inference speed of attack samples against that of normal samples.
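Assembling one request into a batch might look like the sketch below. char_encode mirrors the elt() helper in the full code later in this post; the function names here are illustrative, not an existing API:

```python
# Sketch: collect the fields of one HTTP request into a single batch so the
# model is called once, not once per field.
def char_encode(s, maxlen=100):
    # Encode characters as lowercase ordinals, truncate, then right-pad with zeros
    codes = [ord(c.lower()) for c in s][:maxlen]
    return codes + [0] * (maxlen - len(codes))

def request_to_batch(url, body_fields):
    samples = [char_encode(url)]
    samples.extend(char_encode(v) for v in body_fields)
    return samples

batch = request_to_batch("/search?q=<script>alert(1)</script>",
                         ["name=admin", "comment=<img src=x onerror=alert(1)>"])
print(len(batch), len(batch[0]))  # 3 100
```

A single model.predict_label(batch) call then scores all fields of the request at once instead of paying the per-call overhead for each field.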

Validating samples starting from offset 1000:

  • 1 sample: 1.56 s, i.e. 1.56 s per sample
  • 10 samples: 0.089 s, i.e. 8.9 ms per sample
  • 100 samples: 0.247 s, i.e. 2.5 ms per sample
  • 1000 samples: 2.08 s, i.e. 2.1 ms per sample

Validating samples starting from offset 8000:

  • 1 sample: 1.57 s, i.e. 1.57 s per sample
  • 10 samples: 0.0897 s, i.e. 9.0 ms per sample
  • 100 samples: 0.249 s, i.e. 2.5 ms per sample
  • 1000 samples: 2.053 s, i.e. 2.1 ms per sample
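The amortization is clearer when the per-sample cost is derived from the batch totals; a quick calculation over the first set of measurements above:

```python
# batch size -> total seconds, taken from the first set of measurements above
totals = {1: 1.56, 10: 0.089, 100: 0.247, 1000: 2.08}

for n in sorted(totals):
    per_sample_ms = totals[n] / n * 1000
    print("batch=%4d  total=%6.3f s  per-sample=%8.2f ms"
          % (n, totals[n], per_sample_ms))
```

Nearly all of the cost of a one-sample call is fixed overhead; beyond a few hundred samples per call, the per-sample cost flattens out at around 2 ms.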

Model Saving and Loading

Training on 150,000 samples genuinely takes a while, and retraining from scratch every time there are new samples wastes a lot of time. What to do? My old approach was to train the model and return it, then sit in a while(1) loop: we only needed to update the samples and the program would give us their classification. That is effectively a "hot" model — once the program exits, everything must be retrained. If we could save the model, cold-start it later, and persist it, life would be much better. How? TFLearn actually ships with APIs for this: save and load. Using them is simple.

Usage:

Step 1: train the model, then save it

with tf.Graph().as_default():
    # train the model here
    model.save("./gut.tfl")

Step 2: rebuild the same network architecture from step 1, then load the saved model into it

with tf.Graph().as_default():
    # define the same network architecture here
    model.load("./gut.tfl")

Full code:

import sys
import urllib  # Python 2; on Python 3 use: from urllib.parse import unquote
import numpy as np
import tensorflow as tf
import tflearn
from tflearn.data_utils import to_categorical, pad_sequences
from sklearn.model_selection import train_test_split
import time

def elt(line):
    # Encode a string as a list of lowercase character ordinals
    x = []
    for c in line:
        c = c.lower()
        x.append(ord(c))
    return x

def load_file(filename, label, ms, ns):
    with open(filename) as f:
        for line in f:
            line = line.strip()
            line = urllib.unquote(line)  # URL-decode the sample
            if len(line) <= 100:
                ms.append(elt(line))
                ns.append(1 if label else 0)

def load_files(file1, file2):
    xs = []
    ys = []
    load_file(file1, 1, xs, ys)  # attack samples, labeled 1
    load_file(file2, 0, xs, ys)  # normal samples, labeled 0
    return xs, ys

def train(x, y):
    graph1 = tf.Graph()
    with graph1.as_default():
        x_train, x_test, y_train, y_test = train_test_split(
            x, y, test_size=0.4, random_state=0)

        # Pad every sample to length 100 and one-hot encode the labels
        x_train = pad_sequences(x_train, maxlen=100, value=0.)
        x_test = pad_sequences(x_test, maxlen=100, value=0.)
        y_train = to_categorical(y_train, nb_classes=2)
        y_test = to_categorical(y_test, nb_classes=2)

        net = tflearn.input_data([None, 100])
        net = tflearn.embedding(net, input_dim=256, output_dim=128)
        net = tflearn.lstm(net, 128, dropout=0.8)
        net = tflearn.fully_connected(net, 2, activation='softmax')
        net = tflearn.regression(net, optimizer='adam', learning_rate=0.1,
                                 loss='categorical_crossentropy')
        model = tflearn.DNN(net, tensorboard_verbose=3)
        model.fit(x_train, y_train, n_epoch=1, validation_set=(x_test, y_test),
                  show_metric=True, batch_size=200, run_id="gut")
        print("---------before-------")
        print(model.predict_label(x_test[8:9]))
        model.save("./gut.tfl")
    return x_test

def gut(x_test):
    graph2 = tf.Graph()
    with graph2.as_default():
        # Rebuild exactly the same architecture before loading the weights
        net = tflearn.input_data([None, 100])
        net = tflearn.embedding(net, input_dim=256, output_dim=128)
        net = tflearn.lstm(net, 128, dropout=0.8)
        net = tflearn.fully_connected(net, 2, activation='softmax')
        net = tflearn.regression(net, optimizer='adam', learning_rate=0.1,
                                 loss='categorical_crossentropy')
        model = tflearn.DNN(net, tensorboard_verbose=3)
        print("...........after.....")
        model.load("./gut.tfl")
        print(model.predict_label(x_test[8:9]))

if __name__ == "__main__":
    xs, ys = load_files(sys.argv[1], sys.argv[2])
    x_test = train(xs, ys)
    gut(x_test)
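A minimal harness for reproducing the earlier batch measurements could look like the following. dummy_predict is a stand-in so the sketch runs without a trained model; in a real run you would pass model.predict_label instead:

```python
import time

def time_predict(predict, samples):
    """Time one batched call; returns (total_seconds, seconds_per_sample)."""
    start = time.time()
    predict(samples)
    total = time.time() - start
    return total, total / len(samples)

# Stand-in for model.predict_label: labels every sample as class 0
def dummy_predict(batch):
    return [[0, 1] for _ in batch]

for n in (1, 10, 100, 1000):
    # Each sample is a zero-padded vector of length 100, as in the code above
    total, per = time_predict(dummy_predict, [[0] * 100] * n)
    print("batch=%4d  total=%.4f s  per-sample=%.6f s" % (n, total, per))
```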
