RNN Study Notes

NLP (Natural Language Processing)


train_path = os.path.join(data_path, "ptb.train.txt")


# Sort the words of the input text by frequency (the value) first and by the word itself (the key) second; the result is a dict mapping word -> id

train_data = _file_to_word_ids(train_path, word_to_id)
# Map the input text to a sequence of ids using the dictionary built above
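A minimal pure-Python sketch of the vocabulary and id-mapping steps above. It assumes the text is already in memory as a string and that line breaks are replaced by an `<eos>` token; the helper names `build_vocab` and `words_to_ids` are hypothetical, not the reader's own functions.

```python
import collections


def build_vocab(text):
    # Assumption: each line break becomes an explicit end-of-sentence token.
    words = text.replace("\n", " <eos> ").split()
    counter = collections.Counter(words)
    # Primary key: frequency (descending); secondary key: the word itself.
    pairs = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: idx for idx, (word, _) in enumerate(pairs)}


def words_to_ids(text, word_to_id):
    # Map every word to its id, skipping words missing from the vocabulary.
    words = text.replace("\n", " <eos> ").split()
    return [word_to_id[w] for w in words if w in word_to_id]


text = "the cat sat\nthe dog sat\n"
word_to_id = build_vocab(text)
train_data = words_to_ids(text, word_to_id)
vocabulary = len(word_to_id)
```

Sorting by (-count, word) makes the ids deterministic: frequent words get small ids, and ties are broken alphabetically.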

vocabulary = len(word_to_id)

with tf.name_scope(name, "PTBProducer", [raw_data, batch_size, num_steps]):
# Open a name scope; objects created in different name scopes can share the same name attribute


raw_data = tf.convert_to_tensor(raw_data, name="raw_data", dtype=tf.int32)
# Convert the Python data into a tensor

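data_len = tf.size(raw_data)
# Total number of word ids. batch_size, by contrast, is the number of sequences fed in at the same time, so that the GPU's parallel computing power is fully used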


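>>> import numpy
>>> a = ([2,2],[3,3])
>>> a
([2, 2], [3, 3])
>>> b = numpy.reshape(a, (2, -1))  # the -1 is inferred automatically; at most one dimension may be -1
>>> b
array([[2, 2],
       [3, 3]])
>>> b[0:1]  # the end index is exclusive, so only b[0] is returned
array([[2, 2]])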
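A pure-Python sketch of how the -1 in reshape can be inferred, limited to a flat input and a 2-D output. `reshape_2d` is a hypothetical helper, not a NumPy function; it also shows that the slice `b[0:1]` keeps only the first row.

```python
def reshape_2d(flat, rows, cols):
    """Reshape a flat list into a list of rows, inferring at most one -1."""
    if rows == -1 and cols == -1:
        raise ValueError("only one dimension may be -1")
    if rows == -1:
        rows = len(flat) // cols
    elif cols == -1:
        cols = len(flat) // rows
    # The element count must match exactly, as in numpy.reshape.
    if rows * cols != len(flat):
        raise ValueError("cannot reshape %d elements into (%d, %d)" % (len(flat), rows, cols))
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


b = reshape_2d([2, 2, 3, 3], 2, -1)
first_row_only = b[0:1]
```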
batch_len = data_len // batch_size
# data_len is the length of the whole training set (all word ids)
data = tf.reshape(raw_data[0 : batch_size * batch_len],
                      [batch_size, batch_len])
# This model takes fixed-length input: the RNN is unrolled for num_steps steps, i.e. num_steps copies of the cell A
epoch_size = (batch_len - 1) // num_steps 
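# batch_size: the number of sequences (rows) in each batch
# num_steps: the length of each sample sequence, i.e. the number of words fed in per row; it is the number of columns of a batch
# epoch_size: the number of batches to iterate over in one epoch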
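The batching arithmetic above can be sketched in plain Python, assuming `raw_data` is a list of word ids; `batchify` is a hypothetical name, and lists stand in for tensors.

```python
def batchify(raw_data, batch_size, num_steps):
    """Mimic the reshaping step of a PTB-style producer in plain Python."""
    data_len = len(raw_data)
    batch_len = data_len // batch_size
    # Drop the tail that cannot fill a whole row, then cut into batch_size rows.
    trimmed = raw_data[0 : batch_size * batch_len]
    data = [trimmed[r * batch_len:(r + 1) * batch_len] for r in range(batch_size)]
    # The -1 leaves room for the target of the last input word.
    epoch_size = (batch_len - 1) // num_steps
    return data, epoch_size


data, epoch_size = batchify(list(range(20)), batch_size=3, num_steps=2)
```

With 20 ids and batch_size = 3, each row holds 6 ids and the last 2 ids are discarded; (6 - 1) // 2 gives 2 batches per epoch.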

i = tf.train.range_input_producer(epoch_size, num_epochs=1, shuffle=False).dequeue()

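The first argument, limit, means that only the values 0..limit-1 are produced; num_epochs is the number of times this range is cycled through.

For example, with limit = 3 and num_epochs = 2 the output is 0, 1, 2, 0, 1, 2. If num_epochs is not specified, the producer yields batch indices indefinitely.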
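The index order described above can be emulated with a plain Python generator. This only models the sequence of values for shuffle=False; it does not model TensorFlow's queues or queue runners, and `range_indices` is a hypothetical name.

```python
import itertools


def range_indices(limit, num_epochs=None):
    """Yield 0..limit-1, repeated num_epochs times (forever if None), unshuffled."""
    epochs = itertools.count() if num_epochs is None else range(num_epochs)
    for _ in epochs:
        for i in range(limit):
            yield i


print(list(range_indices(3, num_epochs=2)))  # [0, 1, 2, 0, 1, 2]
```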

x = tf.slice(data, [0, i * num_steps], [batch_size, num_steps])
y = tf.slice(data, [0, i * num_steps + 1], [batch_size, num_steps])

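tf.slice takes a starting position (begin) and the size of the block to cut out (size).

[0, i * num_steps] means: start at the first row (row 0), at column index i * num_steps (0-based).
data has already been reshaped so that its number of rows is batch_size, so a size of [batch_size, num_steps] takes num_steps columns from every row.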
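In plain Python, with `data` as a list of rows standing in for the reshaped tensor, the two tf.slice calls amount to the following; `get_batch` is a hypothetical helper. Each target in y is the word that follows the corresponding input in x, which is why epoch_size subtracts 1 from batch_len.

```python
def get_batch(data, i, num_steps):
    """Return (x, y) for step i: y is x shifted one word to the right."""
    start = i * num_steps
    x = [row[start:start + num_steps] for row in data]
    y = [row[start + 1:start + 1 + num_steps] for row in data]
    return x, y


data = [[0, 1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10, 11]]
x, y = get_batch(data, i=1, num_steps=2)
```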

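>>> x
[[1, 2, 3], [4, 5, 6]]
>>> out = tf.slice(x, [0, 1], [2, 1])
# Only rectangular blocks can be cut; tf.slice(x, [0, 1], [2, 3]) fails because 3 columns starting at column 1 exceed the available columns
>>> sess.run(out)
array([[2],
       [5]], dtype=int32)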
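A pure-Python sketch of the tf.slice semantics for a non-empty 2-D list, including the bounds error from the example above and the documented rule that a size of -1 takes everything to the end of that dimension. `slice_2d` is a hypothetical name.

```python
def slice_2d(x, begin, size):
    """Emulate tf.slice on a 2-D list: begin=[row, col], size=[rows, cols]."""
    r0, c0 = begin
    nr, nc = size
    # A size of -1 means "everything from begin to the end" in that dimension.
    if nr == -1:
        nr = len(x) - r0
    if nc == -1:
        nc = len(x[0]) - c0
    if r0 + nr > len(x) or c0 + nc > len(x[0]):
        raise ValueError("slice exceeds the input bounds")
    return [row[c0:c0 + nc] for row in x[r0:r0 + nr]]


x = [[1, 2, 3], [4, 5, 6]]
out = slice_2d(x, [0, 1], [2, 1])
```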

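When tf.get_variable_scope().reuse == False, calling tf.get_variable creates a new variable (and raises an error if one with that name already exists).
When tf.get_variable_scope().reuse == True, calling tf.get_variable reuses the variable that has already been created.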

with tf.variable_scope("foo"):
      v = tf.get_variable("v", [1])
with tf.variable_scope("foo", reuse=True):
      v1 = tf.get_variable("v", [1])
assert v1 is v
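The create-or-reuse rule can be modeled with a toy dictionary-backed store. This is a hypothetical illustration of the behavior described above, not TensorFlow's implementation; `ToyVariableStore` and the dict used as a "variable" are assumptions.

```python
class ToyVariableStore:
    """Hypothetical model of get_variable's create-or-reuse rule."""

    def __init__(self):
        self._vars = {}

    def get_variable(self, scope, name, reuse=False):
        full_name = scope + "/" + name
        if reuse:
            # Reuse mode: the variable must already exist.
            if full_name not in self._vars:
                raise ValueError("%s does not exist; cannot reuse" % full_name)
            return self._vars[full_name]
        # Create mode: the name must not be taken yet.
        if full_name in self._vars:
            raise ValueError("%s already exists; set reuse=True" % full_name)
        var = {"name": full_name}  # stand-in for a real variable object
        self._vars[full_name] = var
        return var


store = ToyVariableStore()
v = store.get_variable("foo", "v")
v1 = store.get_variable("foo", "v", reuse=True)
```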


Number of hidden-layer units (neurons) in a cell


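Vectors produced by one-hot encoding are very high-dimensional and very sparse. Suppose that in a natural language processing (NLP) task we have a dictionary of 2,000 words. With one-hot encoding, each word is represented by a vector of 2,000 integers, 1,999 of which are 0; with an even larger dictionary, the computational efficiency of this approach drops even further.
While the neural network is being trained, every embedding vector gets updated. Plotting the learned vectors shows how similar words are to one another in the multi-dimensional space, which lets us see the relationships between words visually. This is not limited to words: anything that an embedding layer can turn into vectors can be visualized the same way.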
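Multiplying a one-hot vector by an embedding matrix simply selects one row of that matrix, so an embedding lookup (tf.nn.embedding_lookup in TensorFlow) gives the same result without ever building the sparse vector. A small pure-Python check; the 5 x 2 matrix holds made-up toy values.

```python
def one_hot(index, depth):
    vec = [0] * depth
    vec[index] = 1
    return vec


def vec_times_matrix(vec, matrix):
    """Compute vec @ matrix for a row vector vec and a matrix given as a list of rows."""
    return [sum(v * row[j] for v, row in zip(vec, matrix)) for j in range(len(matrix[0]))]


# Hypothetical 5-word vocabulary embedded into 2 dimensions (toy values).
embedding = [[0.1, 0.2],
             [0.3, 0.4],
             [0.5, 0.6],
             [0.7, 0.8],
             [0.9, 1.0]]

word_id = 3
dense = vec_times_matrix(one_hot(word_id, 5), embedding)
lookup = embedding[word_id]  # what an embedding lookup does directly
```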

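tf.nn.rnn_cell.DropoutWrapper(cell, input_keep_prob=1.0, output_keep_prob=1.0): the first argument is the recurrent cell to wrap, e.g. a BasicLSTMCell. The second, input_keep_prob, applies dropout to the cell's input; it is the probability of keeping each element, so a value of 1 means no dropout. The third, output_keep_prob, applies dropout to the cell's output. This dropout is used between different recurrent layers, or before a fully connected layer, not between time steps within the same recurrent layer.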

cells = [tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=1.0-self.dropout) for cell in cells]
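A pure-Python sketch of what a keep probability means, using inverted dropout (kept elements are scaled by 1/keep_prob, as tf.nn.dropout does). `dropout` here is a hypothetical helper operating on a plain list, not the wrapper itself.

```python
import random


def dropout(values, keep_prob, rng=random):
    """Inverted dropout: keep each element with probability keep_prob, scale it by 1/keep_prob."""
    if keep_prob == 1.0:
        return list(values)  # keep_prob=1.0 means no dropout at all
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


values = [1.0, 2.0, 3.0]
unchanged = dropout(values, 1.0)
dropped = dropout(values, 0.5, rng=random.Random(0))
```

Scaling the survivors keeps the expected value of each element unchanged, so no rescaling is needed at inference time when keep_prob is set back to 1.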

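tf.nn.rnn_cell.MultiRNNCell([list of RNNCell], state_is_tuple=True) takes two main arguments: the first is the list of RNN cell instances to stack, and the second makes the state a tuple (one entry per layer); the official recommendation is to use True.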

 multi_cell = tf.nn.rnn_cell.MultiRNNCell(cells)
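The stacking idea can be sketched with toy scalar cells: at each time step, the output of layer k is the input of layer k+1, and the combined state is a tuple with one entry per layer. The cell formula and weights below are illustrative assumptions, not BasicLSTMCell; `make_cell` and `multi_cell` are hypothetical names.

```python
import math


def make_cell(w_in, w_rec):
    """A toy scalar RNN cell: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    def cell(x, h):
        new_h = math.tanh(w_in * x + w_rec * h)
        return new_h, new_h  # (output, new_state)
    return cell


def multi_cell(cells):
    """Stack cells: each layer's output is the next layer's input."""
    def step(x, states):
        new_states = []
        for cell, h in zip(cells, states):
            x, h = cell(x, h)
            new_states.append(h)
        return x, tuple(new_states)  # like state_is_tuple=True: one state per layer
    return step


stack = multi_cell([make_cell(0.5, 0.1), make_cell(1.0, 0.2)])
output, state = stack(1.0, (0.0, 0.0))
```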

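The softmax function, also called the normalized exponential function, is a generalization of the logistic function. It "squashes" a K-dimensional vector z of arbitrary real values into a K-dimensional real vector σ(z) whose elements each lie in the range (0, 1) and sum to 1.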
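A minimal pure-Python softmax, σ(z)_j = exp(z_j) / Σ_k exp(z_k). Subtracting the maximum before exponentiating is a common numerical-stability trick; it does not change the result because the common factor cancels.

```python
import math


def softmax(logits):
    """Map K real numbers to K probabilities in (0, 1) that sum to 1."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
```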

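tf.concat(concat_dim, values, name='concat')
The first argument, concat_dim, is the dimension to concatenate along. If concat_dim is 0, the tensors are joined along the first dimension of their shape; in practice this stacks them on top of each other (adds rows).
In other words, concatenating two tensors of shape (2, 2) along dimension 0 gives a tensor of shape (4, 2).
values is a list or tuple of tensors.
This is the argument order used before TensorFlow 1.0; from 1.0 onward the signature is tf.concat(values, axis, name='concat').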

t1 = [[1, 2, 3], [4, 5, 6]]
t2 = [[7, 8, 9], [10, 11, 12]]
tf.concat(0, [t1, t2]) == > [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
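To contrast the two axes, here is a pure-Python sketch for 2-D lists (`concat_2d` is a hypothetical helper). Axis 0 appends rows as in the example above; axis 1 extends each row, which in TensorFlow 1.0+ would be written tf.concat([t1, t2], 1).

```python
def concat_2d(tensors, axis):
    """Concatenate 2-D lists along axis 0 (append rows) or axis 1 (extend each row)."""
    if axis == 0:
        return [list(row) for t in tensors for row in t]
    if axis == 1:
        return [sum((list(t[r]) for t in tensors), []) for r in range(len(tensors[0]))]
    raise ValueError("axis must be 0 or 1")


t1 = [[1, 2, 3], [4, 5, 6]]
t2 = [[7, 8, 9], [10, 11, 12]]
rows = concat_2d([t1, t2], 0)
cols = concat_2d([t1, t2], 1)
```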
