
Reading the Caffe Source Code

Structure

There are two main directories:

src: the source implementation
include: the header files

Within src, the main code lives in the caffe directory: net.cpp, solver.cpp, blob.cpp, layer.cpp, common.cpp, and so on. The layers directory contains the individual layers and is the core of caffe. proto holds a single caffe.proto file, which describes the member variables of the various objects in the protobuf language. solvers provides the different optimizers (sgd, adam, rmsprop, adagrad), test contains the unit tests, and util holds common utility functions:

├── caffe
│   ├── layers
│   ├── proto
│   ├── solvers
│   ├── test
│   │   └── test_data
│   └── util
└── gtest

Let's first look at the .cpp files directly under the caffe directory:

blob.cpp
common.cpp
data_transformer.cpp
internal_thread.cpp
layer.cpp
layer_factory.cpp
net.cpp
parallel.cpp
solver.cpp
syncedmem.cpp

blob.cpp implements Blob, the main data type passed around in caffe.

common.cpp

Starting from tools

The root directory contains a tools directory, which is mainly used to build the caffe executable. It exposes caffe's command-line options, so the framework can be driven simply by configuring parameters.

caffe.cpp
compute_image_mean.cpp
convert_imageset.cpp
device_query.cpp
extract_features.cpp
finetune_net.cpp
net_speed_benchmark.cpp
test_net.cpp
train_net.cpp
upgrade_net_proto_binary.cpp
upgrade_net_proto_text.cpp
upgrade_solver_proto_text.cpp

caffe.cpp (the file that defines main()) registers several functions into g_brew_map: train, test, time, and device_query.
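The registration mechanism is roughly the following (a simplified sketch modeled on tools/caffe.cpp; treat the details as illustrative rather than the exact source):

// Each command (train/test/time/device_query) is a function returning int.
// A small static-object macro registers it into g_brew_map under its name,
// so main() can look the command up by string and dispatch to it.
typedef int (*BrewFunction)();
typedef std::map<caffe::string, BrewFunction> BrewMap;
BrewMap g_brew_map;

#define RegisterBrewFunction(func)                         \
  namespace {                                              \
  class __Registerer_##func {                              \
   public:                                                 \
    __Registerer_##func() { g_brew_map[#func] = &func; }   \
  };                                                       \
  __Registerer_##func g_registerer_##func;                 \
  }

static BrewFunction GetBrewFunction(const caffe::string& name) {
  if (g_brew_map.count(name)) {
    return g_brew_map[name];
  }
  LOG(FATAL) << "Unknown action: " << name;
  return NULL;  // not reached
}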

Let's look at the train function first. It uses a solver_param object to hold the parsed solver parameters:

caffe::SolverParameter solver_param;
caffe::ReadSolverParamsFromTextFileOrDie(FLAGS_solver, &solver_param);

A solver object is then created via SolverRegistry::CreateSolver; the solver owns a shared_ptr<Net<Dtype> > net_ member variable:

shared_ptr<caffe::Solver<float> >
    solver(caffe::SolverRegistry<float>::CreateSolver(solver_param));

The Net object is the body of the whole network. So what does a Net actually contain? The three most important members are layers_, params_, and blobs_:

template <typename Dtype>
class Net {
 private:
  vector<shared_ptr<Layer<Dtype> > > layers_;
  vector<shared_ptr<Blob<Dtype> > > params_;
  vector<shared_ptr<Blob<Dtype> > > blobs_;
};

layers_ holds the building blocks of the network; params_ holds each layer's filter parameters and shares its data with the layers' own blobs_ (i.e. params_ stores pointers to the layers' blobs_); blobs_ holds the intermediate data flowing between layers.

The Net constructor takes a NetParameter and simply calls Init:

template <typename Dtype>
Net<Dtype>::Net(const NetParameter& param) {
  Init(param);
}

NetParameter is defined in caffe.proto as follows:

message NetParameter {
  optional string name = 1;
  repeated string input = 3;
  repeated BlobShape input_shape = 8;
  repeated int32 input_dim = 4;
  optional bool force_backward = 5 [default = false];
  optional NetState state = 6;
  repeated LayerParameter layer = 100;  // ID 100 so layers are printed last.
}

message LayerParameter {
  optional string name = 1;   // the layer name
  optional string type = 2;   // the layer type
  repeated string bottom = 3; // the name of each bottom blob
  repeated string top = 4;    // the name of each top blob
  // The blobs containing the numeric parameters of the layer.
  repeated BlobProto blobs = 7;
  optional TransformationParameter transform_param = 100;
}

The core of NetParameter is LayerParameter.

The core of LayerParameter (the definition above is simplified) is the bottom names, the top names, and the parameter blobs.

This NetParameter is populated via protobuf from train.prototxt and vgg.caffemodel, and is then used to construct the Net object; once the Net exists, the whole network has been built.

After that, solver->Solve() starts training the network. Solve() calls Step(), which runs the iterations: each pass through its loop is one iter, and each iter runs iter_size forward/backward passes (ForwardBackward()), averages the loss over this batch, and then lets the optimizer update the parameters.

The iter_size parameter works around the case where GPU memory is too small for a large batch size: the effective batch size becomes iter_size * batch_size, which gives the same result as training with the larger batch directly. For example, if the network trains well with batch_size = 128 but the GPU only has room for 32 images, you can set batch_size to 32 and iter_size to 4 and get the same effect as batch_size = 128.

while (iter_ < stop_iter) {
  // ...
  Dtype loss = 0;
  for (int i = 0; i < param_.iter_size(); ++i) {
    loss += net_->ForwardBackward();
  }
  loss /= param_.iter_size();
  // average the loss across iterations for smoothed reporting
  UpdateSmoothedLoss(loss, start_iter, average_loss);
  // ...
  ApplyUpdate();
  // ...
}

ForwardBackward() is implemented as follows; it runs Forward and Backward, recording the loss during the forward pass:

Dtype ForwardBackward() {
  Dtype loss;
  Forward(&loss);
  Backward();
  return loss;
}

Looking at Forward(&loss), it calls ForwardFromTo(0, layers_.size() - 1):

template <typename Dtype>
const vector<Blob<Dtype>*>& Net<Dtype>::Forward(Dtype* loss) {
  if (loss != NULL) {
    *loss = ForwardFromTo(0, layers_.size() - 1);
  } else {
    ForwardFromTo(0, layers_.size() - 1);
  }
  return net_output_blobs_;
}

ForwardFromTo(0, layers_.size() - 1) walks over every layer and calls its Forward() function. bottom_vecs_ and top_vecs_ have type vector<vector<Blob<Dtype>*> >, so each layer receives a vector<Blob<Dtype>*>; the vector is there because a layer may have multiple inputs or outputs:

template <typename Dtype>
Dtype Net<Dtype>::ForwardFromTo(int start, int end) {
  Dtype loss = 0;
  for (int i = start; i <= end; ++i) {
    Dtype layer_loss = layers_[i]->Forward(bottom_vecs_[i], top_vecs_[i]);
    loss += layer_loss;
  }
  return loss;
}

So the solver and net exist to serve the layers; the core functionality lives inside the layers. Let's start with the Forward implementation of the convolution layer (conv_layer.cpp).

LayerFactory: the factory pattern

To understand layers properly, let's first read the classes surrounding Layer. The base class of all layers is Layer. Because these classes are templates, the compiler only generates a specialization when it is referenced statically somewhere; but caffe instantiates its classes dynamically from the train.prototxt configuration, so without extra care the needed specializations would simply not exist and the class could not be found at run time. To avoid this, every class definition is followed by an explicit instantiation, done with a macro:

INSTANTIATE_CLASS(ConvolutionLayer);

The macro is defined as:

#define INSTANTIATE_CLASS(classname) \
  char gInstantiationGuard##classname; \
  template class classname<float>; \
  template class classname<double>

which simply instantiates ConvolutionLayer<float> and ConvolutionLayer<double>:

char gInstantiationGuardConvolutionLayer;
template class ConvolutionLayer<float>;
template class ConvolutionLayer<double>;

Beyond that, since there are many layers, caffe implements a factory (layer_factory.cpp) to manage them uniformly: every Layer is registered into a map whose key is the layer type name and whose value is the function that creates that layer, so a layer object can be instantiated from its type name at run time. Two macros are provided:

#define REGISTER_LAYER_CREATOR(type, creator)                                  \
  static LayerRegisterer<float> g_creator_f_##type(#type, creator<float>);     \
  static LayerRegisterer<double> g_creator_d_##type(#type, creator<double>)    \

#define REGISTER_LAYER_CLASS(type)                                             \
  template <typename Dtype>                                                    \
  shared_ptr<Layer<Dtype> > Creator_##type##Layer(const LayerParameter& param) \
  {                                                                            \
    return shared_ptr<Layer<Dtype> >(new type##Layer<Dtype>(param));           \
  }                                                                            \
  REGISTER_LAYER_CREATOR(type, Creator_##type##Layer)

Look at the first macro: it takes a type (e.g. Convolution) and a creation function. For example, layer_factory.cpp contains the following (simplified):

template <typename Dtype>
shared_ptr<Layer<Dtype> > GetConvolutionLayer(const LayerParameter& param) {
  // simplified ...
  return shared_ptr<Layer<Dtype> >(new ConvolutionLayer<Dtype>(param));
  // simplified ...
}
REGISTER_LAYER_CREATOR(Convolution, GetConvolutionLayer);

Expanding the macro therefore gives:

static LayerRegisterer<float>
    g_creator_f_Convolution("Convolution", GetConvolutionLayer<float>);
static LayerRegisterer<double>
    g_creator_d_Convolution("Convolution", GetConvolutionLayer<double>);

So let's see what the LayerRegisterer class does:

LayerRegistry<Dtype>::AddCreator(type, creator);

It calls the static function LayerRegistry<Dtype>::AddCreator; continuing:

template <typename Dtype>
class LayerRegistry {
 public:
  static CreatorRegistry& Registry() {
    static CreatorRegistry* g_registry_ = new map<string, Creator>();
    return *g_registry_;
  }

  static void AddCreator(const string& type, Creator creator) {
    CreatorRegistry& registry = Registry();
    CHECK_EQ(registry.count(type), 0)
        << "Layer type " << type << " already registered.";
    registry[type] = creator;
  }
};

As you can see, it maintains a singleton map g_registry_ that stores each type name together with its creation function.
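To see how the registry is consumed on the other side, here is a sketch of a lookup-and-create function in the same style (the real layer_factory.hpp provides a CreateLayer with essentially this logic, plus a friendlier error message listing the known types):

// Look up the creator registered under param.type() and call it.
static shared_ptr<Layer<Dtype> > CreateLayer(const LayerParameter& param) {
  const string& type = param.type();
  CreatorRegistry& registry = Registry();
  CHECK_EQ(registry.count(type), 1) << "Unknown layer type: " << type;
  return registry[type](param);
}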

For the second macro, a call like REGISTER_LAYER_CLASS(Convolution) expands to:

template <typename Dtype>
shared_ptr<Layer<Dtype> > Creator_ConvolutionLayer(const LayerParameter& param)
{
  return shared_ptr<Layer<Dtype> >(new ConvolutionLayer<Dtype>(param));
}
REGISTER_LAYER_CREATOR(Convolution, Creator_ConvolutionLayer)

In other words, a layer that needs no special construction can just use this default creator (Creator_ConvolutionLayer-style). Special cases such as Convolution need extra handling and therefore define their own creation function (GetConvolutionLayer); most layers, though, can simply be registered with the default one.

The Blob data container

Blob is the basic storage and manipulation object for data in caffe; it also provides CPU/GPU data synchronization.

A Blob stores its data as a flat array, in row-major order.
A Blob holds two pieces of data, data_ and diff_: the values and the gradients.

A blob is a four-dimensional array, with dimensions from highest to lowest (num_, channels_, height_, width_). For image data these are: number of images, number of color channels, height, and width. For example, 10 three-channel color images of size 512×256 (width × height) give a shape of (10, 3, 256, 512).
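Because the storage is row-major, the flat index of element (n, c, h, w) is ((n * channels_ + c) * height_ + h) * width_ + w; Blob exposes this as an offset() helper. A minimal sketch of the arithmetic (illustrative only):

// Row-major index of element (n, c, h, w) in a 4-D blob.
inline int blob_offset(int n, int c, int h, int w,
                       int channels, int height, int width) {
  return ((n * channels + c) * height + h) * width + w;
}

// Example: in a (10, 3, 256, 512) blob, element (n=2, c=1, h=5, w=7)
// lives at index ((2 * 3 + 1) * 256 + 5) * 512 + 7 of the flat array.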

template <typename Dtype>
class Blob {
 public:
  inline int num() const { return LegacyShape(0); }
  inline int channels() const { return LegacyShape(1); }
  inline int height() const { return LegacyShape(2); }
  inline int width() const { return LegacyShape(3); }
  inline const shared_ptr<SyncedMemory>& data() const { return data_; }
  inline const shared_ptr<SyncedMemory>& diff() const { return diff_; }

  // Update the data, i.e. subtract the gradient just computed.
  void Update() {
    caffe_axpy<Dtype>(count_, Dtype(-1),
        static_cast<const Dtype*>(diff_->cpu_data()),
        static_cast<Dtype*>(data_->mutable_cpu_data()));
  }
  // Deserialize: load a previously stored blob from disk.
  void FromProto(const BlobProto& proto, bool reshape = true);
  // Serialize the blob so it can be stored.
  void ToProto(BlobProto* proto, bool write_diff = false) const;

 protected:
  shared_ptr<SyncedMemory> data_;
  shared_ptr<SyncedMemory> diff_;
  shared_ptr<SyncedMemory> shape_data_;
  vector<int> shape_;
  int count_;
  int capacity_;
  DISABLE_COPY_AND_ASSIGN(Blob);
};  // class Blob

Back to Layer

Now look at the Forward method of the Layer base class. Note that it is not a virtual method, which means subclasses are not expected to override it; in other words, every Layer uses this same Forward. Let's walk through the concrete steps:

template <typename Dtype>
class Layer {
 public:
  explicit Layer(const LayerParameter& param)
    : layer_param_(param) {
      phase_ = param.phase();
      if (layer_param_.blobs_size() > 0) {
        blobs_.resize(layer_param_.blobs_size());
        for (int i = 0; i < layer_param_.blobs_size(); ++i) {
          blobs_[i].reset(new Blob<Dtype>());
          blobs_[i]->FromProto(layer_param_.blobs(i));
        }
      }
    }
  virtual ~Layer() {}

  void SetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
    CheckBlobCounts(bottom, top);
    LayerSetUp(bottom, top);
    Reshape(bottom, top);
    SetLossWeights(top);
  }

  /**
   * @brief Does layer-specific setup: your layer should implement this
   *        function as well as Reshape.
   *
   * @param bottom the preshaped input blobs, whose data fields store the
   *        input data for this layer
   * @param top the allocated but unshaped output blobs
   *
   * This method should do one-time layer specific setup. This includes reading
   * and processing relevant parameters from the layer_param_.
   * Setting up the shapes of top blobs and internal buffers should be done in
   * Reshape, which will be called before the forward pass to adjust the top
   * blob sizes.
   */
  virtual void LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {}

  /**
   * @brief Adjust the shapes of top blobs and internal buffers to accommodate
   *        the shapes of the bottom blobs.
   *
   * @param bottom the input blobs, with the requested input shapes
   * @param top the top blobs, which should be reshaped as needed
   *
   * This method should reshape top blobs as needed according to the shapes
   * of the bottom (input) blobs, as well as reshaping any internal buffers
   * and making any other necessary adjustments so that the layer can
   * accommodate the bottom blobs.
   */
  virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) = 0;

  /**
   * @brief Given the bottom blobs, compute the top blobs and the loss.
   *
   * @param bottom the input blobs, whose data fields store the input data
   *        for this layer
   * @param top the preshaped output blobs, whose data fields will store this
   *        layer's outputs
   * \return The total loss from the layer.
   *
   * The Forward wrapper calls the relevant device wrapper function
   * (Forward_cpu or Forward_gpu) to compute the top blob values given the
   * bottom blobs. If the layer has any non-zero loss_weights, the wrapper
   * then computes and returns the loss.
   *
   * Your layer should implement Forward_cpu and (optionally) Forward_gpu.
   */
  inline Dtype Forward(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);

  /**
   * @brief Given the top blob error gradients, compute the bottom blob error
   *        gradients.
   *
   * @param top the output blobs, whose diff fields store the gradient of the
   *        error with respect to themselves
   * @param propagate_down a vector with equal length to bottom, with each
   *        index indicating whether to propagate the error gradients down to
   *        the bottom blob at the corresponding index
   * @param bottom the input blobs, whose diff fields will store the gradient
   *        of the error with respect to themselves after Backward is run
   *
   * The Backward wrapper calls the relevant device wrapper function
   * (Backward_cpu or Backward_gpu) to compute the bottom blob diffs given the
   * top blob diffs.
   *
   * Your layer should implement Backward_cpu and (optionally) Backward_gpu.
   */
  inline void Backward(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down,
      const vector<Blob<Dtype>*>& bottom);

  vector<shared_ptr<Blob<Dtype> > >& blobs() { return blobs_; }
  const LayerParameter& layer_param() const { return layer_param_; }

 protected:
  /** The protobuf that stores the layer parameters (kernel size, stride, ...) */
  LayerParameter layer_param_;
  Phase phase_;
  /** The vector that stores the learnable parameters (filters) as a set of blobs. */
  vector<shared_ptr<Blob<Dtype> > > blobs_;
  vector<bool> param_propagate_down_;
  vector<Dtype> loss_;

  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) = 0;
  virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
    // LOG(WARNING) << "Using CPU code as backup.";
    return Forward_cpu(bottom, top);
  }
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down,
      const vector<Blob<Dtype>*>& bottom) = 0;
  virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down,
      const vector<Blob<Dtype>*>& bottom) {
    // LOG(WARNING) << "Using CPU code as backup.";
    Backward_cpu(top, propagate_down, bottom);
  }

 private:
  DISABLE_COPY_AND_ASSIGN(Layer);
};

template <typename Dtype>
inline Dtype Layer<Dtype>::Forward(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  Dtype loss = 0;
  Reshape(bottom, top);
  switch (Caffe::mode()) {
  case Caffe::CPU:
    Forward_cpu(bottom, top);
    for (int top_id = 0; top_id < top.size(); ++top_id) {
      if (!this->loss(top_id)) { continue; }
      const int count = top[top_id]->count();
      const Dtype* data = top[top_id]->cpu_data();
      const Dtype* loss_weights = top[top_id]->cpu_diff();
      loss += caffe_cpu_dot(count, data, loss_weights);
    }
    break;
  case Caffe::GPU:
    Forward_gpu(bottom, top);
#ifndef CPU_ONLY
    for (int top_id = 0; top_id < top.size(); ++top_id) {
      if (!this->loss(top_id)) { continue; }
      const int count = top[top_id]->count();
      const Dtype* data = top[top_id]->gpu_data();
      const Dtype* loss_weights = top[top_id]->gpu_diff();
      Dtype blob_loss = 0;
      caffe_gpu_dot(count, data, loss_weights, &blob_loss);
      loss += blob_loss;
    }
#endif
    break;
  default:
    LOG(FATAL) << "Unknown caffe mode.";
  }
  return loss;
}

The important functions in Layer are SetUp, LayerSetUp, Reshape, Forward, Backward, Forward_cpu, Forward_gpu, Backward_cpu, and Backward_gpu:

  1. Reshape, Forward_cpu, and Backward_cpu are pure virtual functions; subclasses must implement them (see the skeleton sketched after this list).
  2. LayerSetUp, Forward_gpu, and Backward_gpu are virtual functions; override them as needed.
  3. SetUp, Forward, and Backward are plain (non-virtual) functions and should not be overridden.
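Putting the three rules together, a new layer essentially implements Reshape, Forward_cpu, and Backward_cpu (plus the optional hooks) and registers itself. A minimal sketch with a hypothetical MyScaleLayer that simply doubles its input (illustrative only; a real layer would also provide a .cu file for the GPU path):

template <typename Dtype>
class MyScaleLayer : public Layer<Dtype> {
 public:
  explicit MyScaleLayer(const LayerParameter& param) : Layer<Dtype>(param) {}
  virtual inline const char* type() const { return "MyScale"; }

  // Optional one-time setup: read layer-specific parameters here.
  virtual void LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {}

  // Mandatory: shape the top blob like the bottom blob.
  virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
    top[0]->ReshapeLike(*bottom[0]);
  }

 protected:
  // Mandatory: y = 2 * x as a toy computation.
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
    const Dtype* x = bottom[0]->cpu_data();
    Dtype* y = top[0]->mutable_cpu_data();
    for (int i = 0; i < bottom[0]->count(); ++i) { y[i] = Dtype(2) * x[i]; }
  }

  // Mandatory: dL/dx = 2 * dL/dy.
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down,
      const vector<Blob<Dtype>*>& bottom) {
    if (!propagate_down[0]) { return; }
    const Dtype* dy = top[0]->cpu_diff();
    Dtype* dx = bottom[0]->mutable_cpu_diff();
    for (int i = 0; i < top[0]->count(); ++i) { dx[i] = Dtype(2) * dy[i]; }
  }
};

INSTANTIATE_CLASS(MyScaleLayer);
REGISTER_LAYER_CLASS(MyScale);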

Because there are many kinds of convolution, a BaseConvolutionLayer class sits in between as the base class of all convolution layers. It implements the functions below and turns Reshape from a pure virtual function into a virtual one:

LayerSetUp(const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top)
Reshape(const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top)
forward_cpu_gemm(const Dtype* input, const Dtype* weights, Dtype* output, bool skip_im2col)
forward_cpu_bias(Dtype* output, const Dtype* bias)
backward_cpu_gemm(const Dtype* output, const Dtype* weights, Dtype* input)
weight_cpu_gemm(const Dtype* input, const Dtype* output, Dtype* weights)
backward_cpu_bias(Dtype* bias, const Dtype* input)
forward_gpu_gemm(const Dtype* input, const Dtype* weights, Dtype* output, bool skip_im2col)
forward_gpu_bias(Dtype* output, const Dtype* bias)
backward_gpu_gemm(const Dtype* output, const Dtype* weights, Dtype* input)
weight_gpu_gemm(const Dtype* input, const Dtype* output, Dtype* weights)
backward_gpu_bias(Dtype* bias, const Dtype* input)

ConvolutionLayer inherits from BaseConvolutionLayer and implements:

Forward_cpu(const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top)
Backward_cpu(const vector<Blob<Dtype>*>& top, const vector<bool>& propagate_down,
             const vector<Blob<Dtype>*>& bottom)

In Layer::Forward the first call is Reshape, which here resolves to BaseConvolutionLayer::Reshape. caffe organizes data as Blobs, and once the input (bottom) size and the convolution parameters are known, the shape of the output (top) Blob can be computed:

// Shape the tops.
bottom_shape_ = &bottom[0]->shape();
compute_output_shape();
vector<int> top_shape(bottom[0]->shape().begin(),
    bottom[0]->shape().begin() + channel_axis_);
top_shape.push_back(num_output_);
for (int i = 0; i < num_spatial_axes_; ++i) {
  top_shape.push_back(output_shape_[i]);
}
for (int top_id = 0; top_id < top.size(); ++top_id) {
  top[top_id]->Reshape(top_shape);
}

It calls the Reshape member function on each output top[i]; Blob's Reshape is:

template <typename Dtype>
void Blob<Dtype>::Reshape(const vector<int>& shape) {
  CHECK_LE(shape.size(), kMaxBlobAxes);
  count_ = 1;
  shape_.resize(shape.size());
  if (!shape_data_ || shape_data_->size() < shape.size() * sizeof(int)) {
    shape_data_.reset(new SyncedMemory(shape.size() * sizeof(int)));
  }
  int* shape_data = static_cast<int*>(shape_data_->mutable_cpu_data());
  for (int i = 0; i < shape.size(); ++i) {
    CHECK_GE(shape[i], 0);
    if (count_ != 0) {
      CHECK_LE(shape[i], INT_MAX / count_) << "blob size exceeds INT_MAX";
    }
    count_ *= shape[i];
    shape_[i] = shape[i];       // copy into the internal shape vector
    shape_data[i] = shape[i];
  }
  if (count_ > capacity_) {     // existing memory is too small
    capacity_ = count_;
    data_.reset(new SyncedMemory(capacity_ * sizeof(Dtype)));  // reallocate
    diff_.reset(new SyncedMemory(capacity_ * sizeof(Dtype)));
  }
}

Essentially it copies the given shape into the Blob's internal shape_ and checks whether the existing memory is large enough; if not, new memory is allocated.
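A small usage sketch (hypothetical shapes) of the legacy four-argument Reshape overload, showing that only growing past the current capacity_ triggers a reallocation:

Blob<float> blob;
blob.Reshape(10, 3, 256, 512);  // count_ = 10*3*256*512, memory allocated
blob.Reshape(4, 3, 256, 512);   // smaller count_: shape changes, memory reused
blob.Reshape(32, 3, 256, 512);  // count_ > capacity_: data_/diff_ reallocated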

For the forward pass we follow the CPU path. After Reshape comes Forward_cpu, which here is ConvolutionLayer::Forward_cpu:

template <typename Dtype>
void ConvolutionLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  const Dtype* weight = this->blobs_[0]->cpu_data();
  for (int i = 0; i < bottom.size(); ++i) {
    const Dtype* bottom_data = bottom[i]->cpu_data();
    Dtype* top_data = top[i]->mutable_cpu_data();
    for (int n = 0; n < this->num_; ++n) {
      this->forward_cpu_gemm(bottom_data + n * this->bottom_dim_, weight,
          top_data + n * this->top_dim_);
      if (this->bias_term_) {
        const Dtype* bias = this->blobs_[1]->cpu_data();
        this->forward_cpu_bias(top_data + n * this->top_dim_, bias);
      }
    }
  }
}

For every bottom/top pair the code performs num_ (batch_size) matrix multiplications (forward_cpu_gemm), multiplying bottom_data by weight and writing the result into top_data; mutable_cpu_data indicates that the address will be written to. The matrix multiplication itself:

template <typename Dtype>
void BaseConvolutionLayer<Dtype>::forward_cpu_gemm(const Dtype* input,
    const Dtype* weights, Dtype* output, bool skip_im2col) {
  const Dtype* col_buff = input;
  if (!is_1x1_) {
    if (!skip_im2col) {
      conv_im2col_cpu(input, col_buffer_.mutable_cpu_data());
    }
    col_buff = col_buffer_.cpu_data();
  }
  for (int g = 0; g < group_; ++g) {
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans,
        conv_out_channels_ / group_, conv_out_spatial_dim_, kernel_dim_,
        (Dtype)1., weights + weight_offset_ * g, col_buff + col_offset_ * g,
        (Dtype)0., output + output_offset_ * g);
  }
}

Note the conv_im2col_cpu call. Without this transformation we would have to loop over many small matrix multiplications; im2col unrolls each k×k×C patch into a column and stacks these columns side by side, so the entire convolution can be computed with a single matrix multiplication. caffe_cpu_gemm is a wrapper around the cblas matrix multiplication output = weights * col_buff.

(Figure: the im2col transformation)
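To make the transformation concrete, here is a naive im2col sketch (assuming stride 1, no padding, a square kernel, and a single image; caffe's real im2col in util/im2col.cpp also handles stride, padding and dilation). Each k×k×C patch becomes one column, so convolution turns into a single GEMM:

#include <vector>

// Unroll a C x H x W image into a (C*k*k) x (out_h*out_w) matrix (row-major).
void im2col_naive(const float* data_im, int channels, int height, int width,
                  int kernel, std::vector<float>* data_col) {
  const int out_h = height - kernel + 1;
  const int out_w = width - kernel + 1;
  data_col->assign(channels * kernel * kernel * out_h * out_w, 0.f);
  for (int c = 0; c < channels; ++c) {
    for (int kh = 0; kh < kernel; ++kh) {
      for (int kw = 0; kw < kernel; ++kw) {
        const int row = (c * kernel + kh) * kernel + kw;   // matrix row
        for (int oh = 0; oh < out_h; ++oh) {
          for (int ow = 0; ow < out_w; ++ow) {
            const int col = oh * out_w + ow;               // matrix column
            (*data_col)[row * out_h * out_w + col] =
                data_im[(c * height + (oh + kh)) * width + (ow + kw)];
          }
        }
      }
    }
  }
}

Multiplying the (num_output) x (C*k*k) weight matrix by this column buffer then yields all output pixels of all filters at once.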

Back in Forward, after Forward_cpu the code walks over the top blobs; for every top blob that contributes to the loss it accumulates the loss, using cpu_diff() as the loss weights:

template <typename Dtype>
inline Dtype Layer<Dtype>::Forward(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  Dtype loss = 0;
  Reshape(bottom, top);
  Forward_cpu(bottom, top);
  for (int top_id = 0; top_id < top.size(); ++top_id) {
    if (!this->loss(top_id)) { continue; }
    const int count = top[top_id]->count();
    const Dtype* data = top[top_id]->cpu_data();
    const Dtype* loss_weights = top[top_id]->cpu_diff();
    loss += caffe_cpu_dot(count, data, loss_weights);
  }
  return loss;
}

That finishes Forward. Next comes the Backward pass; Layer's Backward function is:

template <typename Dtype>
inline void Layer<Dtype>::Backward(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  switch (Caffe::mode()) {
  case Caffe::CPU:
    Backward_cpu(top, propagate_down, bottom);
    break;
  case Caffe::GPU:
    Backward_gpu(top, propagate_down, bottom);
    break;
  default:
    LOG(FATAL) << "Unknown caffe mode.";
  }
}

It simply dispatches to Backward_cpu; ConvolutionLayer's Backward_cpu looks like this:

template <typename Dtype>
void ConvolutionLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  const Dtype* weight = this->blobs_[0]->cpu_data();
  Dtype* weight_diff = this->blobs_[0]->mutable_cpu_diff();
  for (int i = 0; i < top.size(); ++i) {
    const Dtype* top_diff = top[i]->cpu_diff();
    const Dtype* bottom_data = bottom[i]->cpu_data();
    Dtype* bottom_diff = bottom[i]->mutable_cpu_diff();
    // Bias gradient, if necessary.
    if (this->bias_term_ && this->param_propagate_down_[1]) {
      Dtype* bias_diff = this->blobs_[1]->mutable_cpu_diff();
      for (int n = 0; n < this->num_; ++n) {
        this->backward_cpu_bias(bias_diff, top_diff + n * this->top_dim_);
      }
    }
    if (this->param_propagate_down_[0] || propagate_down[i]) {
      for (int n = 0; n < this->num_; ++n) {
        // gradient w.r.t. weight. Note that we will accumulate diffs.
        if (this->param_propagate_down_[0]) {
          this->weight_cpu_gemm(bottom_data + n * this->bottom_dim_,
              top_diff + n * this->top_dim_, weight_diff);
        }
        // gradient w.r.t. bottom data, if necessary.
        if (propagate_down[i]) {
          this->backward_cpu_gemm(top_diff + n * this->top_dim_, weight,
              bottom_diff + n * this->bottom_dim_);
        }
      }
    }
  }
}

Using top_diff, it updates the current layer's weight_diff (weight_cpu_gemm) and bottom_diff (backward_cpu_gemm); bottom_diff is propagated downward so that the layers below can in turn compute their own weight gradients.

That concludes Backward: it computes the gradients of each layer's weight parameters (weight_diff) and of each layer's blobs (bottom_diff).

Returning to the solver's Solve()/Step() loop, the next call is ApplyUpdate(), which is where the parameters are actually updated; the solver's ApplyUpdate() ends up calling Net::Update():

template <typename Dtype>
void Net<Dtype>::Update() {
  for (int i = 0; i < learnable_params_.size(); ++i) {
    learnable_params_[i]->Update();
  }
}

learnable_params_ is simply each layer's trainable parameters, i.e. each layer's weight Blobs. We have already filled in their diff values, so let's see what Blob::Update() does:

template <typename Dtype>
void Blob<Dtype>::Update() {
  // We will perform update based on where the data is located.
  switch (data_->head()) {
  case SyncedMemory::HEAD_AT_CPU:
    // perform computation on CPU
    caffe_axpy<Dtype>(count_, Dtype(-1),
        static_cast<const Dtype*>(diff_->cpu_data()),
        static_cast<Dtype*>(data_->mutable_cpu_data()));
    break;
  // ...
  }
}

It computes data_ = data_ - diff_. caffe_axpy wraps the corresponding cblas routine, which adds a scaled vector to another vector; since the coefficient passed in is Dtype(-1), the result is the subtraction that updates data_. At this point every layer's weight parameters have been updated and one iteration is finished. The whole process then repeats until training yields a good set of weights.
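For reference, a minimal sketch of what the wrapped axpy routine computes (y = a*x + y; with a = -1 this becomes exactly the subtraction described above):

// Naive axpy: y[i] += a * x[i]. With a = -1, data_[i] -= diff_[i].
template <typename Dtype>
void axpy_naive(int n, Dtype a, const Dtype* x, Dtype* y) {
  for (int i = 0; i < n; ++i) {
    y[i] += a * x[i];
  }
}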

The test phase

During testing no solver is needed; you can run Forward directly on a Net to get the results:

Net<float> caffe_net(FLAGS_model, caffe::TEST, FLAGS_level, &stages);
const vector<Blob<float>*>& result = caffe_net.Forward(&iter_loss);

pycaffe

There is a _caffe.cpp file from which the whole caffe framework is compiled into _caffe.so, while pycaffe.py acts as a wrapper exposing the Python interface. pycaffe imports the objects from _caffe.so and uses them as Python objects:

from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, \
        RMSPropSolver, AdaDeltaSolver, AdamSolver, NCCL, Timer

These can be imported and used directly because _caffe.cpp exports them with BOOST_PYTHON_MODULE:

BOOST_PYTHON_MODULE(_caffe) {...}

Here is how a class is exported:

#include <string>
#include <boost/python.hpp>

using namespace std;
using namespace boost::python;

struct World {
  void set(string msg) { this->msg = msg; }
  string greet() { return msg; }
  string msg;
};

BOOST_PYTHON_MODULE(hello)  // name of the exported module
{
  class_<World>("World")
      .def("greet", &World::greet)
      .def("set", &World::set);
}

And here is how the exported class is used from Python:

import hello

planet = hello.World()  # default constructor creates the object
planet.set("howdy")     # call a method on the object
print planet.greet()    # call a method on the object

If you don't want to export any constructor, use no_init:

class_<Abstract>("Abstract", no_init)

Finally, the caffe directory provides an __init__.py that turns the whole directory into a Python package:

from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, \
        RMSPropSolver, AdaDeltaSolver, AdamSolver, NCCL, Timer
from ._caffe import init_log, log, set_mode_cpu, set_mode_gpu, set_device, \
        Layer, get_solver, layer_type_list, set_random_seed, solver_count, \
        set_solver_count, solver_rank, set_solver_rank, set_multiprocess, \
        has_nccl
from ._caffe import __version__
from .proto.caffe_pb2 import TRAIN, TEST
from .classifier import Classifier
from .detector import Detector
from . import io
from .net_spec import layers, params, NetSpec, to_proto

With that, outside code can use caffe's Python interface as caffe.Net, caffe.init_log, caffe.__version__, caffe.TRAIN, caffe.Classifier, caffe.Detector, caffe.io, and so on.

Reference

Reposted from: https://www.cnblogs.com/gr-nick/p/9379334.html
