79. Hard negative and positive mining
Update the CNN using pairs that are hard to match
Find hard negative and hard positive pairs via a forward pass
Backpropagate again on the hard negative and hard positive pairs
Simo-Serra, et al., "Discriminative Learning of Deep Convolutional Feature Point Descriptors", ICCV, 2015. Figures partially reproduced from the poster.
- Consistent improvements over the state of the art.
- Trained on one dataset, but generalizes very well to scaling, rotation, deformation and illumination changes.
- Computational efficiency (on GPU: 0.76 ms; dense SIFT: 0.14 ms).
Code is available: https://github.com/etrulls/deepdesc-release
Key observation
1. We train a Siamese architecture with pairs of patches. We want to bring matching pairs together and otherwise pull them apart.
2. Problem? Randomly sampled pairs are already easy to separate.
3. Solution: To train discriminative networks we use hard negative and positive mining. This proves essential for performance.
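As a concrete illustration of this Siamese setup, here is a minimal PyTorch sketch: one small CNN with shared weights embeds both patches of a pair, and their L2 distance is what the loss described further down operates on. The 64 × 64 grayscale input, the three convolution layers, and the strides 2/3/4 follow fragments of the poster text; the filter counts, descriptor dimension, and all identifiers are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DescriptorCNN(nn.Module):
    """Small patch-descriptor CNN; filter counts and descriptor size are illustrative."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=7, stride=2, padding=3), nn.Tanh(),  # 64x64 -> 32x32
            nn.Conv2d(32, 64, kernel_size=6, stride=3), nn.Tanh(),            # 32x32 -> 9x9
            nn.Conv2d(64, dim, kernel_size=5, stride=4), nn.Tanh(),           # 9x9 -> 2x2
            nn.AdaptiveAvgPool2d(1),                                          # 2x2 -> 1x1
        )

    def forward(self, x):                         # x: (B, 1, 64, 64) grayscale patches
        return self.net(x).flatten(1)             # (B, dim) descriptor D(x)

# Siamese usage: the SAME network (shared weights) embeds both patches of a pair.
descriptor = DescriptorCNN()
x1, x2 = torch.randn(8, 1, 64, 64), torch.randn(8, 1, 64, 64)
distances = torch.norm(descriptor(x1) - descriptor(x2), dim=1)   # per-pair L2 distance
```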
Figure: patch pairs before and after mining. We take samples from [1] for illustration; corresponding patches are shown with the same color.
(a) 12 points / 132 patches, visualized with t-SNE [8]: distance encodes similarity.
(b) All pairs (random sampling): similar (close) positives and different (distant) negatives.
(c) "Hard" pairs: we mine the samples to obtain dissimilar positives (+, long blue segments) and similar negatives (×, short red segments).
(d) Random pairs: random sampling results in easy pairs.
(e) Mined pairs: mined pairs with harder correspondences.
This allows us to train discriminative models with a small number of parameters (~45k), which also alleviates overfitting concerns.
[Architecture table, partially extracted: convolution strides 2, 3, 4.]
Train on the MVS Dataset [1]: 64 × 64 grayscale patches from the Statue of Liberty (LY, top), Notre Dame (ND, center) and Yosemite (YO, bottom) sets, with ~150k points and ~450k patches each, giving on the order of 10^12 candidate negative pairs; hence the need for efficient exploration via mining.
We minimize the hinge embedding loss. With the 3D point correspondences defining which patches match, the loss for a patch pair (x1, x2) is

l(x1, x2) = ||D(x1) − D(x2)||_2 if the patches correspond,
l(x1, x2) = max(0, C − ||D(x1) − D(x2)||_2) otherwise.

This penalizes corresponding pairs that are placed far apart and non-corresponding pairs that are less than C units apart.
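A minimal sketch of this loss in PyTorch, assuming batched descriptors; the margin value C and all names are assumptions, only the two-branch formula above comes from the poster:

```python
import torch

def hinge_embedding_loss(d1, d2, is_positive, C=4.0):
    """d1, d2: (B, dim) descriptors; is_positive: (B,) bool; C: margin (value assumed)."""
    dist = torch.norm(d1 - d2, dim=1)                  # ||D(x1) - D(x2)||_2 per pair
    pos_loss = dist                                    # pull corresponding pairs together
    neg_loss = torch.clamp(C - dist, min=0.0)          # push non-corresponding pairs beyond C
    return torch.where(is_positive, pos_loss, neg_loss).mean()
```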
Methodology: train over two sets and test over the third, with cross-validation. Metric: precision-recall (PR) curves in a 'needle in a haystack' setting: pick 10k unique points and generate one positive pair and 1k negative pairs for each, i.e. 10k positives and 10M negatives. Results are summarized by the area under the curve (PR AUC).
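A hedged sketch of how such a PR-AUC evaluation could be computed, assuming pairwise descriptor distances are already available; the scikit-learn calls and the toy numbers are my own choices, not the paper's evaluation code:

```python
import numpy as np
from sklearn.metrics import auc, precision_recall_curve

def pr_auc(distances, labels):
    """distances: descriptor distances for candidate pairs; labels: 1 = positive, 0 = negative."""
    scores = -np.asarray(distances)                       # smaller distance => higher score
    precision, recall, _ = precision_recall_curve(labels, scores)
    return auc(recall, precision)

# Toy 'needle in a haystack' example: many more negatives than positives.
rng = np.random.default_rng(0)
pos = rng.normal(0.5, 0.2, size=1_000)                    # toy distances for positive pairs
neg = rng.normal(2.0, 0.5, size=1_000 * 1_000)            # 1k toy negatives per point
dists = np.concatenate([pos, neg])
labels = np.concatenate([np.ones_like(pos), np.zeros_like(neg)])
print(pr_auc(dists, labels))
```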
Effect of mining
(a) Forward-propagate sp ≥ 128 positive and sn ≥ 128 negative pairs.
(b) Pick the 128 with the largest loss (for each) and backpropagate them.
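A minimal sketch of this mining step, reusing the hypothetical descriptor network and loss from the earlier sketches; batch handling and names are assumptions, while the "keep the 128 highest-loss pairs of each kind" rule is taken from the text above:

```python
import torch

def mine_hard_pairs(descriptor, x1, x2, is_positive, keep=128, C=4.0):
    """Forward-propagate candidate pairs, keep the `keep` hardest positives and negatives."""
    with torch.no_grad():                                  # mining pass: no gradients needed
        dist = torch.norm(descriptor(x1) - descriptor(x2), dim=1)
        loss = torch.where(is_positive, dist, torch.clamp(C - dist, min=0.0))
    hard = []
    for mask in (is_positive, ~is_positive):               # positives and negatives separately
        idx = torch.nonzero(mask, as_tuple=True)[0]
        topk = idx[loss[idx].topk(min(keep, idx.numel())).indices]
        hard.append(topk)
    return torch.cat(hard)                                 # indices of hard pairs to backpropagate

# Usage: select hard pairs, then recompute the loss with gradients and backpropagate.
# idx = mine_hard_pairs(descriptor, x1, x2, is_pos)
# loss = hinge_embedding_loss(descriptor(x1[idx]), descriptor(x2[idx]), is_pos[idx])
# loss.backward()
```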
[Figure: precision-recall curves (recall on the x-axis, precision on the y-axis), one panel per cross-validation split, e.g. "PR curve, training LY+YO, test ND" and "PR curve, training LY+ND, test YO". Each panel compares SIFT against CNN3 with mining 1/2, 2/2, 4/4 and 8/8.]
Table 1: (a) No mining. Larger batches do not help.
sp      sn      PR AUC
128     128     0.366
256     256     0.374
512     512     0.369
1024    1024    0.325

Table 2: (b) Mining with rp = sp/128 and rn = sn/128. The mining cost is incurred during the forward pass.
sp      sn      rp     rn     Cost
128     256     1      2      20%
256     256     2      2      35%
512     512     4      4      48%
1024    1024    8      8      67%
Distance space of the descriptors of each patch image
Patches of the same color = corresponding points
Red: negative pairs
Blue: positive pairs
hard positive
hard negative
Update the network using the hard negatives and hard positives
92. Related papers at ICCV 2015
Local Convolutional Features with Unsupervised Training for Image Retrieval
Feature description using a two-layer Convolutional Kernel Network
Improves performance by preprocessing the input patch images with whitening, gradient extraction, etc.
Aggregating Deep Convolutional Features for Image Retrieval
Represents the feature as a linear sum of the 2D feature maps output by a convolutional layer (see the sketch after this list)
Applies PCA to the features to make them compact and to prevent overfitting
RIDE: Reversal Invariant Descriptor Enhancement
Proposes SIFT features that are invariant to image reversal (flipping)
Improves performance for Bag-of-Features-based problems
CNN-based methods
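As referenced above, a rough sketch of that aggregation idea: sum-pool each 2D feature map into one value per channel (an unweighted version of the linear sum described in the bullet), then compress with PCA. The array shapes, the unweighted sum, and the use of scikit-learn's PCA are assumptions for illustration, not details from the cited paper.

```python
import numpy as np
from sklearn.decomposition import PCA

def aggregate_conv_features(feature_maps):
    """feature_maps: (C, H, W) conv activations -> (C,) descriptor by summing each 2D map."""
    return feature_maps.sum(axis=(1, 2))

# Toy example: descriptors for a small image collection, then PCA compression.
rng = np.random.default_rng(0)
descriptors = np.stack([aggregate_conv_features(rng.random((512, 37, 50))) for _ in range(100)])
compact = PCA(n_components=32).fit_transform(descriptors)   # (100, 32) compact descriptors
print(compact.shape)
```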
101. References
— [SIFT]
David G. Lowe, "Distinctive image features from scale-invariant keypoints," IJCV2004.
— [SURF]
Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool, "Speeded-up robust features (SURF)," CVIU2008.
— [PCA-SIFT]
Yan Ke and Rahul Sukthankar, "PCA-SIFT: A more distinctive representation for local image descriptors," CVPR2004.
— [GLOH]
Krystian Mikolajczyk and Cordelia Schmid, "A performance evaluation of local descriptors," TPAMI2005.
— [RIFF]
Gabriel Takacs, Vijay Chandrasekhar, Sam Tsai, David Chen, Radek Grzeszczuk, and Bernd Girod, "Unified real-time tracking and recognition with rotation-invariant fast features," CVPR2010.
— [ASIFT]
Jean-Michel Morel and Guoshen Yu, "ASIFT: A new framework for fully affine invariant image comparison," SIAM Journal on Imaging Sciences, Vol. 2, No. 2, pp. 438-469, April 2009.
Some of the figures in this material are reproduced from the papers listed on this slide.
102. References
— [BRIEF]
M. Calonder, V. Lepetit, and P. Fua, "BRIEF: Binary Robust Independent Elementary Features," ECCV2010.
— [BRISK]
S. Leutenegger, M. Chli, and R. Siegwart, "BRISK: Binary Robust Invariant Scalable Keypoints," ICCV2011.
— [ORB]
E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, "ORB: An efficient alternative to SIFT or SURF," ICCV2011.
— [FREAK]
Alexandre Alahi, Raphael Ortiz, and Pierre Vandergheynst, "FREAK: Fast retina keypoint," CVPR2012.
— [D-BRIEF]
Tomasz Trzcinski and Vincent Lepetit, "Efficient discriminative projections for compact binary descriptors," ECCV2012.
— [BinBoost]
T. Trzcinski, M. Christoudias, V. Lepetit, and P. Fua, "Boosting binary keypoint descriptors," CVPR2013.
— [BOLD]
V. Balntas, L. Tang, and K. Mikolajczyk, "BOLD - Binary online learned descriptor for efficient image matching," CVPR2015.
Some of the figures in this material are reproduced from the papers listed on this slide.
103. References
— [CARD]
M. Ambai and Y. Yoshida, "Compact And Real-time Descriptors," ICCV2011.
— [LDA Hash]
C. Strecha, A. M. Bronstein, M. M. Bronstein, and P. Fua, "LDAHash: Improved matching with smaller descriptors," TPAMI2012.
— [CNN Descriptor]
E. Simo-Serra, E. Trulls, L. Ferraz, I. Kokkinos, P. Fua, and F. Moreno-Noguer, "Discriminative learning of deep convolutional feature point descriptors," ICCV2015.
— [CNN Similarity]
S. Zagoruyko and N. Komodakis, "Learning to compare image patches via convolutional neural networks," CVPR2015.
— [Spectral Affine SIFT]
Takahiro Hasegawa, Mitsuru Ambai, Kohta Ishikawa, Gou Koutaki, Yuji Yamauchi, Takayoshi Yamashita, and Hironobu Fujiyoshi, "Multiple-Hypothesis Affine Region Estimation With Anisotropic LoG Filters," ICCV2015.
Some of the figures in this material are reproduced from the papers listed on this slide.