INDEX
Explanations
words related to security, investigations, and substances like ricin
Neuron Alignment
Index   Value   % of L₁
1081    +0.09   0.2%
678     +0.08   0.2%
623     +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
2043    +0.09      0.04
586     +0.08      0.04
1081    +0.07      0.04
Negative Logits
cassert     -0.81
stdarg      -0.65
levier      -0.60
Și          -0.58
kutumia     -0.58
Dziękuję    -0.57
iomanip     -0.57
delà        -0.53
pymysql     -0.52
Thine       -0.51
Positive Logits
list        0.94
cammin      0.93
suon        0.92
centrif     0.85
tramont     0.84
sappi       0.82
spont       0.82
soggior     0.80
ideolog     0.80
notor       0.80
Activations Density: 0.358%