INDEX
Explanations
percentages, ages, and statistics
Neuron Alignment
Index   Value   % of L₁
184     +0.31   1.0%
674     +0.15   0.5%
1343    +0.13   0.4%
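The dashboard does not define "% of L₁". The sketch below assumes each row is one component of the feature's weight vector, ranked by signed value, and that the percentage is that component's absolute value divided by the vector's L₁ norm (the sum of absolute values over all components). The function name `neuron_alignment` and the toy weights are hypothetical.

```python
# Hypothetical sketch: rank the components of a weight vector by signed value
# and report each one's share of the vector's L1 norm.
# ASSUMPTION: "% of L1" means |w_i| / sum_j |w_j|; the dashboard does not say.

def neuron_alignment(weights, k=3):
    """Return the top-k (index, value, percent_of_l1) tuples, largest value first."""
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        raise ValueError("weight vector has zero L1 norm")
    # Sort indices by signed value, descending (an assumption about the ranking).
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return [(i, weights[i], 100.0 * abs(weights[i]) / l1) for i in ranked[:k]]

# Toy weights (made up, not the dashboard's data); their L1 norm is 1.2.
toy = [0.1, -0.4, 0.5, 0.0, 0.2]
for idx, val, pct in neuron_alignment(toy):
    print(f"{idx:>5}  {val:+.2f}  {pct:.1f}%")
```

Under the same assumption, the table's first row (+0.31 at 1.0%) would imply an L₁ norm of roughly 30 across all neurons, which is consistent with the other two rows (0.15 / 0.005 = 30, 0.13 / 0.004 ≈ 32.5).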
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
184     +0.31           0.02
16      +0.15           0.03
453     +0.13           0.03
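The page does not say which quantities are compared in this table. A minimal sketch, assuming the Pearson correlation is taken between the feature's activations and a neuron's activations over the same tokens, and the cosine similarity between two weight directions. The helper names and toy data are hypothetical.

```python
import math

# Hypothetical helpers; which vectors the dashboard actually compares is an assumption.

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (e.g. activations on the same tokens)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (e.g. weight directions)."""
    if len(u) != len(v):
        raise ValueError("vectors must have equal length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Toy data (not from the dashboard): a feature active on 2 of 6 tokens.
feature_acts = [0.0, 0.0, 1.5, 0.0, 0.8, 0.0]
neuron_acts = [0.1, -0.2, 0.9, 0.0, 0.4, 0.1]
print(f"Pearson: {pearson(feature_acts, neuron_acts):+.2f}")
print(f"Cosine:  {cosine_similarity([1.0, 0.0, 2.0], [0.0, 1.0, 0.5]):.2f}")
```

If the assumption holds, the two columns measure different things: correlation reflects co-activation over data, cosine reflects geometric alignment, so a neuron can correlate at +0.31 while its direction is nearly orthogonal (0.02).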
Negative Logits
reluct        -0.88
resear        -0.78
philanth      -0.78
apprehen      -0.77
practition    -0.77
intrigu       -0.75
disagre       -0.74
enthusi       -0.73
contribut     -0.73
sophistic     -0.71
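Positive Logits
<bos>         0.73
Dziękuję      0.72   (Polish: "thank you")
CiNii         0.72
BIBSYS        0.69
Vielleicht    0.66   (German: "perhaps")
película      0.66   (Spanish: "film")
Baillargeon   0.64
Polecam       0.63   (Polish: "I recommend")
Από           0.63   (Greek: "from")
Πηγή          0.62   (Greek: "source")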
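The page does not explain how the logit lists are produced. One common convention, assumed here, is to project the feature's output direction through the model's unembedding matrix and list the tokens with the most negative and most positive scores. The function `top_bottom_logits`, the tiny matrix, and the vocabulary are hypothetical.

```python
# Hypothetical sketch: score every vocabulary token by projecting a direction
# through an unembedding matrix, then take the k lowest and k highest scores.
# ASSUMPTION: this is how the dashboard's logit lists are built.

def top_bottom_logits(direction, unembed, vocab, k=3):
    """direction: d floats; unembed: d rows of V floats; vocab: V token strings."""
    if len(unembed) != len(direction) or any(len(row) != len(vocab) for row in unembed):
        raise ValueError("shape mismatch between direction, unembed, and vocab")
    # logits[c] = sum_r direction[r] * unembed[r][c]  (a vector-matrix product)
    logits = [
        sum(direction[r] * unembed[r][c] for r in range(len(direction)))
        for c in range(len(vocab))
    ]
    order = sorted(range(len(vocab)), key=lambda c: logits[c])  # ascending
    negative = [(vocab[c], logits[c]) for c in order[:k]]
    positive = [(vocab[c], logits[c]) for c in reversed(order[-k:])]
    return negative, positive

# Toy example (made up): a 2-dimensional direction and a 4-token vocabulary.
direction = [1.0, -1.0]
unembed = [
    [0.5, 0.1, -0.3, 0.0],
    [0.2, 0.4, 0.1, -0.2],
]
vocab = ["a", "b", "c", "d"]
neg, pos = top_bottom_logits(direction, unembed, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```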
Activation Density: 0.155%
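Activation density is read here as the percentage of sampled tokens on which the feature is active. The sketch assumes "active" means an activation strictly above zero; the page states neither the threshold nor the sample size. The name `activation_density` and the toy activations are hypothetical.

```python
# Hypothetical sketch: percentage of tokens whose activation exceeds a threshold.
# ASSUMPTION: the dashboard counts a token as active when its activation is > 0.

def activation_density(activations, threshold=0.0):
    """Percentage of entries in `activations` strictly greater than `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Toy activations (made up): 2 of 10 tokens are active.
toy_acts = [0.0, 0.0, 1.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3, 0.0]
print(f"Activation density: {activation_density(toy_acts):.3f}%")
```

Read this way, 0.155% means the feature fires on about 1 token in 645 (1 / 0.00155 ≈ 645), so it is a sparse feature.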