Explanations
proper nouns, particularly names of locations or people
Neuron Alignment
Index    Value    % of L₁
1967     +0.11    0.3%
845      +0.10    0.3%
998      +0.08    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
16       +0.11       0.10
1967     +0.10       0.05
845      +0.08       0.04
Negative Logits
reluct      -0.80
depic       -0.80
contex      -0.75
embodi      -0.75
TheGreat    -0.75
resear      -0.71
pamph       -0.70
!".         -0.70
!”.         -0.69
emphat      -0.69
Positive Logits
>=",        0.79
película    0.66
Αν          0.65
<=",        0.63
municipi    0.62
Ótimo       0.61
Abraços     0.61
Από         0.60
Δείτε       0.59
<",         0.58
Activations Density 1.434%