INDEX
Explanations
phrases related to people's actions and interactions
Neuron Alignment
Index   Value   % of L₁
198     +0.13   0.4%
2006    +0.12   0.3%
1978    +0.10   0.3%
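
The page does not define the "% of L₁" column. One plausible reading, assumed here, is each neuron's absolute weight expressed as a share of the L₁ norm (the sum of absolute values) of the feature's direction vector. A minimal sketch under that assumption, using a hypothetical toy vector rather than the feature's actual weights:

```python
def l1_share(weights, index):
    """Return |weights[index]| as a percentage of the vector's L1 norm.

    Assumption: "% of L1" means a component's share of the L1 norm.
    The page itself does not confirm this definition.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(weights[index]) / l1_norm


# Hypothetical toy direction vector (not taken from the page).
toy_direction = [0.5, -0.25, 0.25]
print(l1_share(toy_direction, 0))  # 50.0
```

Under this reading, a value of +0.13 at 0.4% would imply a total L₁ norm of roughly 30 to 40, which is consistent with the other two rows after rounding.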
Correlated Neurons
Index   P. Corr.   Cos Sim.
658     +0.13      0.06
862     +0.12      0.03
1919    +0.10      0.05
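
The two columns are presumably Pearson correlation ("P. Corr.") and cosine similarity ("Cos Sim."). The page does not say which vectors each is computed over, so the sketch below only illustrates the two measures on generic lists of numbers. The function names are hypothetical. The key difference is that Pearson correlation is cosine similarity applied after subtracting each vector's mean, which is why the two columns can disagree.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centered_x = [a - mean_x for a in x]
    centered_y = [b - mean_y for b in y]
    return cosine_similarity(centered_x, centered_y)


# Hypothetical toy data: a constant offset changes cosine similarity
# but leaves Pearson correlation untouched.
a = [1.0, 2.0, 3.0]
b = [11.0, 12.0, 13.0]
print(cosine_similarity(a, b))
print(pearson_correlation(a, b))
```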
Negative Logits
territo       -0.78
kasa          -0.75
kase          -0.70
kamb          -0.69
vettoriale    -0.67
kuku          -0.66
mimi          -0.66
koz           -0.65
naer          -0.65
lele          -0.65
Positive Logits
people        0.57
kteří         0.53
"}")          0.53
Viitteet      0.51
bParam        0.51
people        0.51
HideFlags     0.49
िल्म          0.48
Paglinawan    0.48
<bos>         0.48
Activations Density: 0.356%