Explanations
phrases related to historical events or processes
Neuron Alignment
Index   Value   % of L₁
370     +0.11   0.3%
1778    +0.09   0.3%
1145    +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
370     +0.11      0.02
474     +0.09      0.02
1604    +0.09      0.02
Negative Logits
utop      -0.68
kram      -0.64
solidar   -0.63
bombe     -0.62
ideolog   -0.61
vola      -0.60
aquare    -0.59
Bibl      -0.58
lomb      -0.58
perle     -0.57
Positive Logits
followed   1.06
Followed   0.98
followed   0.97
Followed   0.85
seguido    0.77
folgt      0.69
gevol      0.68
diikuti    0.66
follow     0.62
FOLLOW     0.62
Activations Density: 0.104%