INDEX
Explanations
phrases related to making an impactful statement or action within a formal setting
Neuron Alignment
Index   Value   % of L₁
2019    +0.20   0.6%
872     +0.14   0.4%
1150    +0.12   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
2019    +0.20      0.11
924     +0.14      0.11
1445    +0.12      0.10
Negative Logits
Wię          -0.70
Postup       -0.66
Może         -0.66
Výhody       -0.65
Zapraszamy   -0.65
Conheça      -0.65
Dlaczego     -0.65
asteroide    -0.64
Gdzie        -0.64
Kiedy        -0.63
Positive Logits
<bos>    0.87
uxx      0.77
fte      0.73
dispen   0.73
oun      0.72
fta      0.69
Hn       0.69
»>       0.68
fto      0.67
fhe      0.67
Activations Density 0.894%