Explanations
questions and interrogative phrases
Neuron Alignment
Index    Value    % of L₁
50       +0.33    1.2%
1491     +0.12    0.4%
597      +0.11    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
597      +0.33       0.03
481      +0.12       0.03
2025     +0.11       0.03
Negative Logits
Token         Logit
ⓧ             -0.94
<bos>         -0.91
Lmfao         -0.75
<?            -0.73
Hahahahaha    -0.71
Hahah         -0.69
/**           -0.69
Lma           -0.69
Noice         -0.62
              -0.60
Positive Logits
Token       Logit
lemp        0.94
Valentín    0.93
quoc        0.91
ananas      0.88
paradiso    0.88
barbacoa    0.88
cristo      0.86
thuy        0.85
nuoc        0.85
paloma      0.85
Activations Density: 0.107%