INDEX
Explanations
phrases emphasizing a particular point or aspect, especially with conditions or scenarios attached
Neuron Alignment
Index   Value   % of L₁
605     +0.19   0.6%
1425    +0.10   0.3%
168     +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
605     +0.19      0.02
1425    +0.10      0.03
893     +0.09      0.03
Negative Logits
Opere         -0.60
]}            -0.59
esattamente   -0.57
Voci          -0.56
Ressource     -0.56
:+:           -0.55
Paglinawan    -0.55
física        -0.53
              -0.52
sullo         -0.52
Positive Logits
apprehen     1.30
gaily        1.26
unspeak      1.25
shenan       1.23
intersper    1.17
rascal       1.15
gratify      1.13
quitted      1.12
vainly       1.11
pymysql      1.10
Activations Density 0.094%