Explanations
phrases emphasizing the word "Plus" indicating additional benefits or features
Negative Logits
Token           Logit
nt              -0.22
zelf            -0.20
/is             -0.19
chod            -0.16
åħ¶             -0.15
/place          -0.15
castle          -0.15
tube            -0.14
.UnitTesting    -0.14
å¯Ł             -0.14
Positive Logits
Token                             Logit
ieurs                             0.35
-minus                            0.32
minus                             0.29
ça                                0.28
++++++++++++++++++++++++++++++++  0.23
quam                              0.22
Minus                             0.22
++++++++++++++++                  0.22
++++                              0.21
++++++++                          0.20
Activations Density: 0.021%
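
Two entries in the negative-logits list, åħ¶ and å¯Ł, look like encoding debris but are consistent with how byte-level BPE vocabularies print raw UTF-8 bytes. The sketch below assumes a GPT-2-style byte-to-character table (an assumption; the page does not name the tokenizer) and shows how such token strings could be mapped back to readable text. The function names are illustrative and are not taken from the dashboard.

```python
def bytes_to_unicode():
    """Build the GPT-2-style byte -> printable-character table (assumed tokenizer)."""
    # Bytes that already have a printable form map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: printable character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a displayed token string back to text, assuming byte-level BPE."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # A token may hold only part of a multi-byte character, so do not fail on it.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["åħ¶", "å¯Ł", "castle"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption the two tokens decode to single CJK characters rather than noise, and plain ASCII tokens such as castle pass through unchanged. If the underlying model uses a different tokenizer, the mapping does not apply.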