Explanations
phrases indicating transitions or changes in states or conditions
Negative Logits
329        -0.17
istogram   -0.15
onu        -0.14
zzo        -0.14
652        -0.13
onden      -0.13
方面       -0.13
achu       -0.13
ws         -0.13
affen      -0.13
Positive Logits
being      0.28
merely     0.23
being      0.22
mere       0.22
Being      0.20
被         0.20
strength   0.19
mere       0.19
zero       0.19
strength   0.19
Activations Density 0.098%