INDEX
Explanations
affirmations and enthusiastic punctuation in the text
Negative Logits
Token       Logit
atica       -0.16
Goldberg    -0.16
ulia        -0.15
memcmp      -0.15
unpredict   -0.14
ophy        -0.14
Ñıн         -0.14
/Common     -0.13
asia        -0.13
Cloth       -0.13
Positive Logits
Token       Logit
Place       0.28
Place       0.26
place       0.26
Heat        0.23
PLACE       0.23
Directions  0.22
Directions  0.22
Diss        0.21
Heat        0.21
mix         0.21
Activations Density: 0.086%