Explanations
phrases indicating actions or states of being
Negative Logits
utin        -0.14
416         -0.14
emale       -0.14
ARG         -0.14
grav        -0.13
instrument  -0.13
vie         -0.13
geo         -0.13
tridge      -0.13
çŁ³         -0.13
Positive Logits
antity      0.14
íĥĪ         0.14
Geh         0.14
é¨          0.14
GLOBALS     0.13
imum        0.13
antly       0.13
res         0.13
Ħ           0.13
ös          0.13
Activations Density 1.055%