Explanations
words associated with upward and downward movement or changes in position
Negative Logits
Token       Logit
fav         -0.17
essional    -0.16
Randall     -0.15
laus        -0.14
feb         -0.14
stí         -0.14
aceous      -0.13
ongoose     -0.13
fü          -0.13
leston      -0.13
Positive Logits
Token       Logit
-desc        0.19
ion          0.17
endez        0.17
orp          0.17
ents         0.16
antly        0.16
mal          0.16
ior          0.15
ally         0.15
ence         0.15
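The page does not say how the logit lists above were computed. Dashboards of this kind commonly project the feature's decoder direction through the model's unembedding matrix and report the tokens with the most negative and most positive results. The sketch below assumes that method. The function name `top_logit_effects`, its arguments, and the toy vectors are hypothetical and for illustration only.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=3):
    """Project a feature's decoder direction onto every vocabulary token.

    Assumed inputs (hypothetical, for illustration):
      decoder_dir: list of d_model floats, the feature's decoder vector
      unembed:     d_model rows, each a list of vocab_size floats (an unembedding matrix)
      vocab:       list of token strings, one per unembedding column
    Returns (most_negative, most_positive) as lists of (token, logit) pairs.
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per decoder dimension")
    vocab_size = len(vocab)
    # logit[t] = sum over j of decoder_dir[j] * unembed[j][t]
    logits = [
        sum(decoder_dir[j] * unembed[j][t] for j in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]
    # Sort token indices from most negative to most positive logit.
    order = sorted(range(vocab_size), key=lambda t: logits[t])
    most_negative = [(vocab[t], logits[t]) for t in order[:k]]
    most_positive = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    return most_negative, most_positive


neg, pos = top_logit_effects(
    [1.0, -0.5],
    [[0.2, -0.1, 0.0, 0.3], [0.1, 0.4, -0.2, 0.0]],
    ["up", "down", "left", "right"],
    k=2,
)
print(neg)
print(pos)
```

In a real model the vocabulary has tens of thousands of entries, so a vectorized matrix product would replace the nested loops. The plain-Python version keeps the arithmetic explicit.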
Activation Density: 0.042%
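Activation density is usually read as the share of token positions on which the feature fires. A density of 0.042% therefore means roughly 42 firings per 100,000 tokens. The sketch below assumes "fires" means an activation strictly above zero, as with a ReLU-style sparse autoencoder. The function name and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold.

    Assumption: a feature "fires" when its activation is strictly greater than
    threshold (0.0 by default, matching a ReLU-style encoder).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# 42 firing positions out of 100,000 tokens.
acts = [0.0] * 99958 + [1.3] * 42
print(f"{activation_density(acts):.3%}")  # prints 0.042%
```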