INDEX
Explanations
articles and determiners indicating quantities and nouns
Negative Logits
umba     -0.15
éĶĻ      -0.14
lder     -0.14
osci     -0.14
forfe    -0.13
andon    -0.13
icone    -0.13
×Ļ       -0.13
Sab      -0.13
riz      -0.13
Positive Logits
ERSHEY        0.21
EDA           0.15
proced        0.15
redicate      0.14
Äħż           0.14
assa          0.14
etr           0.14
eti           0.14
ÐIJÑĢÑħÑĸв    0.14
ilha          0.14
Activations Density 0.193%
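
As a rough illustration of what a density figure like the one above summarizes, the sketch below assumes "activations density" means the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the toy data are assumptions for illustration, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumed definition: density = (# active positions) / (# positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 2 of 1000 token positions.
toy_activations = [0.0] * 998 + [1.7, 0.4]
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.200%
```

Under this assumed definition, a density of 0.193% would correspond to the feature firing on roughly 2 of every 1000 tokens sampled.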