INDEX
Explanations
phrases indicating comparison or evaluation of various situations or conditions
Negative Logits
inge      -0.14
elix      -0.14
acz       -0.14
bilder    -0.13
oples     -0.13
образ     -0.13
lein      -0.13
iesel     -0.13
iri       -0.13
au        -0.13
Positive Logits
argar     0.18
oose      0.16
arger     0.16
edik      0.16
mps       0.16
ække      0.16
ikel      0.15
ех        0.14
ИТ        0.14
odyn      0.14
Activations Density 0.019%