INDEX
Explanations
expressions of feelings and inquiries about personal relationships and well-being
Negative Logits
Token   Logit
रण      -0.14
5       -0.13
apol    -0.13
Mil     -0.13
awe     -0.13
203     -0.13
ون      -0.13
ागत     -0.13
立て    -0.13
toler   -0.12
Positive Logits
Token   Logit
LAG     0.15
orld    0.15
ンチ    0.15
eydi    0.14
pulse   0.14
idot    0.14
vůbec   0.14
andest  0.14
queeze  0.14
-any    0.14
Activations Density 0.120%