Explanations
expressions of approval or agreement
Negative Logits
ůl      -0.17
osate   -0.16
ascar   -0.16
anca    -0.15
eldon   -0.15
Äįek    -0.15
ADOR    -0.15
į¼      -0.14
ISCO    -0.14
çĦ¶     -0.14
Positive Logits
tober     0.30
lahoma    0.29
ahoma     0.20
AY        0.19
lah       0.18
amoto     0.17
ategor    0.17
Springer  0.17
Ok        0.16
anagan    0.16
Activations Density: 0.035%