Explanations
expressions of admiration or commendation
Negative Logits
Token      Logit
yll        -0.16
Ì£         -0.15
.Rad       -0.15
ay         -0.14
opsis      -0.14
oy         -0.14
Äijá»iji   -0.14
usted      -0.14
roy        -0.14
nez        -0.13

Positive Logits
Token      Logit
ably       0.25
able       0.20
atory      0.19
ovol       0.18
eworthy    0.17
-worthy    0.17
worthy     0.16
SSION      0.15
fully      0.15
ful        0.15
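
The two logit lists above are presumably produced the way most SAE feature dashboards produce them: the feature's decoder direction is projected through the model's unembedding matrix, and the tokens with the largest and smallest resulting logits are kept. The page does not state this, so treat the sketch below as an assumption. The function name `logit_tables`, the toy weights, and the toy vocabulary are all hypothetical and only illustrate the top-k / bottom-k selection.

```python
from typing import List, Sequence, Tuple


def logit_tables(
    decoder_dir: Sequence[float],
    W_U: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: project a feature's decoder direction through an
    unembedding matrix of shape (d_model, n_vocab) and return the k most
    positive and the k most negative (token, logit) pairs."""
    d_model, n_vocab = len(decoder_dir), len(vocab)
    # logits[j] = sum_i decoder_dir[i] * W_U[i][j]
    logits = [
        sum(decoder_dir[i] * W_U[i][j] for i in range(d_model))
        for j in range(n_vocab)
    ]
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return positive, negative


# Toy example with made-up weights (d_model = 2, four-token vocabulary).
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
W_U = [
    [0.30, -0.20, 0.10, -0.05],
    [0.00, 0.00, 0.00, 0.00],
]
decoder_dir = [1.0, 0.5]
positive, negative = logit_tables(decoder_dir, W_U, vocab, k=2)
print(positive)
print(negative)
```

A real dashboard would do the same projection with matrix operations over the full vocabulary; the list comprehension here just keeps the example dependency-free.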

Activation density: 0.042%
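
Activation density is usually reported as the percentage of sampled tokens on which the feature fires. A density of 0.042% corresponds to about 42 firing tokens per 100,000, so this feature is very sparse. The page does not give the sample set or the firing threshold, so the sketch below assumes "fires" means an activation strictly greater than zero. The function name `activation_density` and the toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of tokens whose activation exceeds
    `threshold` (assumed to be 0.0, i.e. any positive activation)."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy data: 4 firing tokens out of 10,000 sampled tokens.
acts = [0.0] * 10_000
for idx in (12, 345, 6789, 9999):
    acts[idx] = 1.7
print(f"{activation_density(acts):.3f}%")  # 0.040%
```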