INDEX
Explanations
instances of the word "aw" that indicate surprise or admiration
Negative Logits
رش       -0.16
Ged      -0.15
agal     -0.15
aab      -0.15
Heb      -0.14
iets     -0.14
ognito   -0.14
arget    -0.14
uzzer    -0.14
ometown  -0.13
Positive Logits
oft      0.17
las      0.15
Henri    0.14
æ©       0.14
renc     0.14
Laur     0.14
ist      0.13
ampa     0.13
ekil     0.13
=__      0.13
Activations Density: 0.012%