INDEX
Explanations
references to individuals, especially pronouns and possessive forms
Negative Logits
イル         -0.17
ascus        -0.15
awan         -0.14
ient         -0.14
.utilities   -0.14
-navbar      -0.14
wan          -0.14
ICO          -0.14
usk          -0.13
stal         -0.13
Positive Logits
uby          0.15
tdown        0.15
igy          0.14
晴           0.14
ури          0.13
ucid         0.13
را           0.13
ICODE        0.13
erton        0.13
elor         0.13
Activations Density 0.043%