Explanations
subjects and pronouns associated with personal actions or feelings
Negative Logits
ummer    -0.16
лÑĸд     -0.16
apr      -0.16
oly      -0.16
oven     -0.15
ITOR     -0.15
itor     -0.14
ế        -0.14
idon     -0.14
ov       -0.14
Positive Logits
oard      0.15
uveden    0.15
421       0.15
mia       0.14
getY      0.14
ener      0.14
mask      0.14
imli      0.13
tÆ°á»Ľng  0.13
ulario    0.13
Activations Density 0.235%