Explanations
pronouns or nouns indicating people
pronouns related to collective or individual experiences and identities
Negative Logits
————         -0.69
Anat         -0.67
Correction   -0.66
Canaver      -0.65
Interested   -0.62
Bundes       -0.61
RAY          -0.59
Britt        -0.59
Unlimited    -0.59
2004         -0.58
Positive Logits
©¶æ          0.82
encount      0.80
ngth         0.80
cius         0.76
frequ        0.75
ÃĥÃĤ         0.74
attained     0.72
encountered  0.70
alian        0.70
consume      0.70
Activations Density 0.325%