Explanations
references to personal experiences and professional roles
Negative Logits
Token       Logit
oure        -0.17
BuzzFeed    -0.16
ophobia     -0.15
خانه        -0.15
quina       -0.15
perse       -0.15
iei         -0.14
ja          -0.14
anon        -0.14
ゲ          -0.14
Positive Logits
Token       Logit
personal    0.23
personal    0.19
Personal    0.17
Personal    0.17
personally  0.16
ugar        0.16
àn          0.15
/self       0.14
kişisel     0.14
โช          0.14
Activations Density: 0.062%