INDEX
Explanations
references to individuals involved in events or situations
Negative Logits
odate    -0.16
TL       -0.15
alin     -0.15
wi       -0.14
ores     -0.14
adh      -0.14
unders   -0.14
ody      -0.14
é»       -0.14
ثاÙĦ     -0.14
Positive Logits
anonym        0.22
Anonymous     0.22
anonymously   0.22
Identified    0.21
anonymous     0.21
Anonymous     0.20
identified    0.20
åĮ            0.20
identified    0.20
anonymous     0.18
Activations Density: 0.056%