INDEX
Explanations
references to community membership and belonging
Negative Logits
pétrole      -0.44
<bos>        -0.43
işi          -0.39
ketat        -0.39
Politique    -0.38
overkill     -0.38
primera      -0.38
huelga       -0.38
suicidio     -0.38
besser       -0.37
Positive Logits
member       1.93
members      1.73
Member       1.71
Member       1.69
member       1.69
Members      1.63
MEMBER       1.62
Members      1.61
members      1.51
MEMBER       1.50
Activations Density: 0.098%