INDEX
Explanations
references to agency and collective involvement in activities
Negative Logits
Slate       -0.17
ighbor      -0.15
bloody      -0.15
assic       -0.14
È           -0.14
eldo        -0.14
eld         -0.14
neighbour   -0.14
note        -0.14
gaben       -0.13
Positive Logits
fuck         0.15
ARSER        0.15
κÏĮ          0.15
ElementsBy   0.14
лÑĸв         0.14
fucked       0.13
orney        0.13
kee          0.13
fuck         0.13
untime       0.13
Activations Density 0.000%