INDEX
Explanations
references to the social media platform Facebook
Negative Logits
ocard     -0.19
ovation   -0.15
stav      -0.15
Shields   -0.14
@stop     -0.14
wall      -0.14
ield      -0.14
ochond    -0.14
urm       -0.13
pr        -0.13
Positive Logits
uce       0.16
ÏĤ        0.15
igne      0.14
igin      0.14
ÅĻeh      0.14
culate    0.14
erson     0.14
abama     0.14
atables   0.14
oles      0.14
Activations Density: 0.010%