INDEX
Explanations
references to social media interactions and users
Negative Logits
openh      -0.19
Ã          -0.18
ohon       -0.15
urbed      -0.14
abbo       -0.14
roundup    -0.14
alah       -0.14
vide       -0.14
olith      -0.14
afe        -0.14
Positive Logits
89            0.25
79            0.24
87            0.24
73            0.23
82            0.23
23            0.23
01            0.23
istrovství    0.23
69            0.23
85            0.22
Activations Density 0.092%