Explanations
references to social media platforms, particularly Facebook
mentions of Facebook
Negative Logits
practicing  -0.66
rero  -0.65
1001  -0.62
Hof  -0.62
brim  -0.61
stood  -0.60
crank  -0.60
VERT  -0.60
neg  -0.59
ilk  -0.59
Positive Logits
(token not shown)  0.93
(token not shown)  0.88
ileaks  0.86
Messenger  0.82
emouth  0.81
(token not shown)  0.80
imil  0.76
ank  0.76
culosis  0.72
Features  0.72
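Lists of top positive and negative logits like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the resulting score. This page does not say how its scores were computed or whether they were normalized, so the following is only a minimal sketch under that assumption. The names `decoder_row`, `unembed`, and `logit_weights`, and the toy values, are hypothetical.

```python
def logit_weights(decoder_row, unembed, vocab, k=3):
    """Rank vocabulary tokens by a feature's direct effect on the logits.

    Assumption: scores are decoder_row @ unembed, with
      decoder_row: list of d_model floats (one feature's decoder direction)
      unembed:     d_model x vocab_size nested list (unembedding matrix)
      vocab:       list of vocab_size token strings
    Returns (top_positive, top_negative), each a list of (token, score).
    """
    d_model = len(decoder_row)
    # Dot product of the feature direction with each vocabulary column.
    scores = [
        sum(decoder_row[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending: most negative scores first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]
    top_positive = ranked[::-1][:k]
    return top_positive, top_negative


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 hypothetical tokens.
    vocab = ["a", "b", "c", "d"]
    decoder_row = [1.0, 0.5]
    unembed = [
        [0.2, -0.4, 0.9, 0.0],
        [0.4, 0.2, -0.2, -1.0],
    ]
    pos, neg = logit_weights(decoder_row, unembed, vocab, k=2)
    print([t for t, _ in pos])  # ['c', 'a']
    print([t for t, _ in neg])  # ['d', 'b']
```

Real dashboards may additionally scale or normalize these scores, which is one reason the values shown on this page should not be read as raw logit changes without checking the tool's documentation.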
Activation density: 0.010%
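Activation density is usually reported as the fraction of tokens in a sample on which the feature fires, so 0.010% corresponds to roughly 1 active token in every 10,000. The sketch below assumes "fires" means an activation strictly above a threshold of 0; the function name and threshold are hypothetical, and the tool behind this page may define density differently.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # One active token out of 10,000 sampled tokens.
    acts = [0.0] * 9999 + [2.3]
    print(f"{100 * activation_density(acts):.3f}%")  # 0.010%
```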