INDEX
Explanations
expressions of strong opinions or emotions regarding social issues
Negative Logits
Cent            -0.69
ageing          -0.66
chnology        -0.63
hiber           -0.62
regeneration    -0.61
endeavour       -0.61
Franç           -0.61
Quart           -0.61
zed             -0.60
apon            -0.60
Positive Logits
Fans            0.91
Looks           0.89
BRE             0.87
Actor           0.84
Rail            0.82
Speaking        0.82
Sorry           0.81
Liter           0.81
Yep             0.81
Fuck            0.81
Activations Density 0.070%