INDEX
Explanations
Twitter usernames
mentions of specific individuals or names
Negative Logits
Token          Logit
ACTIONS        -0.73
tantal         -0.67
pace           -0.65
ãĢIJ           -0.65
BILITIES       -0.64
seminal        -0.62
joint          -0.62
bruising       -0.61
overfl         -0.61
technicians    -0.61
Positive Logits
Token          Logit
)              0.99
)!             0.96
/)             0.95
)."            0.90
),             0.90
)...           0.88
veyard         0.87
Jew            0.86
%)             0.85
,)             0.84
Activations Density: 0.080%
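
The page does not say how the density figure is computed. A common reading, assumed here, is the fraction of token positions on which the feature's activation is above zero. The sketch below illustrates that reading only. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all; the source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the feature fires on 8 of 10,000 token positions.
sample = [0.0] * 9992 + [1.3] * 8
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.080%
```

Under this reading, a density of 0.080% would mean the feature is active on roughly 8 of every 10,000 tokens.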