Explanations
mentions of vehicles, specifically vans
Negative Logits
Seym          -0.93
pta           -0.92
ometimes      -0.79
reluct        -0.72
DonaldTrump   -0.71
nces          -0.64
ģĸ            -0.64
ð             -0.64
otten         -0.63
ãģĦ           -0.63
Positive Logits
load          0.97
neys          0.96
loads         0.87
illa          0.86
adium         0.85
agons         0.83
parked        0.81
ney           0.80
rol           0.80
ning          0.79
Activation Density: 0.005%