Explanations
verbs related to driving and motivation
Negative Logits
Seym          -0.96
ereo          -0.76
ertain        -0.72
yip           -0.72
ellen         -0.69
umbn          -0.69
Lum           -0.69
anamo         -0.68
ileaks        -0.68
iannopoulos   -0.66
Positive Logits
train          0.92
driving        0.90
driving        0.87
wheel          0.82
dealership     0.74
driven         0.74
Driving        0.73
club           0.71
wedge          0.70
driver         0.70
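Logit tables like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below assumes that standard setup; the names `w_dec_row`, `w_unembed` and `top_logit_effects` are hypothetical, the data is random toy data, and the exact scaling or normalization used for the values shown here is not stated on this page.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Return the k most negative and k most positive direct logit effects.

    w_dec_row: (d_model,) decoder direction of one feature (assumed layout).
    w_unembed: (d_model, d_vocab) unembedding matrix (assumed layout).
    vocab: list of d_vocab token strings.
    """
    # Direct effect of the feature direction on every output-token logit.
    logits = w_dec_row @ w_unembed          # shape (d_vocab,)
    order = np.argsort(logits)              # indices sorted ascending
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy usage with random weights (not the real model).
rng = np.random.default_rng(0)
d_model, d_vocab = 16, 50
vocab = [f"tok{i}" for i in range(d_vocab)]
w_dec_row = rng.standard_normal(d_model)
w_unembed = rng.standard_normal((d_model, d_vocab))
neg, pos = top_logit_effects(w_dec_row, w_unembed, vocab, k=10)
for tok, val in pos[:3]:
    print(f"{tok:<12}{val:.2f}")
```

Because the projection ignores everything downstream of the residual stream except the unembedding, these values describe the feature's direct push on next-token logits, not its full causal effect.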
Activation Density: 0.550%
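Activation density is usually read as the percentage of sampled tokens on which the feature is active. The sketch below assumes "active" means an activation strictly above a threshold of zero; the dataset and threshold behind the 0.550% figure are not given here, and `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(acts, dtype=float)
    return 100.0 * float(np.mean(acts > threshold))

# Toy example: the feature fires on 5 of 1000 tokens.
acts = np.zeros(1000)
acts[:5] = 1.3
print(f"{activation_density(acts):.3f}%")  # 0.500%
```

On this definition, a density of 0.550% would mean the feature fires on roughly 55 of every 10,000 tokens.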