Explanations
actions related to driving or traveling by car
actions related to driving
Negative Logits
Flavoring     -0.75
Seym          -0.71
Lum           -0.68
roma          -0.65
Ranked        -0.61
enei          -0.60
lett          -0.59
Published     -0.59
ANN           -0.58
acknowled     -0.57
Positive Logits
wheel         1.03
away          0.96
bike          0.89
train         0.87
driving       0.87
whe           0.83
motorcycles   0.81
driving       0.80
BMW           0.78
toward        0.77
Activations Density 0.050%