INDEX
Explanations
mentions of the word "trains"
mentions of trains
Negative Logits
wn         -0.74
uid        -0.74
hed        -0.70
cape       -0.66
Palm       -0.65
cus        -0.61
wiped      -0.60
hetical    -0.60
Herb       -0.59
kin        -0.59
Positive Logits
trains      3.98
train       2.52
Train       2.10
Train       1.98
buses       1.96
railways    1.94
train       1.91
Amtrak      1.61
cars        1.55
bikes       1.55
Activations Density 0.013%