INDEX
Explanations
references to trains or railroad-related terms
Negative Logits
assy      -0.17
ritis     -0.17
ting      -0.16
itzer     -0.16
ing       -0.16
adder     -0.14
uppy      -0.14
ADE       -0.14
instein   -0.14
plier     -0.14
Positive Logits
ees       0.28
ee        0.21
loads     0.19
/bus      0.19
ings      0.19
station   0.18
load      0.17
derail    0.17
buff      0.17
bart      0.17
Activations Density: 0.011%