INDEX
    Explanations

    mentions of the word "trains"

    Negative Logits
    Token        Logit
    "wn"         -0.74
    "uid"        -0.74
    "hed"        -0.70
    "cape"       -0.66
    " Palm"      -0.65
    "cus"        -0.61
    " wiped"     -0.60
    "hetical"    -0.60
    " Herb"      -0.59
    "kin"        -0.59
    Positive Logits
    Token        Logit
    " trains"    3.98
    " train"     2.52
    "Train"      2.10
    " Train"     1.98
    " buses"     1.96
    " railways"  1.94
    "train"      1.91
    " Amtrak"    1.61
    "cars"       1.55
    " bikes"     1.55
    Activation Density: 0.013%

    No Known Activations