INDEX
    Explanations

    verbs related to driving and motivation

    Negative Logits
    " Seym"          -0.96
    "ereo"           -0.76
    "ertain"         -0.72
    "yip"            -0.72
    "ellen"          -0.69
    "umbn"           -0.69
    " Lum"           -0.69
    "anamo"          -0.68
    "ileaks"         -0.68
    "iannopoulos"    -0.66
    Positive Logits
    "train"          0.92
    "driving"        0.90
    " driving"       0.87
    "wheel"          0.82
    " dealership"    0.74
    " driven"        0.74
    " Driving"       0.73
    "club"           0.71
    " wedge"         0.70
    "driver"         0.70
    Activation Density: 0.550%

    No Known Activations