INDEX
    Explanations

    verbs related to actions of force or compulsion

    words related to driving or motivation

    Negative Logits
    " Seym"          -0.96
    "ereo"           -0.74
    "çĦ"             -0.74
    "roma"           -0.70
    "umbn"           -0.68
    "aido"           -0.67
    "yip"            -0.64
    "iao"            -0.64
    "iannopoulos"    -0.64
    " Lumpur"        -0.63

    Positive Logits
    " driving"       0.86
    " away"          0.84
    " wedge"         0.78
    "bike"           0.74
    "driving"        0.74
    "wheel"          0.74
    "train"          0.72
    "away"           0.71
    "ousel"          0.69
    " driven"        0.68

    Activation Density: 0.036%

    No Known Activations