    Explanations

    verbs and phrases related to guiding or directing actions

    NEGATIVE LOGITS
    Token           Logit
    " Corpus"       -0.76
    "ropolitan"     -0.75
    "ylon"          -0.74
    "enegger"       -0.71
    "upon"          -0.68
    "覚醒"          -0.67
    "ITNESS"        -0.66
    "ocalyptic"     -0.65
    " Leban"        -0.64
    " Ming"         -0.64
    POSITIVE LOGITS
    Token           Logit
    " toward"       1.02
    " towards"      0.97
    " steer"        0.95
    " steered"      0.95
    "wheel"         0.94
    " clear"        0.92
    " away"         0.87
    " downwards"    0.83
    " wheel"        0.78
    " steering"     0.78
    Activation Density: 0.007%

    No Known Activations