INDEX
    Explanations

    repeated references to the concept of "back"

    Directional words like "up," "down," or "back."

    Negative Logits
     Monfieur    -1.19
     itſelf      -1.07
     Jefus       -1.07
     myſelf      -1.04
     pleaſure    -1.04
     houſe       -1.02
     Efq         -1.01
     ſta         -1.01
     Diſ         -1.00
     himſelf     -1.00
    Positive Logits
     around      0.67
     in          0.65
     up          0.63
     down        0.61
     toward      0.59
     at          0.58
     out         0.58
     towards     0.56
     along       0.56
     on          0.52
    Activation Density: 0.123%

    No Known Activations