INDEX
    Negative Logits
    " itſelf"        -0.99
    " themſelves"    -0.97
    " pleaſure"      -0.91
    " himſelf"       -0.86
    " Efq"           -0.85
    "neſs"           -0.84
    " obſ"           -0.84
    " myſelf"        -0.83
    " whoſe"         -0.83
    " ſeveral"       -0.83
    Positive Logits
    " down"          0.53
    " &___"          0.52
    " front"         0.51
    "EndContext"     0.49
    "gra"            0.48
    "labelledby"     0.47
    " Down"          0.47
    "down"           0.46
    "dat"            0.44
    "loader"         0.43
    Activation Density: 0.060%

    No Known Activations