    Negative Logits
        Token        Logit
        " Sen"       -0.52
        " sh"        -0.50
        " bal"       -0.49
        " sen"       -0.49
        "Bal"        -0.48
        " es"        -0.47
        "Arma"       -0.46
        " Bal"       -0.46
        " sa"        -0.46
        "fjspx"      -0.45
    Positive Logits
        Token        Logit
        "ſelf"        1.00
        " myſelf"     0.91
        "ſelves"      0.87
        "ing"         0.86
        " uſed"       0.86
        " itſelf"     0.84
        " ſta"        0.82
        " uſe"        0.81
        " preſent"    0.78
        " pleaſure"   0.78
    Activation density: 0.093%

    No Known Activations