    Negative Logits
    Token         Logit
    " Majefty"    -0.92
    "ſelves"      -0.91
    " myſelf"     -0.90
    " Efq"        -0.88
    " Jefus"      -0.87
    " itſelf"     -0.83
    " uſed"       -0.79
    "ſelf"        -0.79
    " Roskov"     -0.79
    "InitVars"    -0.78
    Positive Logits
    Token         Logit
    " you"        0.71
    " like"       0.57
    " it"         0.57
    " I"          0.56
    "Datuak"      0.56
    "obod"        0.51
    " hlad"       0.49
    "<td>"        0.48
    " no"         0.47
    " their"      0.47
    Activation Density: 0.067%

    No Known Activations