    Negative Logits
    "RUnlock"        -0.65
    " houſe"         -0.64
    " himſelf"       -0.62
    " itſelf"        -0.62
    " themſelves"    -0.61
    " ſtate"         -0.61
    " myſelf"        -0.61
    " preſent"       -0.58
    " Eſ"            -0.57
    "anskje"         -0.57
    Positive Logits
    "ity"      1.02
    "ally"     1.02
    "ary"      0.94
    "ly"       0.86
    "ality"    0.82
    "als"      0.81
    "ening"    0.79
    "ial"      0.78
    "ing"      0.77
    "al"       0.75
    Activation Density: 0.062%

    No Known Activations