INDEX
    Explanations
    Negative Logits
     Catholics    -0.06
     не           -0.06
     rinse        -0.06
    _N            -0.05
    inish         -0.05
                  -0.05
    ercise        -0.05
    .dequeue      -0.05
    ่ได           -0.05
    120           -0.05
    Positive Logits
    فو                                0.07
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  0.07
    presentation                      0.06
    castle                            0.06
    алог                              0.06
     вій                              0.06
    geo                               0.06
     ancestor                         0.06
    юдж                               0.06
     мік                              0.06
    Activation Density: 0.002%

    No Known Activations