    Negative Logits
    Token            Logit
    " cortex"        -0.07
    (not shown)      -0.07
    " harassed"      -0.06
    (not shown)      -0.06
    "ivation"        -0.06
    "anded"          -0.06
    " brig"          -0.06
    "ctp"            -0.06
    " ct"            -0.06
    "ár"             -0.06
    Positive Logits
    Token            Logit
    " Psalm"         0.20
    " Ps"            0.10
    "alm"            0.07
    "ocht"           0.06
    (not shown)      0.06
    "ΜΑΤ"            0.06
    ".paused"        0.06
    "Penn"           0.06
    "ोर"             0.06
    "/read"          0.06
    Act Density 0.003%

    No Known Activations