INDEX
    Explanations
    Negative Logits
     Thankfully   -0.08
    -step         -0.07
     Sim          -0.07
     beer         -0.07
     ölüm         -0.06
    .ham          -0.06
     dairy        -0.06
     Prot         -0.06
    _on           -0.06
     semi         -0.06
    Positive Logits
    πισ           0.06
    !↵            0.06
    _WORD         0.06
    conti         0.06
     payable      0.06
     resolves     0.06
    čas           0.06
    laz           0.06
     resolving    0.06
     penny        0.06
    Act Density 0.002%

    No Known Activations