    NEGATIVE LOGITS
    Token           Logit
    " magazine"     -0.07
    ".date"         -0.07
    (whitespace)    -0.07
    " з"            -0.07
    " prés"         -0.06
    "act"           -0.06
    "۷"             -0.06
    "_average"      -0.06
    " Healthy"      -0.06
    " Church"       -0.06
    POSITIVE LOGITS
    Token           Logit
    " too"          0.11
    " TOO"          0.08
    " Too"          0.08
    "too"           0.07
    "Too"           0.07
    " úprav"        0.07
    "なん"          0.07
    "tau"           0.06
    " Nico"         0.06
    ".O"            0.06
    Activation Density: 0.014%

    No Known Activations