    Negative Logits
    Token               Logit
    "modify"            -0.08
    "ват"               -0.06
    " captivity"        -0.06
    "actually"          -0.06
    "wolf"              -0.06
    " sarcast"          -0.06
    " Sesso"            -0.06
    " bazen"            -0.06
                        -0.06
    " configurations"   -0.06
    Positive Logits
    Token               Logit
    " able"             0.08
    "le"                0.07
    " Veterinary"       0.06
    " aft"              0.06
    "orable"            0.06
    "Lbl"               0.06
    " Simple"           0.06
    "_LSB"              0.06
    " méd"              0.06
    "dl"                0.06
    Activation Density: 0.006%

    No Known Activations