INDEX
    Explanations
    Negative Logits
    Token           Logit
    " Lod"          -0.08
    " Cz"           -0.07
    " větš"         -0.07
    "pray"          -0.06
    "azz"           -0.06
    " Buzz"         -0.06
    " activated"    -0.06
    " metaData"     -0.06
    " bez"          -0.06
    " betray"       -0.06
    Positive Logits
    Token           Logit
    " Simple"       0.13
    " simple"       0.12
    "Simple"        0.11
    ".simple"       0.10
    "-simple"       0.09
    "นต"            0.09
    " easily"       0.09
    "simple"        0.08
    " Maple"        0.08
    ".Simple"       0.08
    Activation Density: 0.040%

    No Known Activations