    Negative Logits
    " flexDirection"    -0.07
    " άν"               -0.07
    " Logout"           -0.07
    " Thumb"            -0.07
    "ありがとうござ"      -0.06
    "-padding"          -0.06
    "stří"              -0.06
    " borrowed"         -0.06
    " forget"           -0.06
    "did"               -0.06
    Positive Logits
    " severe"           0.10
    " Sever"            0.09
    " scars"            0.08
    " sever"            0.08
    " harsh"            0.08
    " Corona"           0.08
    " Sev"              0.07
    " dove"             0.07
    "Core"              0.07
    " severely"         0.07
    Activation density: 0.011%

    No Known Activations