    Negative Logits
     leng
    -0.07
     males
    -0.07
     preference
    -0.07
     dry
    -0.07
    ][:
    -0.06
    (Level
    -0.06
     --↵
    -0.06
     موجود
    -0.06
     integrity
    -0.06
     Experienced
    -0.06
    Positive Logits
     tools
    0.11
     Tools
    0.08
    Tools
    0.08
     tool
    0.07
    TERS
    0.07
    0.06
    _srv
    0.06
     instruments
    0.06
     imprison
    0.06
     unsur
    0.06
    Activation Density 0.021%

    No Known Activations