    NEGATIVE LOGITS
    " Jed"          -0.09
    " Community"    -0.09
    " Overseas"     -0.08
    " joe"          -0.08
    " حالی"         -0.08
    " Counselor"    -0.08
    "unnan"         -0.08
    " S"            -0.08
    " Yup"          -0.08
    " Gaza"         -0.08
    POSITIVE LOGITS
    "LAB"       0.16
    "POWER"     0.14
    "lab"       0.12
    "power"     0.11
    "-files"    0.10
    "Power"     0.10
    "Lab"       0.10
    "_lab"      0.10
    "лаб"       0.10
    " lab"      0.10
    Activation Density: 0.001%

    No Known Activations