    NEGATIVE LOGITS
    Token          Logit
    "STOP"         -0.06
    "스토"         -0.06
    " beer"        -0.06
    " leagues"     -0.06
    "covering"     -0.06
    "Pub"          -0.05
    "SHARE"        -0.05
    " Av"          -0.05
    " seven"       -0.05
    " دری"         -0.05
    POSITIVE LOGITS
    Token          Logit
    "πουλος"       0.08
    " efficient"   0.07
    " chronic"     0.06
    "_alt"         0.06
    "sst"          0.06
    " Sag"         0.06
    " repro"       0.06
    " erad"        0.06
    " sunny"       0.06
    ";color"       0.06
    Activation Density: 0.205%

    No Known Activations