INDEX
    Explanations
    NEGATIVE LOGITS
    "erno"           -0.07
    "_Name"          -0.06
    " confronted"    -0.06
    ".El"            -0.06
    "quez"           -0.06
    ".Fore"          -0.06
    " gelenek"       -0.06
    "——"             -0.06
    "Financial"      -0.06
    "hodob"          -0.06
    POSITIVE LOGITS
    " effort"        0.07
    "work"           0.07
    " work"          0.07
    " منه"           0.06
    " scratch"       0.06
    " twist"         0.06
    " take"          0.06
    " работа"        0.06
    "った"           0.06
    " рев"           0.06
    Activation Density: 0.023%

    No Known Activations