INDEX
    Explanations
    NEGATIVE LOGITS
    Wanted      -0.08
                -0.07
     dic        -0.07
    стройство   -0.07
     предел     -0.07
     Haber      -0.07
    xxx         -0.07
    строй       -0.07
    worthy      -0.07
    /of         -0.07

    POSITIVE LOGITS
     ub          0.08
                 0.08
     Ben         0.08
    su           0.07
    落实         0.07
    perf         0.07
    -chave       0.07
    تر           0.07
     Kra         0.07
     Bennett     0.07
    Activation Density: 0.012%

    No Known Activations