INDEX
    Explanations
    NEGATIVE LOGITS
    dic            -0.08
     socialism     -0.08
     sto           -0.07
     белем         -0.07
     социаль       -0.07
     BET           -0.07
     сод           -0.07
    Pare           -0.07
     supervised    -0.07
     "$            -0.07
    POSITIVE LOGITS
     Pf             0.08
    _Array          0.08
     место          0.08
     Petition       0.08
                    0.08
                    0.08
                    0.08
     halaman        0.07
     incontri       0.07
    lih             0.07
    Activation Density: 0.001%

    No Known Activations