    Negative Logits
    Token                Logit
    'Seats'              -0.07
    'userid'             -0.07
    ' погляд'            -0.07
    ' депут'             -0.06
    ' Kota'              -0.06
    ' unwitting'         -0.06
    ' unspecified'       -0.06
    'definitions'        -0.06
    '_alarm'             -0.06
    (blank in source)    -0.06
    Positive Logits
    Token                Logit
    '(old'               0.06
    '164'                0.06
    ' LI'                0.06
    ' }}">↵'             0.06
    'κει'                0.06
    '887'                0.06
    ' optimizations'     0.06
    '072'                0.06
    ' curtains'          0.06
    '041'                0.06
    Activation Density: 0.028%

    No Known Activations