    NEGATIVE LOGITS
    UCH              -0.07
     hostile         -0.07
     monopoly        -0.07
                     -0.06
    ότε              -0.06
    áo               -0.06
     součást         -0.06
    .AddComponent    -0.06
     boz             -0.06
    acades           -0.06
    POSITIVE LOGITS
     stoi            0.07
     voyeur          0.07
    _blk             0.07
     Xin             0.06
    bw               0.06
    _friend          0.06
    "_               0.06
    &s               0.06
    _gt              0.06
     didn            0.06
    Activation Density: 0.040%

    No Known Activations