    Negative Logits
    Token          Logit
    "OOT"          -0.07
    " Pass"        -0.06
    " persons"     -0.06
    " toss"        -0.06
    " Entity"      -0.06
    " Russia"      -0.06
    "Parents"      -0.06
    " Cabr"        -0.06
    "cq"           -0.06
    " Consult"     -0.06

    Positive Logits
    Token          Logit
    " HD"          0.16
    "HD"           0.12
    " hd"          0.09
    ".communic"    0.07
    " Shin"        0.07
    "hd"           0.07
    ".hd"          0.07
    " VGA"         0.07
    "_good"        0.07
    " первого"     0.07

    Activation Density: 0.004%

    No Known Activations