    Negative Logits
    Token             Logit
    " Confeder"       -0.07
    " auss"           -0.07
    " datos"          -0.07
    " meses"          -0.07
    " кер"            -0.07
    " Hor"            -0.06
    " Sebast"         -0.06
    " Kot"            -0.06
    " ISP"            -0.06
    " defiance"       -0.06
    Positive Logits
    Token             Logit
    " bodily"         0.07
    " Installation"   0.07
    ".Identifier"     0.06
    ".fold"           0.06
    "�试"             0.06
    "riteln"          0.06
    "arget"           0.06
    (blank)           0.06
    " beforeEach"     0.06
    ".NORMAL"         0.06
    Activation Density: 0.004%

    No Known Activations