    Negative Logits
     benefit            -0.07
     meal               -0.07
    Resistance          -0.07
     baby               -0.07
    Forest              -0.07
     officers           -0.07
     cohesive           -0.07
     Colors             -0.07
    Subset              -0.07
     Find               -0.07
    Positive Logits
                        0.06
     псих               0.06
     собствен           0.06
    .ResponseEntity     0.06
     przed              0.06
     messed             0.06
     jež                0.06
    (pad                0.06
     Tomáš              0.06
     자신                0.06
    Act Density 0.000%

    No Known Activations