    Negative Logits
    Token             Logit
                      -1.05
                      -1.05
    " kematian"       -0.99
                      -0.98
    " їх"             -0.96
    " importación"    -0.93
    "ثمار"            -0.93
    " ANIMAL"         -0.91
                      -0.91
    "pkins"           -0.91
    Positive Logits
    Token          Logit
    " Shift"       1.76
    " shift"       1.63
    " Shifting"    1.59
    " shifts"      1.46
    "Shift"        1.46
    " SH"          1.44
    " SHIFT"       1.39
    "SHIFT"        1.38
    "hift"         1.36
    " Shi"         1.26
    Activation Density: 0.012%

    No Known Activations