INDEX
    NEGATIVE LOGITS
    Token              Logit
    "in"               -0.87
    " at"              -0.85
    "sies"             -0.84
    " became"          -0.82
    "ade"              -0.81
    (no token shown)   -0.80
    (no token shown)   -0.79
    "ables"            -0.79
    "ag"               -0.78
    " Februari"        -0.77
    POSITIVE LOGITS
    Token              Logit
    " sp"              1.50
    " Sp"              1.42
    " SP"              1.27
    "spender"          1.11
    "Sp"               1.06
    "に入れ"            1.05
    " spat"            1.04
    " spis"            1.04
    "為に"              1.02
    "ksp"              1.02
    Activation Density: 0.024%

    No Known Activations