
    Negative Logits
    " warmly"       -0.07
    "電腦"            -0.07
                    -0.07
    "˹"             -0.06
    "-government"   -0.06
    "zion"          -0.06
    "及其"            -0.06
    " Sinai"        -0.06
    " senator"      -0.06
    " всем"         -0.06
    Positive Logits
    " siendo"       0.08
    " applying"     0.08
    " Mist"         0.07
    " iw"           0.07
    " PP"           0.07
    " Tra"          0.07
    "(array"        0.07
    ".PUT"          0.07
    " Ottawa"       0.06
    " tand"         0.06
    Activation Density: 0.031%

    No Known Activations