    NEGATIVE LOGITS
    Token          Logit
    Locker         -0.08
     courtroom     -0.07
    국의           -0.07
     jury          -0.07
     formidable    -0.06
     Galactic      -0.06
     //}↵↵         -0.06
    .]             -0.06
     Marcus        -0.06
    Marcus         -0.06
    POSITIVE LOGITS
    Token           Logit
    ,start           0.08
    (not rendered)   0.07
     tl              0.07
    >"+              0.07
    Around           0.07
     prohibit        0.07
     gw              0.07
    ">'+             0.07
    fully            0.07
    /"+              0.06
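    Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the highest and lowest entries. The sketch below assumes that convention; the matrices, vocabulary, and function names are hypothetical toy values, not this dashboard's actual code or weights.

```python
# Minimal sketch (assumed convention): logits = decoder_row @ unembed,
# then take the top-k and bottom-k vocabulary entries.
# All names and numbers here are hypothetical toy values.

def feature_logits(decoder_row, unembed):
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix stored as d_model rows of vocab-length lists."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_row[i] * unembed[i][t] for i in range(len(decoder_row)))
        for t in range(vocab_size)
    ]

def top_and_bottom(logits, vocab, k):
    """Return the k highest and k lowest (token, logit) pairs."""
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    bottom = ranked[:k]                # most negative first
    top = list(reversed(ranked[-k:]))  # most positive first
    return top, bottom

vocab = ["a", "b", "c", "d"]
decoder_row = [1.0, -0.5]              # hypothetical d_model = 2
unembed = [
    [0.2, -0.1, 0.0, 0.4],             # residual dimension 0
    [0.4, 0.2, -0.2, 0.0],             # residual dimension 1
]
logits = feature_logits(decoder_row, unembed)
top, bottom = top_and_bottom(logits, vocab, k=2)
print(top)     # highest-logit tokens: d, then c
print(bottom)  # lowest-logit tokens: b, then a
```

A real model would use a vectorized matrix product over a vocabulary of tens of thousands of tokens; the plain-list version only illustrates the ranking step.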
    Activation Density: 0.012%
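    Assuming activation density means the fraction of sampled token positions on which the feature's activation is positive, 0.012% corresponds to roughly 1 active token in every 8,333. A minimal sketch of that calculation, with a hypothetical activation list:

```python
def activation_density(activations):
    """Fraction of token positions on which the feature is active (> 0).
    Assumes density is defined this way; returns 0.0 for an empty sample."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)

# Hypothetical sample: 2 active positions out of 10,000 tokens.
acts = [0.0] * 9998 + [1.3, 0.7]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.020%
```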

    No Known Activations