INDEX
    Explanations
    Negative Logits
     Allison            -0.07
    “Well               -0.07
     공지                 -0.07
    -threatening        -0.07
     Week               -0.06
     startIndex         -0.06
    	JLabel             -0.06
     nabí               -0.06
                        -0.06
     Evil               -0.06
    Positive Logits
     pure               0.12
    Pure                0.12
     Pure               0.12
    pure                0.09
     PURE               0.09
                        0.07
                        0.07
     purely             0.07
    are                 0.07
    .Sync               0.06
    Activation Density: 0.010%

    No Known Activations