INDEX
    Explanations

    Special conversation/formatting tokens and metadata markers (role tags such as "user"/"assistant", plus header and end-of-text markers).

    Negative Logits
     _("
    -0.07
     stereotype
    -0.07
    	fp
    -0.07
    цу
    -0.06
     Вона
    -0.06
    awah
    -0.06
    	Key
    -0.06
    -0.06
     swirl
    -0.06
     fino
    -0.06
    Positive Logits
    _LEAVE
    0.07
    _BUS
    0.07
    .userInteractionEnabled
    0.06
     applaud
    0.06
    ynchronize
    0.06
     guarding
    0.06
    DOCTYPE
    0.06
    ~↵↵
    0.06
     Chim
    0.06
     Mant
    0.06
    Activation Density: 0.028%

    No Known Activations