INDEX
    Explanations

    phrases related to personal opinions or thoughts

    Negative Logits
    " maksi"     -1.21
    " ?..."      -1.17
    " erik"      -1.12
    " 🤣🤣"       -1.12
    " purcha"    -1.07
    " reluct"    -1.06
    " !..."      -1.06
    " antik"     -1.06
    " depic"     -1.05
    " milf"      -1.04
    Positive Logits
    " really"      1.06
    "really"       0.99
    "Really"       0.91
    " Really"      0.89
    " REALLY"      0.89
    " wirklich"    0.70
    "<bos>"        0.69
    " realmente"   0.65
    " truly"       0.64
    " naprawdę"    0.58
    Act Density 0.083%

    No Known Activations