    Explanations

    phrases that include the speaker's name and self-identification

    Negative Logits
    "ysl"          -0.20
    "axed"         -0.15
    "uard"         -0.15
    "lags"         -0.14
    "amburg"       -0.14
    " hood"        -0.14
    "crete"        -0.14
    "fect"         -0.14
    " persu"       -0.13
    "àng"          -0.13

    Positive Logits
    "oppins"        0.16
    "式"            0.15
    "Expose"        0.15
    "布"            0.15
    " Introduced"   0.14
    "ِب"            0.14
    "اران"          0.14
    "oğ"            0.14
    "edd"           0.14
    "602"           0.14
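    The page does not say how these two lists were produced. A common convention for sparse-autoencoder dashboards is to take the feature's decoder direction, project it onto the model's unembedding matrix, and report the most negative and most positive tokens. The sketch below illustrates that convention on toy numbers; the function names `logit_effects` and `top_logit_effects` are hypothetical, and the vectors are made up for illustration only.

```python
# Hypothetical sketch: the page does not state how its logit lists were computed.
# Assumes the common SAE-dashboard convention of projecting the feature's decoder
# direction onto the model's unembedding matrix.


def logit_effects(decoder_dir: list[float],
                  unembed: list[list[float]],
                  vocab: list[str]) -> dict[str, float]:
    """Dot the decoder direction with each token's unembedding column.

    unembed[i][t] is the weight from residual-stream dimension i to token t.
    """
    return {
        tok: sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t, tok in enumerate(vocab)
    }


def top_logit_effects(effects: dict[str, float], k: int = 10):
    """Return (k most negative, k most positive) as (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
effects = logit_effects([1.0, -0.5],
                        [[0.2, -0.1, 0.0],
                         [0.4, 0.2, -0.2]],
                        ["a", "b", "c"])
neg, pos = top_logit_effects(effects, k=1)
print(neg, pos)
```

    With k=10 on a real model's vocabulary, the two returned lists would have the same shape as the Negative and Positive Logits lists above.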
    Activation Density: 0.082%
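    The density figure is reported without a definition. Assuming it means the percentage of sampled tokens on which the feature's activation is above zero (a common reading, not confirmed by this page), it can be computed as below; `activation_density` and its `threshold` parameter are hypothetical names used only for illustration.

```python
# Hypothetical sketch: assumes "activation density" is the percentage of
# sampled tokens on which the feature's activation exceeds a threshold (here 0).


def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation is above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Under this reading, 0.082% means the feature fires on about 82 of every 100,000 tokens.
sample = [1.0] * 82 + [0.0] * (100_000 - 82)
print(activation_density(sample))
```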

    No Known Activations