INDEX
    Explanations

    phrases expressing emphasis or contradiction

    negations or expressions of denial

    Negative Logits
    Token           Logit
     eleph          -1.04
     pione          -1.03
    ò               -1.01
    aditional       -0.98
    ortunately      -0.97
     metic          -0.96
    Þ               -0.94
    ö               -0.92
     practition     -0.92
    ą               -0.91
    Positive Logits
    Token           Logit
    't              1.70
    ´               1.03
    \'              0.93
    uts             0.89
    ÃŃ              0.86
    `               0.85
    �               0.84
    Õ               0.77
    ̶               0.73
    bryce           0.72
    Act Density 0.114%

    No Known Activations