    Explanations

    phrases indicating disbelief or questioning one's understanding

    Negative Logits
    Logit   Token
    -0.65   (blank)
    -0.48   .
    -0.45   '>";
    -0.40   <bos>
    -0.37   (blank)
    -0.35    –
    -0.35   </h5>
    -0.35   </td>
    -0.34   </h3>
    -0.33   </blockquote>
    Positive Logits
    Logit   Token
     0.80    Wikiseite
     0.76    nahilalakip
     0.74    autorytatywna
     0.73    wikipagina
     0.72   attutto
     0.71   SharedCtor
     0.71   Diweddarwch
     0.71   jaqueta
     0.70    كومونز
     0.70    transfieras
    Activation Density: 2.705%

    No Known Activations