INDEX
    Explanations

    words that indicate a problem or limitation

    scientific writing

    Negative Logits
    Token            Logit
    " houſe"         -0.97
    " فريبيس"        -0.97
    "despite"        -0.96
    "<?"             -0.94
    " pleaſure"      -0.94
    " wikipagina"    -0.93
    " becauſe"       -0.93
    " Monfieur"      -0.92
    " Theſe"         -0.92
    " ſtate"         -0.91

    Positive Logits
    Token            Logit
    ","              0.56
    " b"             0.56
    " M"             0.55
    " De"            0.55
    " al"            0.55
    "<eos>"          0.54
    " des"           0.54
    "b"              0.54
    " v"             0.53
    "ib"             0.52
    Activation Density: 6.820%

    No Known Activations