INDEX
    Explanations

    descriptions of inadequacy or error

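    Negative Logits
    "真的很" (Chinese, "really very"): 0.47
    "真的是" (Chinese, "really is"): 0.46
    "是真的" (Chinese, "is really"): 0.42
    " вполне" (Russian, "quite"): 0.41
    " durchaus" (German, "absolutely"): 0.40
    "没有任何" (Chinese, "without any"): 0.40
    "Thankfully": 0.40
    " obvious": 0.39
    " অবশ্য" (Bengali, "of course"): 0.39
    "明明" (Chinese, "clearly"): 0.39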
    Positive Logits
    " unreliable": 1.26
    " ineffective": 1.13
    " inadequate": 1.11
    " unsatisfactory": 1.09
    " suboptimal": 1.08
    " flawed": 1.04
    " problematic": 1.00
    " imperfect": 0.98
    " inefficient": 0.95
    " questionable": 0.94
    Activation Density: 0.023%

    No Known Activations