    Explanations

    negative statements regarding capabilities and evidence

    Negative Logits
    "contri"      -0.15
    "abr"         -0.15
    "çķĻ"         -0.15
    "stice"       -0.14
    "aby"         -0.14
    "ovie"        -0.14
    "à¹ģหล"       -0.14
    "annon"       -0.14
    "umpt"        -0.13
    "æĽ"          -0.13
    Positive Logits
    " even"         0.26
    "even"          0.25
    " anywhere"     0.24
    " slightest"    0.20
    "Anywhere"      0.19
    " much"         0.19
    " any"          0.19
    " bother"       0.19
    " meaningful"   0.19
    " really"       0.19
    Activation Density: 0.060%

    No Known Activations