INDEX
    Explanations

    assertions that challenge the validity of claims or narratives

    Negative Logits
    Token           Logit
    " unpredict"    -0.16
    "/fw"           -0.15
    "acht"          -0.15
    "achts"         -0.15
    " ambigu"       -0.15
    " cyn"          -0.15
    "Unexpected"    -0.14
    "åįł"           -0.14
    "odom"          -0.14
    " تÙĦ"          -0.14
    Positive Logits
    Token             Logit
    " fall"           0.33
    " bunk"           0.27
    " fiction"        0.27
    " base"           0.27
    " pat"            0.25
    " hog"            0.24
    " false"          0.24
    " fabrication"    0.24
    " fig"            0.23
    " Fall"           0.23
    Activation Density: 0.204%

    No Known Activations