INDEX
    Explanations

    statements or claims being questioned or challenged

    instances of the word "that."

    Negative Logits
     pione    -0.87
    ãĤĬ       -0.86
    oufl      -0.82
    izont     -0.81
    umat      -0.81
    ãĥīãĥ©    -0.79
    å§«       -0.79
    arest     -0.78
    ãĤ´ãĥ³    -0.78
    ãĥĥ       -0.78
    Positive Logits
     there          0.96
     they           0.95
     possibility    0.92
     aspect         0.90
     notion         0.86
     distinction    0.85
     fact           0.81
     we             0.81
     assertion      0.80
     phrase         0.79
    Activation Density: 0.188%

    No Known Activations