INDEX
    Explanations

    questions and statements that inquire about the reasons behind actions or beliefs

    Negative Logits
    Token         Logit
    " WHETHER"    -0.15
    " dazu"       -0.14
    "ino"         -0.14
    "ynamo"       -0.14
    ".YesNo"      -0.14
    "scan"        -0.14
    "chet"        -0.14
    "oret"        -0.14
    "tons"        -0.14
    "Ïĥκε"        -0.13

    Positive Logits
    Token         Logit
    "/how"        0.43
    "soever"      0.33
    " they"       0.28
    " we"         0.28
    " exactly"    0.27
    " it"         0.26
    " there"      0.24
    " bother"     0.23
    " certain"    0.23
    "/if"         0.23
    Activation Density: 0.051%

    No Known Activations