    Explanations

    statements that express safety or certainty in assumptions

    Negative Logits
    "iş"           -0.15
    "uant"         -0.15
    "andler"       -0.15
    "işim"         -0.15
    "turnstile"    -0.15
    "ewis"         -0.14
    " Masc"        -0.14
    "ambiguous"    -0.14
    "peats"        -0.14
    "ucz"          -0.13
    Positive Logits
    " assumption"     0.21
    " stretch"        0.21
    " safe"           0.20
    " Safe"           0.20
    " expectation"    0.19
    " likelihood"     0.19
    " Stretch"        0.18
    " expecting"      0.18
    " assume"         0.17
    " likely"         0.17
    Act Density 0.097%

    No Known Activations