    Explanations

    phrases that express uncertainty or conditionality

    Negative Logits
     Efq          -0.83
    tvguidetime   -0.81
     Shakspeare   -0.80
     sandero      -0.79
     myſelf       -0.78
     itſelf       -0.75
     Cæsar        -0.74
     ་་           -0.74
     Meiji        -0.74
     Majefty      -0.74
    Positive Logits
     it           1.15
     the          1.05
     there        1.03
     we           0.85
     nobody       0.81
     this         0.80
     neither      0.74
     they         0.71
     I            0.69
     everyone     0.68
    Activation Density: 1.692%

    No Known Activations