INDEX
    Explanations

    instances of hedging or tentative language, often using phrases like "of course."

    Negative Logits
    Token           Logit
    " himſelf"      -0.91
    " Efq"          -0.91
    " myſelf"       -0.88
    " pleaſure"     -0.85
    "ſelf"          -0.83
    " itſelf"       -0.81
    " themſelves"   -0.80
    " Chriftian"    -0.77
    " Majefty"      -0.74
    "ſelves"        -0.71
    Positive Logits
    Token           Logit
    "Somehow"       0.84
    "Luckily"       0.82
    " it"           0.79
    "Maybe"         0.78
    " Somehow"      0.76
    "Surely"        0.74
    " Luckily"      0.71
    " if"           0.71
    " оригіналу"    0.71
    " Maybe"        0.70
    Activation Density: 0.329%

    No Known Activations