INDEX
    Explanations

    the word "weak" and words that mean the opposite of weak

    Negative Logits
     myſelf       -1.63
     Monfieur     -1.59
     itſelf       -1.58
     متعلقه       -1.52
     ―――――        -1.51
     ſeveral      -1.41
     pleaſure     -1.40
     Reſ          -1.37
    +#+#          -1.36
     faſt         -1.34
    Positive Logits
     A            0.82
     "            0.81
     P            0.80
     T            0.79
                  0.76
     H            0.75
     v            0.75
     I            0.73
     M            0.73
     m            0.72
    Activation Density: 3.495%

    No Known Activations