INDEX
    Explanations

    phrases indicating conditional situations or outcomes

    Negative Logits
    " itſelf"            -1.07
    " myſelf"            -1.01
    " themſelves"        -0.96
    " raiſ"              -0.95
    " ſever"             -0.93
    " ſeveral"           -0.92
    " whoſe"             -0.91
    " purpoſe"           -0.88
    " ſtate"             -0.87
    "IntoConstraints"    -0.86

    Positive Logits
    ":"                   0.90
    " :"                  0.71
    "namely"              0.64
    " namely"             0.64
    " is"                 0.60
                          0.59
    " Namely"             0.58
    "的是"                0.57
    "那就是"              0.57
                          0.54
    Activation Density: 0.683%

    No Known Activations