INDEX
    Explanations
    No Explanations Found
    Negative Logits
     ―――――      -1.40
     itſelf     -1.28
     myſelf     -1.23
     ་་         -1.22
     iſt        -1.21
    <bos>       -1.19
     ſche       -1.19
     auffi      -1.16
     againſt    -1.16
    ſelf        -1.14
    Positive Logits
    (token not shown)   1.58
     $                  0.83
     "                  0.83
     K                  0.79
     C                  0.77
     T                  0.77
     D                  0.75
     “                  0.75
     S                  0.74
     M                  0.73
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.