    Explanations
    No explanations found.
    Negative Logits
      Token           Logit
      " ―――――"        -1.12
      " iſt"          -1.05
      " pleaſure"     -1.04
      " auffi"        -1.02
      "ſelf"          -1.01
      " ſche"         -1.00
      " faſt"         -0.98
      " dieß"         -0.97
      " againſt"      -0.96
      " myſelf"       -0.95
    Positive Logits
      Token           Logit
      (not rendered)   1.67
      "mathrm"         0.81
      " $"             0.75
      " K"             0.67
      " T"             0.65
      " S"             0.65
      (not rendered)   0.65
      " B"             0.65
      "  ("            0.65
      " D"             0.65
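The two logit lists read as the feature's direct effect on the output vocabulary. The sketch below is a minimal illustration that assumes, as is common for sparse-autoencoder dashboards though not stated on this page, that each score is the feature's decoder direction multiplied by the model's unembedding matrix. The names `decoder_vec`, `unembed`, `logit_effects` and `top_and_bottom` are hypothetical, and the toy numbers are illustrative only.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_vec: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: return decoder_vec @ unembed, one score per vocabulary token.

    Assumes unembed has shape (d_model, vocab_size), stored as a list of rows.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_vec[i] * unembed[i][j] for i in range(len(decoder_vec)))
        for j in range(vocab_size)
    ]

def top_and_bottom(scores: Sequence[float], tokens: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative

if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of three tokens.
    scores = logit_effects([1.0, -0.5], [[0.2, -1.0, 0.5], [0.4, 0.0, -1.0]])
    positive, negative = top_and_bottom(scores, ["a", "b", "c"], k=1)
    print(positive)  # [('c', 1.0)]
    print(negative)  # [('b', -1.0)]
```

Sorting once and slicing from both ends yields both lists from a single pass over the vocabulary scores; for a real vocabulary, `heapq.nlargest` and `heapq.nsmallest` would avoid a full sort.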
    Activation density: 0.000%

    Activations

    This feature has no known activations.
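Activation density is the share of sampled token positions on which the feature fires. A minimal sketch follows, assuming that "fires" means an activation strictly above zero and that `activations` is a flat list of one feature's activations over a token sample; the page defines neither its threshold nor its sample, and `activation_density` is a hypothetical name. A density of 0.000% alongside "no known activations" is consistent with no sampled position exceeding the threshold.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of positions whose activation exceeds `threshold`."""
    if not activations:
        return 0.0  # no samples: report zero density instead of dividing by zero
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

if __name__ == "__main__":
    # A feature that never fires, like the one on this page.
    print(f"{activation_density([0.0] * 1000):.3f}%")          # 0.000%
    # A feature that fires on two of four positions.
    print(f"{activation_density([0.0, 1.2, 0.0, 3.4]):.3f}%")  # 50.000%
```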