INDEX
    Explanations
    Negative Logits
     ་་           -0.93
     myſelf       -0.90
     themſelves   -0.87
     ſind         -0.86
     doubtnut     -0.82
    脚注の使い方        -0.81
     ―――――        -0.80
     iſt          -0.77
     houſe        -0.76
     faſt         -0.76
    Positive Logits
    h             1.42
     h            1.33
    H             0.99
     H            0.95
     S            0.83
    setH          0.81
     d            0.80
     s            0.76
     V            0.75
    𝙝             0.73
    Activation Density 0.081%

    No Known Activations