    NEGATIVE LOGITS
    " raiſ"         -0.97
    " chofe"        -0.97
    " poffible"     -0.92
    " myſelf"       -0.89
    " deſt"         -0.89
    " caufe"        -0.87
    " becauſe"      -0.85
    " doubtnut"     -0.85
    " themſelves"   -0.85
    " subsidi"      -0.85
    POSITIVE LOGITS
    "s"                   1.62
    " s"                  1.11
    "󠁿"                    0.83
    "’,"                  0.83
    (no visible token)    0.76
    "’."                  0.76
    (no visible token)    0.74
    "’)"                  0.74
    "ンの"                0.74
    "€™"                  0.73
    Act Density 0.191%

    No Known Activations