INDEX
    Negative Logits
    Token            Logit
    " myſelf"        -1.49
    "]--;"           -1.47
    " виправивши"    -1.39
    " itſelf"        -1.34
    "OGND"           -1.30
    " Efq"           -1.27
    " Monfieur"      -1.27
    " himſelf"       -1.26
    " auffi"         -1.22
    " ſeveral"       -1.21
    Positive Logits
    Token            Logit
    ":"              1.19
    ","              1.00
    "!"              0.92
    (blank)          0.77
    " I"             0.75
    "?"              0.75
    (blank)          0.73
    " even"          0.72
    " most"          0.72
    " so"            0.71
    Activation Density: 0.196%

    No Known Activations