INDEX
    Explanations

    statements related to public figures or public discourse

    Negative Logits
    +#+#
    -1.05
     myſelf
    -0.94
    principalColumn
    -0.94
     ſeveral
    -0.89
    AutoScaleMode
    -0.88
     Monfieur
    -0.86
    Diweddarwch
    -0.86
     ſmall
    -0.86
     esternos
    -0.85
    GEBURTS
    -0.84
    Positive Logits
     $
    0.40
    いわ
    0.39
     dimenti
    0.39
     last
    0.38
    ->
    0.37
     explained
    0.36
    مئ
    0.36
     note
    0.36
    ev
    0.35
     …
    0.35
    Act Density 0.142%

    No Known Activations