    Explanations

    the word "historic" and words that indicate success

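    (Tokens are quoted to show leading spaces; ↵ denotes a newline.)

    Negative Logits
    Token                   Logit
    " and"                  -0.78
    (unrendered token)      -0.63
    " ("                    -0.63
    " of"                   -0.60
    ","                     -0.59
    " A"                    -0.57
    " O"                    -0.57
    " but"                  -0.57
    " in"                   -0.57
    "↵↵"                    -0.57

    Positive Logits
    Token                   Logit
    " Theſe"                 1.35
    " ainfi"                 1.29
    "\{\\"                   1.23
    " étoit"                 1.22
    " auffi"                 1.20
    " Monfieur"              1.20
    " Efq"                   1.19
    " étoient"               1.16
    " avoient"               1.10
    " متعلقه"                1.09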
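
The page does not say how these logit lists are produced. On SAE feature dashboards they are commonly the smallest and largest entries of the feature's decoder direction multiplied by the model's unembedding matrix. The sketch below assumes that convention and uses a made-up 2-dimensional residual stream and 4-token vocabulary; `logit_weights` and `top_logits` are hypothetical helper names, not part of any dashboard's API.

```python
from typing import List, Sequence, Tuple


def logit_weights(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction onto every token's unembedding.

    decoder_dir has length d_model; unembed is d_model x vocab_size
    (row i = residual dimension i, column t = token t).
    """
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_logits(weights: Sequence[float], tokens: Sequence[str],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, weight) pairs."""
    ranked = sorted(zip(tokens, weights), key=lambda pair: pair[1])
    negative = ranked[:k]          # smallest weights first
    positive = ranked[::-1][:k]    # largest weights first
    return negative, positive


# Toy example with invented values (hypothetical, not the real model's weights).
tokens = ["a", "b", "c", "d"]
decoder_dir = [1.0, -0.5]
unembed = [
    [0.2, -0.4, 1.0, 0.0],
    [0.6, 0.2, -0.2, 0.8],
]
weights = logit_weights(decoder_dir, unembed)
negative, positive = top_logits(weights, tokens, k=2)
print([t for t, _ in negative])  # ['b', 'd']
print([t for t, _ in positive])  # ['c', 'a']
```

Under this convention, a positive entry means that writing the feature's direction into the residual stream pushes the model toward predicting that token, and a negative entry pushes it away.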
    Activation Density: 1.161%
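
Activation density is usually the share of token positions on which the feature fires; 1.161% works out to roughly one token in 86. The sketch below assumes "fires" means an activation strictly above zero (the `threshold` parameter and the `activation_density` name are hypothetical) and uses invented activations.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Made-up activations for eight token positions; two of them are nonzero.
acts = [0.0, 0.3, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0]
print(activation_density(acts))  # 25.0
```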

    No Known Activations