INDEX
    Explanations

    familiarize

    Negative Logits
     Efq        -1.45
     pleaſure   -1.41
     myſelf     -1.40
     Theſe      -1.38
     itſelf     -1.37
    ſelf        -1.35
     purpoſe    -1.33
     ſche       -1.30
     ſtate      -1.30
     houſe      -1.30
    Positive Logits
                0.70
     (          0.67
    '           0.66
     in         0.64
     W          0.63
    ,           0.63
     S          0.60
     [          0.60
     O          0.58
                0.58
    Act Density 0.114%

    No Known Activations