INDEX
    Explanations

    phrases indicating a method or manner of doing something

    Negative Logits
     pleaſure     -1.34
     myſelf       -1.16
     ſche         -1.13
     themſelves   -1.12
     raiſ         -1.12
     faſt         -1.12
     ſta          -1.10
     poffible     -1.09
     becauſe      -1.05
     ſever        -1.05
    Positive Logits
     that     0.87
     is       0.64
     can      0.63
     may      0.60
     a        0.58
    ,         0.56
    </h2>     0.53
     was      0.52
     made     0.51
     cela     0.51
    Activation Density: 0.028%

    No Known Activations