INDEX
    Explanations

    phrases including the word "as"

    Negative Logits
     itſelf    -0.81
     Jefus    -0.74
     pleaſure    -0.71
     juſt    -0.64
     ſever    -0.63
     becauſe    -0.63
    こと    -0.63
     Anſ    -0.62
     Conſ    -0.62
    ſelf    -0.61
    Positive Logits
     follows    1.11
     well    1.08
     opposed    1.08
     part    1.02
     soon    1.01
     a    0.98
    follows    0.95
    pires    0.93
     far    0.90
     much    0.86
    Activation Density: 0.329%

    No Known Activations