INDEX
    Explanations

    mathematical operations and variables used in expressions or equations

    Negative Logits
    +</         -0.50
     him        -0.47
    tight       -0.41
    我也是      -0.41
     jeg        -0.41
     kel        -0.41
    ÄT          -0.40
    urop        -0.40
    occa        -0.40
     Seek       -0.39
    Positive Logits
     pleaſure   0.99
     raiſ       0.93
     cauſe      0.92
     ſeveral    0.90
     ſmall      0.88
     uſed       0.88
     Theſe      0.86
     againſt    0.84
     reaſon     0.84
     itſelf     0.84
    Activation Density 0.011%

    No Known Activations