INDEX
    Explanations

    phrases that express preference or comparison

    Negative Logits
     myſelf     -0.93
     ſeveral    -0.90
     purpoſe    -0.88
     ſtate      -0.86
     ſever      -0.85
     houſe      -0.81
     fevere     -0.81
     himſelf    -0.81
     uſed       -0.78
     reaſon     -0.77
    Positive Logits
     than       0.93
    而非        0.89
    而不是      0.83
     rather     0.78
     niż        0.77
     THAN       0.72
     وليس       0.69
     bukan      0.67
     فريبيس     0.67
    колко       0.66
    Activation Density: 0.151%

    No Known Activations