INDEX
    Negative Logits
    Token          Logit
    " which"       -1.80
    "which"        -1.41
    " которая"     -0.97
    " Which"       -0.93
    "Which"        -0.90
    " WHICH"       -0.87
    " которое"     -0.83
    " который"     -0.82
    " которые"     -0.80
    " والتي"       -0.79
    Positive Logits
    Token          Logit
    " purpoſe"     1.16
    " Jefus"       1.11
    " Efq"         1.11
    " ſeveral"     1.09
    " doubtnut"    1.08
    " iſt"         1.05
    " ſind"        1.05
    " Monfieur"    1.05
    "ſelf"         1.04
    " myſelf"      1.04
    Activation Density: 0.089%

    No Known Activations