INDEX
    NEGATIVE LOGITS
    Token         Logit
     aside        -1.13
    aside         -1.11
     auroit       -1.02
     feroit       -0.98
     Aside        -0.91
     étoit        -0.86
     Monfieur     -0.86
     zelve        -0.85
    berdayakan    -0.84
     ainfi        -0.84
    POSITIVE LOGITS
    Token         Logit
    ly            0.54
     le           0.54
     «            0.53
     te           0.49
    ле            0.48
    mm            0.47
    z             0.47
     bel          0.47
     "            0.47
    هم            0.47
    Activation Density: 0.105%

    No Known Activations