INDEX
    Explanations
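    Negative Logits
    Token                 Logit
    " and"                -1.91
    " or"                 -1.73
    "年は"                -1.59
    "以上です"            -1.59
    "sitions"             -1.59
    "That"                -1.57
    "為主"                -1.53
    "because"             -1.52
    "although"            -1.52
    (token not shown)     -1.51

    Positive Logits
    Token                 Logit
    " både"               1.66
    " sinn"               1.61
    " vols"               1.51
    " لعبة"               1.47
    "H"                   1.45
    " بعضی"               1.38
    (token not shown)     1.38
    " hef"                1.38
    (token not shown)     1.37
    "("                   1.35
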
    Activation Density: 0.027%

    No Known Activations