INDEX
    Explanations
    Negative Logits
    Token          Logit
    "つまり"        -1.13
    "conian"       -1.03
    "あまり"        -0.96
                   -0.95
    "rola"         -0.94
    " the"         -0.92
    "でしたが"      -0.92
    "しかも"        -0.92
    " perhaps"     -0.91
    "jenie"        -0.91
    Positive Logits
    Token          Logit
    " same"        3.75
    " же"          1.84
    " gleichen"    1.60
    " stessa"      1.48
    " stesso"      1.33
    " mesma"       1.33
    " gleiche"     1.31
    " ίδ"          1.28
    " stesse"      1.25
    " mismos"      1.23
    Activation Density: 0.052%

    No Known Activations