    NEGATIVE LOGITS
    Token              Logit
    " "                0.86
    " πρώτη"           0.81
    " a"               0.81
    "td"               0.79
    "idad"             0.76
    "いた"              0.76
    "don"              0.76
    " ζωή"             0.74
    (not recovered)    0.74
    " päät"            0.73
    POSITIVE LOGITS
    Token              Logit
    (not recovered)    1.29
    "ש"                1.26
    "ك"                1.16
    "ות"               1.13
    "ي"                1.12
    "in"               1.11
    (not recovered)    1.10
    " in"              1.08
    "াল"               1.04
    "ER"               1.03
    Activation Density: 0.006%

    No Known Activations