    NEGATIVE LOGITS
    Token                 Logit
    " αλλά"               -2.42
    "тися"                -2.39
    (blank in source)     -2.23
    "та"                  -2.17
    (blank in source)     -2.17
    "ва"                  -2.14
    "ܞ"                   -2.13
    " WITH"               -2.11
    "varlak"              -2.11
    "ρα"                  -2.09
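    POSITIVE LOGITS
    Token                 Logit
    (blank in source)     2.98
    " was"                2.95
    (blank in source)     2.67
    "sandalia"            2.63
    (blank in source)     2.52
    "ープン"              2.45
    (blank in source)     2.42
    (blank in source)     2.41
    "ðsíða"               2.41
    (blank in source)     2.38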
    Act Density 0.068%

    No Known Activations