INDEX
    Explanations

    leading to or especially

    NEGATIVE LOGITS
     nhưng          0.47
     spindles       0.45
     없고           0.44
     αλλά           0.43
    或者是          0.42
    ırd             0.41
     democracies    0.40
     tämä           0.40
     faisons        0.40
    ä               0.40
    POSITIVE LOGITS
    H       0.59
    K       0.52
    ↵↵↵     0.47
    P       0.45
    ↵↵↵↵    0.44
    W       0.44
    is      0.43
    S       0.43
    Y       0.43
    V       0.43
    Activation Density: 0.024%

    No Known Activations