INDEX
    Explanations
    No Explanations Found
    Negative Logits
    "."            1.29
    (blank)        1.28
    (blank)        1.02
    "ונה"          1.01
    "↵↵"           0.93
    "de"           0.88
    " jonka"       0.86
    " can"         0.85
    " concerne"    0.84
    "god"          0.82
    Positive Logits
    "R"    1.76
    "D"    1.68
    "G"    1.61
    "S"    1.50
    "H"    1.50
    "U"    1.43
    "B"    1.42
    "L"    1.41
    "P"    1.40
    "C"    1.35
    Act Density 0.000%

    No Known Activations