    Negative Logits
    Token              Value
    "0"                0.88
    "،"                0.88
    (not rendered)     0.87
    (not rendered)     0.82
    "<0x80>"           0.79
    " are"             0.68
    (not rendered)     0.68
    (not rendered)     0.65
    "𝟎"                0.65
    "ра"               0.64
    Positive Logits
    Token              Value
    "ה"                0.96
    "a"                0.96
    "ه"                0.76
    (not rendered)     0.74
    " Connections"     0.72
    " connections"     0.69
    "-"                0.68
    "connections"      0.67
    "로부터"            0.66
    (not rendered)     0.65
    Activation Density: 0.017%

    No Known Activations