INDEX
    Explanations

    comparisons and contrasts between subjects

    Negative Logits
    "?,?,?,?,"   -0.23
    "$MESS"      -0.15
    "Seven"      -0.15
    "جات"        -0.14
    "?,?,"       -0.14
    "hiba"       -0.14
    " MANY"      -0.13
    "-many"      -0.13
    "ικη"        -0.13
    "خرى"        -0.13
    Positive Logits
    " two"       1.10
    "two"        0.93
    "两个"       0.82
    " Two"       0.80
    "Two"        0.78
    " TWO"       0.78
    "_two"       0.76
    "-two"       0.74
    "两"         0.73
    " zwei"      0.72
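    The two lists rank tokens by how strongly this feature lowers or raises their output logits. Below is a minimal sketch of how such a ranking can be produced. It assumes, though this page does not confirm it, that the logit effect is the feature's decoder direction multiplied by the model's unembedding matrix, with no final LayerNorm applied. The helper `top_logit_effects` and the toy matrices are hypothetical.

```python
import numpy as np

def top_logit_effects(decoder_direction, unembedding, vocab, k=10):
    """Return the k most suppressed and k most promoted tokens for one feature.

    Assumption (hypothetical): the logit effect is the direct path
    decoder_direction @ unembedding, with no LayerNorm scaling or bias.
    """
    logits = decoder_direction @ unembedding   # shape: (vocab_size,)
    order = np.argsort(logits)                 # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, vocabulary of 4 tokens (values are made up).
vocab = ["two", " two", " many", "Seven"]
unembedding = np.array([[1.0, 0.5, -1.0, 0.0],
                        [0.0, 0.0, 0.0, -1.0]])
direction = np.array([1.0, 0.2])
negative, positive = top_logit_effects(direction, unembedding, vocab, k=2)
print(negative)  # [(' many', -1.0), ('Seven', -0.2)]
print(positive)  # [('two', 1.0), (' two', 0.5)]
```

    Keeping the leading space in tokens such as " two" matters. With byte-level tokenizers, " two" and "two" are distinct vocabulary entries, which is why both appear in the positive list.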
    Activation Density 0.572%
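    Activation density is the share of token positions on which the feature is active. A value of 0.572% works out to roughly 5.7 active tokens per 1,000. The sketch below assumes that "active" means an activation strictly above zero over some evaluation corpus; the page does not say which corpus or threshold was used. The helper `activation_density` is hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions at which a feature fires.

    Assumption: a position counts as active when its activation is strictly
    greater than `threshold` (0.0 for a ReLU-style feature).
    """
    activations = np.asarray(activations, dtype=float)
    return float(np.mean(activations > threshold))

# Toy example: 3 active positions out of 1,000 tokens (made-up values).
acts = np.zeros(1000)
acts[[3, 17, 512]] = [0.4, 1.2, 0.05]
density = activation_density(acts)
print(f"{100 * density:.3f}%")  # 0.300%
```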

    No Known Activations