    Explanations

    person, pronoun, or article

    Negative Logits
        "י"                    1.49
        "ي"                    1.30
        "i"                    1.13
        "ו"                    1.05
        (token not shown)      0.90
        "ीय"                   0.88
        "ில்"                   0.85
        "يته"                  0.84
        (token not shown)      0.82
        "ി"                    0.82
    Positive Logits
        "v"                    0.78
        " are"                 0.76
        "</h6>"                0.71
        "<0x0D>"               0.63
        "</h2>"                0.63
        "h"                    0.61
        (token not shown)      0.59
        " ibang"               0.58
        "</h4>"                0.56
        "</h3>"                0.54
    Activation Density: 0.028%

    No Known Activations