    Explanations

    "his" followed by possessive noun

    Negative Logits
     in                   1.38
    (unrendered token)    1.27
    ],                    1.06
     sensit               1.05
    ได้                    1.05
    ")                    1.01
    <0x91>                0.98
    (unrendered token)    0.98
     by                   0.98
     hypothes             0.96
    Positive Logits
    ן                     1.91
    ו                     1.52
    (unrendered token)    1.49
    ا                     1.36
    his                   1.31
    ार                    1.28
    .                     1.28
    ם                     1.23
     his                  1.20
    (unrendered token)    1.20
    Activation Density: 0.040%

    No Known Activations