INDEX
    Explanations

    references to the pronoun "who" indicating inquiries about identity

    Negative Logits
    ting          -0.19
    ikip          -0.15
    ration        -0.15
    hm            -0.14
    ault          -0.14
     Kendall      -0.14
    vas           -0.14
     Arbeit       -0.13
     tube         -0.13
    elman         -0.13
    Positive Logits
     else         0.20
    ugo           0.16
    afen          0.15
    RLF           0.15
    ategorical    0.15
    âĢĮاÙĨبار    0.15
    etooth        0.15
    afe           0.15
    ÑĻ            0.14
    overe         0.14
    Activation Density: 0.023%

    No Known Activations