    Explanations

    The word "who" and its variants, indicating a focus on identity or on questions about individuals.

    Negative Logits
    Token       Logit
    "mented"    -0.19
    "ning"      -0.18
    "cola"      -0.18
    "onte"      -0.18
    "roi"       -0.17
    "ces"       -0.16
    "殿"        -0.16
    " sona"     -0.16
    "illos"     -0.16
    "uros"      -0.16
    Positive Logits
    Token       Logit
    "osh"       0.24
    " else"     0.24
    "oping"     0.23
    "opi"       0.21
    "ops"       0.21
    " Else"     0.19
    "opsy"      0.18
    "ever"      0.18
    "opers"     0.17
    "soever"    0.17
    Activation Density: 0.030%

    No Known Activations