INDEX
    Explanations

    the word "who" to identify subjects or individuals in various contexts

    Negative Logits
        "rogram"     -0.16
        "ningen"     -0.16
        "abd"        -0.15
        "алеж"       -0.15
        "woo"        -0.14
        "ault"       -0.14
        "uet"        -0.14
        "ted"        -0.14
        "rov"        -0.13
        "spi"        -0.13
    Positive Logits
        " else"      0.35
        "_else"      0.23
        " ELSE"      0.22
        "soever"     0.21
        " exactly"   0.21
        "/how"       0.20
        " Else"      0.20
        "else"       0.19
        "opi"        0.18
        "\telse"     0.18
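Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. The sketch below assumes that approach; the names `decoder_vec`, `W_U` and `vocab` are hypothetical, and it ignores any final layer norm the real model may apply before unembedding.

```python
import numpy as np

def top_logits(decoder_vec, W_U, vocab, k=10):
    """Rank tokens by how strongly a feature direction raises or lowers their logits.

    decoder_vec: (d_model,) hypothetical feature decoder direction
    W_U:         (d_model, vocab_size) unembedding matrix (assumed layout)
    vocab:       list of token strings, length vocab_size
    """
    # Direct effect of the feature direction on every vocabulary logit.
    logit_effect = decoder_vec @ W_U  # shape (vocab_size,)
    order = np.argsort(logit_effect)  # ascending: most negative first
    negative = [(vocab[i], float(logit_effect[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effect[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: identity unembedding, so the logit effect equals decoder_vec.
vocab = [" else", "rogram", " exactly", "opi"]
W_U = np.eye(4)
decoder_vec = np.array([0.35, -0.16, 0.21, 0.18])
positive, negative = top_logits(decoder_vec, W_U, vocab, k=2)
print(positive)
print(negative)
```

With real model weights, `W_U` would be large and `k=10` would reproduce lists of the length shown here.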
    Activation Density: 0.029%

    No Known Activations
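Activation density is usually read as the fraction of sampled tokens on which the feature fires; 0.029% corresponds to roughly 29 active tokens per 100,000. A minimal sketch, assuming a token counts as active when the feature's activation is strictly above a threshold of zero (the function name and threshold are illustrative assumptions):

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold` (assumed 0)."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# 29 active tokens out of 100,000 sampled tokens.
acts = np.zeros(100_000)
acts[:29] = 1.0
density = activation_density(acts)
print(f"{density * 100:.3f}%")
```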