INDEX
    Explanations

    occurrences of the word "who" in various contexts

    Negative Logits
    Token         Logit
    "robat"       -0.18
    "rogram"      -0.17
    "ented"       -0.16
    " Darling"    -0.16
    "uers"        -0.15
    "umer"        -0.15
    "бе"          -0.14
    "_stuff"      -0.14
    "urge"        -0.14
    "ipher"       -0.14
    Positive Logits
    Token         Logit
    " else"        0.26
    " wouldn"      0.24
    "ops"          0.24
    "ever"         0.22
    " ever"        0.21
    "osh"          0.18
    "\telse"       0.18
    "opi"          0.18
    "-ever"        0.17
    "op"           0.17
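    The positive logits fit the "who" explanation: appended directly after "who", most of them form familiar words or phrases ("whoever", "whoops", "who else"). The snippet below is a minimal sketch of that reading. It assumes only that each listed token is concatenated onto the string "who"; the dashboard itself does not show these continuations, and the helper name `continuations` is hypothetical.

```python
# Positive-logit tokens copied from the list above. Leading spaces and the tab
# character are part of the token strings, so they are kept exactly.
positive_tokens = [" else", " wouldn", "ops", "ever", " ever",
                   "osh", "\telse", "opi", "-ever", "op"]


def continuations(prefix, tokens):
    """Hypothetical helper: append each token to `prefix` and return the strings."""
    return [prefix + tok for tok in tokens]


for text in continuations("who", positive_tokens):
    # repr() makes the leading space and the tab visible in the output.
    print(repr(text))
```

    Printed with `repr`, the results include 'whoever', 'whoops', 'whoosh', 'whoop' and 'who else', which is consistent with the feature promoting tokens that complete or follow "who".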
    Activation density: 0.020%
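    A density of 0.020% corresponds to roughly 2 active positions per 10,000 tokens. The sketch below assumes that activation density means the fraction of token positions where the feature's activation is above zero on some sample of text. The page does not state the threshold or the dataset, and the activation values here are made up for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold (0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, of which only 2 are nonzero.
acts = [0.0] * 10_000
acts[17] = 1.3
acts[4242] = 0.7

density = activation_density(acts)
print(f"{density:.3%}")  # 0.020%
```

    The `%` format specifier multiplies by 100, so 2 / 10,000 = 0.0002 is displayed as 0.020%.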

    No Known Activations