INDEX
    Explanations

    references to the word "who" in various contexts

    Negative Logits
    Token         Logit
    "robat"       -0.20
    " Darling"    -0.17
    "_PD"         -0.15
    "çį"          -0.15
    "ted"         -0.15
    "mented"      -0.15
    "aster"       -0.14
    "bian"        -0.14
    "ented"       -0.14
    "nt"          -0.14
    Positive Logits
    Token         Logit
    " else"       0.30
    "ops"         0.28
    "ever"        0.27
    "opi"         0.23
    " am"         0.22
    " ELSE"       0.22
    "osh"         0.21
    " Else"       0.21
    "oping"       0.20
    " needs"      0.20
    Activation Density: 0.024%

    No Known Activations