    Explanations

    questions starting with "Who"

    instances of the word "Who"

    Negative Logits
    "MER"               -0.78
    "PORT"              -0.67
    " Pilgrim"          -0.63
    " Hyde"             -0.62
    "mun"               -0.58
    "outer"             -0.58
    " compatibility"    -0.57
    " readiness"        -0.57
    " relaxation"       -0.57
    "rog"               -0.56
    Positive Logits
    "soever"             1.24
    "ever"               1.09
    "oping"              1.05
    "abouts"             1.03
    " else"              0.97
    " cares"             0.96
    "oped"               0.91
    " knows"             0.90
    "ileaks"             0.80
    " cared"             0.79
    Activation Density: 0.092%

    No Known Activations