    Explanations

    instances of the word "who" in various contexts

    Negative Logits
    ting
    -0.17
    ented
    -0.17
    ning
    -0.16
    ty
    -0.16
    ural
    -0.15
    cola
    -0.15
    smarty
    -0.15
    nox
    -0.15
    ng
    -0.15
    colo
    -0.15
    Positive Logits
     else
    0.25
    ever
    0.17
     Else
    0.17
    opup
    0.16
    oping
    0.16
    リン
    0.16
    osh
    0.15
     ELSE
    0.15
    	else
    0.15
    ensch
    0.15
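Logit lists like the two above are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. The page does not state the exact procedure, so the following is only a minimal sketch under that assumption. The names `w_dec` (decoder vector of length d_model), `W_U` (unembedding, d_model rows by vocab columns), `vocab`, and `top_logit_effects` are hypothetical, and the toy numbers are made up for illustration; they are not this feature's weights.

```python
def top_logit_effects(w_dec, W_U, vocab, k=3):
    """Project a feature's decoder direction through the unembedding.

    w_dec: list of d_model floats (hypothetical decoder row for one feature)
    W_U:   list of d_model rows, each a list of len(vocab) floats (hypothetical unembedding)
    vocab: list of token strings, one per unembedding column
    Returns (most_negative, most_positive), each a list of (token, logit) pairs.
    """
    vocab_size = len(vocab)
    # logits[j] = sum_i w_dec[i] * W_U[i][j], i.e. the vector-matrix product w_dec @ W_U
    logits = [sum(w_dec[i] * W_U[i][j] for i in range(len(w_dec)))
              for j in range(vocab_size)]
    # Sort token ids from most suppressed to most promoted
    order = sorted(range(vocab_size), key=lambda j: logits[j])
    most_negative = [(vocab[j], logits[j]) for j in order[:k]]
    most_positive = [(vocab[j], logits[j]) for j in reversed(order[-k:])]
    return most_negative, most_positive


# Toy example: d_model = 2, vocabulary of 4 placeholder tokens
w_dec = [1.0, 2.0]
W_U = [[0.5, -1.0, 0.0, 1.0],
       [0.0, 0.0, -0.25, 0.5]]
vocab = ["a", "b", "c", "d"]

neg, pos = top_logit_effects(w_dec, W_U, vocab, k=2)
print(neg)  # [('b', -1.0), ('c', -0.5)]
print(pos)  # [('d', 2.0), ('a', 0.5)]
```

With real model weights the same projection is a single `w_dec @ W_U` in NumPy or PyTorch; the pure-Python loop here only keeps the sketch dependency-free.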
    Activation Density 0.040%
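Activation density is usually read as the share of sampled token positions on which the feature fires, so 0.040% corresponds to roughly 4 positions in every 10,000. The page gives neither the sample size nor the firing threshold; the sketch below assumes "fires" means an activation strictly above zero, and `activation_density` plus the toy positions are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires (activation > threshold)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy check: 4 firing positions out of 10,000 tokens
acts = [0.0] * 10_000
for pos in (17, 512, 4096, 9999):  # hypothetical positions where the feature fires
    acts[pos] = 1.3
density = activation_density(acts)
print(f"{density:.3%}")  # 0.040%
```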

    No Known Activations