INDEX
    Explanations

    the word "who" in various contexts

    Negative Logits
    "robat"     -0.18
    "mented"    -0.17
    "cede"      -0.16
    "cola"      -0.16
    "roi"       -0.16
    "алеж"      -0.16
    "illos"     -0.16
    "ning"      -0.15
    "bian"      -0.15
    "ÑĥÑĢÑĥ"    -0.15
    Positive Logits
    " else"     0.31
    "ops"       0.25
    "oping"     0.25
    "osh"       0.24
    "ever"      0.22
    "opi"       0.21
    " Else"     0.21
    " ELSE"     0.20
    "opsy"      0.20
    "_else"     0.19
    Activation Density: 0.032%

    No Known Activations