INDEX
    Explanations

    instances of the word "that" used in various contexts

    Negative Logits
    Token        Logit
    " Means"     -0.20
    "Means"      -0.18
    "pery"       -0.17
    "ãĤ¤ãĥ¤"     -0.17
    "_means"     -0.17
    "means"      -0.16
    "rud"        -0.15
    "ứng"        -0.15
    " means"     -0.15
    "deaux"      -0.14
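
The negative-logit entry `ãĤ¤ãĥ¤` looks like a token shown in a byte-level BPE display alphabet rather than as decoded text. The sketch below assumes the GPT-2 style byte-to-character table, which the page itself does not confirm, and maps such a display string back to UTF-8. The names `bytes_to_unicode` and `decode_token` are illustrative helpers, not part of any tool referenced here.

```python
def bytes_to_unicode():
    """GPT-2 style table (assumed): map each byte 0..255 to a printable character."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    # Printable bytes map to themselves.
    mapping = {b: chr(b) for b in printable}
    # Remaining bytes are shifted to code points starting at 256, in byte order.
    shift = 0
    for b in range(256):
        if b not in mapping:
            mapping[b] = chr(256 + shift)
            shift += 1
    return mapping


def decode_token(display: str) -> str:
    """Turn a byte-level display string back into the text it encodes."""
    char_to_byte = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[c] for c in display)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĤ¤ãĥ¤"))  # prints: イヤ
```

Under that assumption the entry corresponds to the katakana string イヤ. Entries that are already plain text, such as `ứng`, should not be passed through this mapping.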
    Positive Logits
    Token          Logit
    " way"         0.31
    "away"         0.29
    "aways"        0.24
    " direction"   0.23
    " away"        0.21
    "-away"        0.19
    "-a"           0.19
    "-way"         0.19
    " why"         0.18
    "Away"         0.17
    Activation Density: 0.014%

    No Known Activations