INDEX
    Explanations

    negative constructions and expressions of doubt

    NEGATIVE LOGITS
    "amil"       -0.18
    "actus"      -0.15
    "Ī"          -0.15
    " Hoch"      -0.14
    "amber"      -0.14
    " Monkey"    -0.14
    "ãģĦãĤĦ"     -0.14
    "opers"      -0.14
    "ino"        -0.13
    ".getenv"    -0.13
    POSITIVE LOGITS
    " mean"          0.32
    "mean"           0.28
    "Mean"           0.27
    " Mean"          0.26
    " necessarily"   0.25
    " means"         0.24
    "_mean"          0.24
    "-mean"          0.23
    " Means"         0.22
    ".mean"          0.21
    Act Density 0.034%

    No Known Activations