    Explanations

    expressions of belief or opinion

    Negative Logits
    "ughter"       -0.18
    "ughters"      -0.17
    "ν"            -0.17
    "ein"          -0.17
    "vig"          -0.16
    "benh"         -0.16
    "ez"           -0.16
    "udy"          -0.15
    "gew"          -0.15
    "cono"         -0.15

    Positive Logits
    " twice"        0.29
    " Twice"        0.26
    " about"        0.23
    "-about"        0.22
    "fully"         0.19
    " differently"  0.19
    " alike"        0.19
    "_about"        0.19
    "lessly"        0.19
    "tank"          0.18

    Activation Density: 0.070%

    No Known Activations