    Explanations

    phrases related to changes over time

    Negative Logits
    "umbn"       -0.90
    "iris"       -0.63
    "pired"      -0.62
    " oath"      -0.60
    " hypocr"    -0.60
    "»Ĵ"         -0.57
    "Ĥ¬"         -0.56
    " pione"     -0.55
    "ibles"      -0.55
    " redes"     -0.54
    Positive Logits
    " thanks"    1.12
    " due"       1.02
    " owing"     1.02
    "thanks"     0.95
    " compared"  0.91
    " because"   0.88
    "due"        0.85
    "because"    0.82
    "ecause"     0.79
    " despite"   0.77
    Activation Density: 0.350%

    No Known Activations