INDEX
    Explanations

    phrases or contexts indicating something is beneath a certain standard or level

    Negative Logits
    self        -0.24
    ever        -0.23
    time        -0.21
    ology       -0.20
    gether      -0.20
    ical        -0.20
    wide        -0.20
    plier       -0.19
    scopic      -0.19
    icient      -0.19
    Positive Logits
    pin         0.35
    whelming    0.33
    lining      0.31
    pins        0.30
    lined       0.30
    lay         0.29
    whel        0.28
    privileged  0.27
    lies        0.26
    util        0.26
    Act Density 0.016%

    No Known Activations