INDEX
    Explanations

    phrases expressing knowledge or awareness

    phrases indicating familiarity or prior knowledge

    Negative Logits
    Token            Logit
    "oreal"          -0.73
    "xa"             -0.71
    "rontal"         -0.71
    "vati"           -0.68
    "oyer"           -0.67
    "orthy"          -0.66
    "ongevity"       -0.65
    "cific"          -0.64
    "iterranean"     -0.63
    "foreseen"       -0.63
    Positive Logits
    Token            Logit
    " already"       0.73
    " by"            0.68
    " yourselves"    0.68
    " me"            0.67
    " BY"            0.66
    "76561"          0.60
    "tale"           0.60
    " guessed"       0.58
    " about"         0.56
    " that"          0.56
    Activation Density: 0.128%

    No Known Activations