    Explanations

    phrases indicating alignment or conformity

    phrases indicating alignment or conformity to standards or rules

    Negative Logits
     livest            -0.65
     Tacoma            -0.61
    iens               -0.59
    odor               -0.59
     itch              -0.58
    icides             -0.58
     sacrific          -0.57
     laun              -0.57
    vertisements       -0.56
     strugg            -0.56

    Positive Logits
     with              0.75
    arity              0.72
    anthrop            0.68
     vein              0.67
    With               0.67
     favour            0.66
    omsky              0.63
    cise               0.62
     WITH              0.62
    llah               0.62
    Activation Density: 0.053%

    No Known Activations