INDEX
    Explanations

    phrases or words containing 'ew'

    instances of a specific term related to a phenomenon or concept

    Negative Logits
    REDACTED           -0.66
    " cort"            -0.65
    " unarmed"         -0.65
    " suspic"          -0.64
    " administr"       -0.59
    " anat"            -0.58
    " retri"           -0.58
    " inhibition"      -0.58
    "Downloadha"       -0.58
    " apprehension"    -0.57
    Positive Logits
    "estern"           1.19
    "een"              1.17
    "riter"            1.07
    "esley"            1.04
    "esome"            1.04
    "ITNESS"           1.03
    "sburg"            0.97
    "alker"            0.93
    "ritten"           0.92
    "eh"               0.91
    Act Density 0.016%

    No Known Activations