INDEX
    Explanations

    phrases expressing varying degrees of comparison

    Negative Logits
    pps         -0.16
    kind        -0.15
     bbw        -0.14
     Scho       -0.14
    è¨İ         -0.14
    gres        -0.14
    _lazy       -0.14
     tighter    -0.14
    iaux        -0.14
    ess         -0.13
    Positive Logits
    acic        0.16
    eyh         0.15
    atta        0.15
     Trou       0.14
    imli        0.14
    terior      0.14
    ules        0.14
    kee         0.14
    urr         0.14
    .CONTENT    0.14
    Act Density 0.054%

    No Known Activations