INDEX
    Explanations

    expressions of positivity and excellence

    Negative Logits
        "Def"          -0.66
        "IsContent"    -0.63
        " Paras"       -0.63
        " Mona"        -0.61
        "Datuak"       -0.59
        "paras"        -0.59
        "thalene"      -0.59
        " Prey"        -0.58
        " Marissa"     -0.57
        " góry"        -0.56
    Positive Logits
        "Wonderful"     1.07
        " Wonderful"    1.03
        "Terrible"      0.89
        " WONDER"       0.89
        " terrible"     0.84
        ".}~\"          0.84
        "terrible"      0.83
        " Terrible"     0.82
        "rible"         0.81
        "wonderful"     0.80
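    Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below illustrates that idea in pure Python. It is an assumption, not this dashboard's confirmed method: it uses a plain matrix product and ignores any final layer norm. The function name and the `direction`, `unembed`, and `vocab` inputs are hypothetical.

```python
def top_and_bottom_logits(direction, unembed, vocab, k=10):
    """Project a feature direction onto the vocabulary and return the extremes.

    Hypothetical inputs (assumed shapes):
      direction: list of d_model floats (the feature's decoder vector)
      unembed:   d_model rows, each a list of vocab_size floats (W_U)
      vocab:     list of vocab_size token strings
      k:         number of tokens to keep at each end (k >= 1)
    """
    vocab_size = len(vocab)
    # logits[j] = sum_i direction[i] * unembed[i][j]  (no layer norm applied)
    logits = [
        sum(direction[i] * unembed[i][j] for i in range(len(direction)))
        for j in range(vocab_size)
    ]
    # Sort token ids from most negative to most positive logit.
    order = sorted(range(vocab_size), key=lambda j: logits[j])
    negative = [(vocab[j], logits[j]) for j in order[:k]]
    # Most positive first, matching the descending order of the list above.
    positive = [(vocab[j], logits[j]) for j in reversed(order[-k:])]
    return negative, positive
```

    Because the projection is linear, a direction that aligns with sentiment-intensity tokens can boost both "Wonderful" and "Terrible" at once, which is consistent with both appearing in the positive list.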
    Activation density: 0.012%

    No Known Activations
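    Activation density is usually read as the fraction of token positions at which the feature fires. At 0.012%, that is roughly 12 firings per 100,000 tokens. The sketch below shows one way to compute it. The function name and the zero firing threshold are assumptions, not details taken from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold.

    activations: iterable of per-token feature activations (hypothetical input)
    threshold:   assumed firing cutoff; 0.0 treats any positive activation as a firing
    """
    activations = list(activations)
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Example: 12 firing positions out of 100,000 tokens.
acts = [1.0] * 12 + [0.0] * 99_988
print(f"{activation_density(acts) * 100:.3f}%")
```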