    Explanations

    phrases that indicate excellence or superiority

    Negative Logits
    sson          -0.15
    uster         -0.15
    union         -0.15
    stell         -0.14
    oust          -0.14
    ustom         -0.14
    \Component    -0.14
    antan         -0.14
    atron         -0.14
    ayla          -0.14

    Positive Logits
    lia           0.16
    rif           0.15
    ãģŁãģı        0.14
    otp           0.14
    iales         0.14
    elite         0.14
     Nev          0.14
    rane          0.14
    alia          0.14
    rides         0.13
    Activation Density: 0.023%

    No Known Activations