    Explanations

    adjectives related to weakness or vulnerability

    Negative Logits
    Token       Logit
    " finn"     -0.59
    " gild"     -0.58
    " zyn"      -0.58
    " inder"    -0.58
    " oner"     -0.58
    " lts"      -0.58
    " ?..."     -0.57
    " Gies"     -0.57
    " embra"    -0.57
    " mme"      -0.56
    Positive Logits
    Token           Logit
    " weak"         1.25
    "weak"          1.19
    " Weak"         1.19
    "Weak"          1.10
    " weakest"      1.03
    " weaker"       1.02
    " weaken"       1.01
    " weakness"     0.98
    " weakened"     0.94
    " weakening"    0.88
    Activation Density: 0.065%

    No Known Activations