    Explanations

    references to weakness or fragility in various contexts

    Negative Logits
    Token        Logit
    "rike"       -0.17
    "hone"       -0.16
    "통신"       -0.16
    "ingham"     -0.16
    "êt"         -0.15
    "ング"       -0.15
    "rika"       -0.15
    "asca"       -0.15
    "asu"        -0.15
    "Ậ"          -0.15

    Positive Logits
    Token        Logit
    "弱"         0.27
    " weak"      0.25
    " Weak"      0.24
    "weak"       0.24
    "Weak"       0.24
    " слаб"      0.23
    " weakest"   0.22
    " weaker"    0.21
    "-strong"    0.18
    "ly"         0.18
    Act Density 0.029%

    No Known Activations