    Explanations

    terms related to sensitivity and how it affects various contexts

    Negative Logits
    " divertimento"       -0.68
    ":✨"                  -0.65
    " lot"                -0.62
    " Evers"              -0.62
    "gl"                  -0.60
    "кто"                 -0.60
    " braw"               -0.59
    " felizes"            -0.59
    " posta"              -0.59
    "weilen"              -0.58

    Positive Logits
    " sensitive"           1.48
    " Sensitive"           1.47
    "Sensitive"            1.39
    " sensi"               1.32
    " vulnerability"       1.32
    " sensitivity"         1.32
    "sensitive"            1.28
    " sensitivities"       1.28
    " vulnerabilities"     1.20
    " Sensitivity"         1.19

    Activation Density: 0.260%

    No Known Activations