    Explanations

    instances of the word "strong" and its variations, indicating a focus on strength or robustness

    Negative Logits
    Token          Logit
    "upaten"       -0.67
    "atimes"       -0.64
    "kaido"        -0.62
    " Maus"        -0.62
    "jub"          -0.61
    "jima"         -0.60
    " []:"         -0.60
    " délib"       -0.60
    " oblivion"    -0.59
    " bliss"       -0.59
    Positive Logits
    Token          Logit
    "STRONG"       1.61
    "strong"       1.60
    "strength"     1.60
    " Strong"      1.57
    "Strong"       1.54
    " STRONG"      1.49
    " strength"    1.49
    " Strength"    1.45
    "Strength"     1.43
    " strong"      1.43
    Activation Density: 0.097%

    No Known Activations