    Explanations

    fall victim, fall short, fall into

    Negative Logits
        "𝚍"        0.70
        "𝙚"        0.67
        "τή"       0.66
        "ي"        0.66
        "ously"    0.63
        "不满"     0.63   (Chinese: "dissatisfaction")
        "рил"      0.61
        "ه"        0.61
        "𝐝"        0.61
        "𝚐"        0.61
    Positive Logits
        "Falling"  1.09
        " Falling" 1.01
        " fall"    0.96
        "Fall"     0.96
        " falling" 0.95
        " falls"   0.92
        " fell"    0.91
        "Falls"    0.91
        " FALL"    0.84
        " سقوط"    0.80   (Arabic: "fall")
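On SAE feature dashboards like this one, the positive and negative logit lists are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The following is a minimal sketch, assuming that method. The function names, the two-dimensional toy vectors, and the tokens " rise" and "ous" are hypothetical. The sketch keeps signed values, so negative effects come out below zero.

```python
def logit_effects(decoder_dir, unembed):
    """Dot the feature's decoder direction with each token's unembedding column.

    decoder_dir: list of d_model floats (hypothetical feature direction).
    unembed: dict mapping token string -> list of d_model floats
             (hypothetical unembedding columns).
    """
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example with a 2-dimensional residual stream (hypothetical values).
decoder_dir = [1.0, 0.5]
unembed = {
    " fall": [1.0, 0.0],
    " rise": [-1.0, 0.0],
    "Falling": [1.0, 1.0],
    "ous": [0.0, -1.0],
}
effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, k=2)
print("positive:", positive)
print("negative:", negative)
```

Leading spaces in token strings such as " fall" are part of the token, which is why the lists above keep "Falling" and " Falling" as separate entries.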
    Act Density 0.029%
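Act Density is usually read as the fraction of token positions on which the feature activates. At 0.029%, that works out to roughly 29 firing positions per 100,000 tokens. The following is a minimal sketch, assuming that "activates" means an activation strictly above zero. The function name activation_density, the threshold parameter, and the toy activation list are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy data: 100,000 token positions, of which 29 have a nonzero activation.
acts = [0.0] * 99_971 + [2.3] * 29
density = activation_density(acts)
print(f"Act Density {density:.3%}")
```

A very low density like this means the feature fires only rarely, so a dashboard can easily show it with few or no sampled activating examples.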

    No Known Activations