INDEX
    Explanations

    expressions of hope or assistance

    NEGATIVE LOGITS
     blinking
    -0.17
    خر
    -0.16
    сят
    -0.16
     consist
    -0.15
    onder
    -0.14
    nier
    -0.14
    орот
    -0.13
    rone
    -0.13
    ides
    -0.13
     just
    -0.13
    POSITIVE LOGITS
     helped
    0.26
     helps
    0.25
    help
    0.23
     Helps
    0.23
     helpful
    0.21
    	help
    0.20
     help
    0.19
    Help
    0.19
     помогает
    0.18
     helping
    0.18
    Act Density 0.042%

    No Known Activations