INDEX
    Explanations

    phrases indicating potential and expectations related to future actions or events

    Negative Logits
    Token          Logit
    "otch"         -0.15
    "/stdc"        -0.15
    "ught"         -0.15
    "ozy"          -0.14
    "ecided"       -0.14
    "$MESS"        -0.14
    "redients"     -0.14
    "urette"       -0.14
    "issance"      -0.14
    "ddit"         -0.14
    Positive Logits
    Token          Logit
    " never"       0.93
    "never"        0.81
    " Never"       0.81
    "Never"        0.76
    " NEVER"       0.75
    " nunca"       0.69
    " никогда"     0.60
    " jamais"      0.58
    " nikdy"       0.54
    ".Never"       0.51
    Activation Density: 0.147%

    No Known Activations