INDEX
    Explanations

    phrases indicating future actions or intentions

    Negative Logits
    Token             Logit
    " never"          -0.23
    "never"           -0.22
    " NEVER"          -0.19
    " nunca"          -0.18
    " Never"          -0.17
    " никогда"        -0.16
    "already"         -0.15
    "ral"             -0.15
    " already"        -0.15
    (whitespace)      -0.15
    Positive Logits
    Token             Logit
    " need"           0.24
    " hell"           0.24
    " town"           0.21
    " bed"            0.21
    "need"            0.21
    " Hell"           0.21
    " be"             0.21
    "iams"            0.20
    " jail"           0.20
    "hell"            0.19
    Act Density 0.037%
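
The density figure above can be read as the share of token positions on which the feature fires. The following is a minimal sketch of that reading, not the dashboard's actual computation: the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical, and it assumes "active" means an activation strictly greater than zero.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a position counts as active when its activation is strictly
    greater than `threshold` (0.0 by default). The dashboard may define
    density differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 4 of which fire.
sample = [0.0] * 9996 + [1.2, 0.4, 3.1, 0.7]
density = activation_density(sample)

# Format as a percentage with three decimals, matching the dashboard style.
print(f"Act Density {density:.3%}")
```

On this reading, a density of 0.037% would correspond to roughly 37 active positions per 100,000 tokens, which would be consistent with a sparse feature.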

    No Known Activations