INDEX
    Explanations

    expressions of desire and requests for action

    Negative Logits
    UnusedPrivate   -0.73
    Personendaten   -0.72
     numerus        -0.65
    ]))             -0.61
     lenker         -0.60
    '')             -0.58
    (!__            -0.58
    ural            -0.57
    (""))           -0.57
     '*')           -0.57
    Positive Logits
     revenge         0.69
     quit            0.65
     retali          0.65
    HasForeignKey    0.63
     fight           0.61
     hurry           0.59
     rejoin          0.58
    revenge          0.58
     stop            0.57
     confess         0.56
    Act Density 0.317%

    No Known Activations