INDEX
    Explanations

    Escaping/being free

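    Negative Logits
    (token not rendered)                         -0.08
    Nutzung (German: "usage")                    -0.07
    funzione (Italian: "function")               -0.07
    insertion                                    -0.07
    φα                                           -0.07
    _function                                    -0.07
    (token not rendered)                         -0.07
    (token not rendered)                         -0.07
    मात्रा (Hindi: "quantity")                     -0.07
    Zero                                         -0.07

    Positive Logits
    jailbreak                                     0.12
    fuga (Spanish/Italian: "escape, flight")      0.11
    libertad (Spanish: "freedom")                 0.10
    escape                                        0.10
    liberties                                     0.10
    آزادی (Persian: "freedom")                    0.10
    onts                                          0.10
    своб (Russian, start of "свобода", "freedom") 0.10
    getaway                                       0.10
    fugit (Latin: "flees")                        0.09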
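    The dashboard does not state how these logit scores are computed. A common approach for sparse-autoencoder features, assumed here and not confirmed by this page, is to take the dot product of the feature's decoder direction with each token's unembedding vector and list the lowest and highest scores. The sketch below does this in plain Python. The function name `logit_attribution`, its arguments, and the toy values are hypothetical, and it ignores any final layer norm the real model may apply.

```python
from typing import List, Sequence, Tuple


def logit_attribution(
    decoder_direction: Sequence[float],
    unembedding: Sequence[Sequence[float]],  # assumed layout: [d_model][vocab_size]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return the k most negative and k most positive (token, score) pairs."""
    d_model = len(decoder_direction)
    # Score each token by the dot product of the feature direction with
    # that token's unembedding column (no layer norm applied: an assumption).
    scores = [
        sum(decoder_direction[i] * unembedding[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy example with made-up numbers (hypothetical 2-dim model, 4-token vocab).
neg, pos = logit_attribution(
    [1.0, 0.0],
    [[0.12, -0.08, 0.10, 0.0], [5.0, 5.0, 5.0, 5.0]],
    [" jailbreak", " Nutzung", " fuga", " the"],
    k=2,
)
print([tok for tok, _ in pos])  # [' jailbreak', ' fuga']
```

    Sorting once and slicing from both ends keeps the positive and negative lists consistent with each other, which matches how the two lists above share one scale.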
    Activation Density 0.033%
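    Activation density is usually read as the share of token positions on which the feature fires. A density of 0.033% corresponds to roughly 33 active positions per 100,000 tokens. The sketch below assumes "fires" means an activation strictly above a threshold of zero; the function name `activation_density` and the threshold default are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of positions whose activation exceeds threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 33 active positions out of 100,000 (made-up activation values).
acts = [0.0] * 99_967 + [1.5] * 33
print(f"{activation_density(acts):.3f}%")  # 0.033%
```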

    No Known Activations