INDEX
    Explanations

    phrases related to behavior and misconduct

    Negative Logits
    " Solution"       -0.76
    " Puzzles"        -0.74
    " Arri"           -0.73
    "cells"           -0.73
    " Cells"          -0.71
    "reader"          -0.69
    " Qiao"           -0.68
    " houses"         -0.66
    "vae"             -0.66
    "ropolis"         -0.66

    Positive Logits
    " unlawful"        1.07
    " contrary"        1.03
    " morally"         1.01
    " unethical"       1.01
    " lawful"          1.00
    " inappropriate"   0.99
    " violate"         0.97
    " justified"       0.95
    " eth"             0.93
    "repre"            0.92

    Activation Density: 0.373%

    No Known Activations