INDEX
    Explanations

    mentions of interrogation-related words and phrases

    terms related to interrogation methods and practices

    Negative Logits
    Token          Logit
    "Offline"      -0.79
    "ensical"      -0.73
    "jri"          -0.71
    "minecraft"    -0.70
    "yright"       -0.69
    "buy"          -0.68
    "ouf"          -0.67
    "erry"         -0.66
    "WE"           -0.65
    "cakes"        -0.65
    Positive Logits
    Token               Logit
    " interrogation"    1.01
    " interrog"         0.99
    " Techniques"       0.89
    " interrogated"     0.85
    " techniques"       0.84
    " questioning"      0.68
    " probing"          0.68
    " sessions"         0.67
    "atories"           0.67
    " coercive"         0.67
    Activation Density: 0.018%

    No Known Activations