INDEX
    Explanations

    terms related to exploitation and its various forms

    Negative Logits
    "<pad>"         -0.77
    "<unused41>"    -0.76
    "<unused14>"    -0.76
    "<unused43>"    -0.76
    "<unused17>"    -0.76
    "<unused42>"    -0.76
    "<unused79>"    -0.76
    "<unused51>"    -0.76
    "<unused47>"    -0.76
    "[@BOS@]"       -0.75
    Positive Logits
    "Emily"         0.52
    "MenuItem"      0.51
    " Emily"        0.45
    "VersionUID"    0.44
    "Nutrient"      0.41
    " exploit"      0.40
    " Ż"            0.38
    " bArr"         0.37
    " emily"        0.37
    " nutrient"     0.37
    Activation Density: 0.235%

    No Known Activations