INDEX
    Explanations

    words related to power dynamics and manipulation

    instances of numerical values or quantities and their implications in various contexts

    Negative Logits
    intend        -0.71
    angler        -0.67
    appropri      -0.67
     rapt         -0.63
     respons      -0.63
    rament        -0.63
     pastoral     -0.61
     vanishing    -0.60
     charm        -0.59
    body          -0.59
    Positive Logits
     Explicit           0.88
     Lastly             0.76
    æĺ¯                 0.75
     Languages          0.75
    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯    0.74
     à¨                 0.74
    FORE                0.71
    س                   0.69
    anguages            0.68
    Disclaimer          0.68
    Act Density 0.048%

    No Known Activations