INDEX
    Explanations

    references to physical objects or specific details within a larger context

    references to vulnerability and oppression

    Negative Logits
    amount          -0.54
    SEA             -0.51
     Prel           -0.49
    inding          -0.48
     Ori            -0.48
    described       -0.48
    uned            -0.48
     Environment    -0.46
     landowners     -0.45
    shown           -0.45
    Positive Logits
     anymore        1.12
     ;)             0.96
    ?'              0.94
    !",             0.93
     haha           0.93
     :-)            0.92
    ?",             0.90
    !'              0.90
     someday        0.89
     :)             0.89
    Activation Density: 1.284%

    No Known Activations