INDEX
    Explanations

    phrases indicating personal agency and choices in the context of social issues

    NEGATIVE LOGITS
    ()]
    
    -0.60
    ']))
    
    -0.60
    ()));
    
    -0.57
    "]);
    
    -0.55
    )}}
    -0.55
     },
    
    -0.52
     discul
    -0.51
    ))}
    -0.50
    les
    -0.50
     {}));
    -0.50
    POSITIVE LOGITS
     themselves
    1.32
     their
    1.15
    themselves
    1.11
    their
    0.99
    Their
    0.94
     Their
    0.85
     THEIR
    0.74
     ihre
    0.73
     kanilang
    0.72
     ihren
    0.72
    Act Density 0.503%
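
The dashboard does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled tokens on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and not taken from the dashboard:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all; the dashboard may use a different definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for 1000 tokens: 5 fire, the rest are zero.
sample = [0.0] * 995 + [0.8, 1.2, 0.3, 2.1, 0.5]
density = activation_density(sample)
print(f"Act Density {density:.3%}")
```

Under this reading, a density near 0.5% means the feature fires on roughly one token in two hundred.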

    No Known Activations