    Explanations

    topics related to LGBTQ+ rights and experiences

    Negative Logits
    ...         -0.25
     ...        -0.24
    ...(        -0.21
     ..."↵↵     -0.20
    ..."↵↵      -0.20
    ..."↵       -0.20
     ..."↵      -0.20
    ...,        -0.19
    â̦”↵↵        -0.19
     ..."       -0.18
    Positive Logits
    .â̦          0.19
    .â̦↵↵        0.19
    ,↵↵↵↵       0.18
    .","        0.17
    ãĢĤ↵↵↵↵     0.16
    #af         0.16
    .↵↵↵↵       0.16
     #__        0.16
     th         0.15
    ó           0.15
    Act Density 0.374%

    No Known Activations