INDEX
    Explanations

    topics related to social issues

    Negative Logits
    "enegger"       -0.77
    "ONSORED"       -0.65
    "abwe"          -0.62
    "ornings"       -0.61
    "renheit"       -0.60
    "itored"        -0.59
    " tradem"       -0.58
    "Lago"          -0.57
    "\\\\\\\\"      -0.57
    " conclud"      -0.55
    Positive Logits
    "iest"           0.87
    " portion"       0.79
    " aspect"        0.74
    " fallacy"       0.74
    " element"       0.73
    " hypothesis"    0.71
    "liest"          0.71
    " axis"          0.70
    " process"       0.68
    "osphere"        0.65
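    The dashboard does not say how these logit lists were produced. A common approach, assumed here, is to project the feature's decoder direction onto the model's unembedding matrix and read off the tokens whose logits it raises or lowers most. The sketch below follows that assumption; `top_logit_effects`, `decoder_dir`, and `W_U` are hypothetical names, and it ignores any final LayerNorm the real computation might fold in.

```python
import numpy as np


def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by how strongly a feature direction pushes their logits.

    Assumption: the effect on each token's logit is the dot product of the
    feature's decoder direction with that token's unembedding column
    (a direct "logit lens" projection, no LayerNorm folding).

    decoder_dir: (d_model,) decoder vector for one feature (hypothetical)
    W_U:         (d_model, vocab_size) unembedding matrix (hypothetical)
    vocab:       list of vocab_size token strings
    """
    logit_effects = decoder_dir @ W_U          # shape: (vocab_size,)
    order = np.argsort(logit_effects)          # ascending: most negative first
    negative = [(vocab[i], float(logit_effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: with an identity unembedding, the effects equal decoder_dir.
neg, pos = top_logit_effects(
    np.array([0.5, -0.2, 0.9, -0.7]), np.eye(4), ["a", "b", "c", "d"], k=2
)
print("negative:", neg)
print("positive:", pos)
```

    Sorting once and slicing both ends keeps the two lists consistent with each other; a real vocabulary of tens of thousands of tokens would likely use `np.argpartition` instead of a full sort.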
    Activation Density: 0.702%
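    Activation density is read here as the share of sampled token positions on which the feature fires. The minimal sketch below assumes a token counts as active when its activation is strictly positive; the function name and the `threshold` parameter are hypothetical, and the dataset behind the 0.702% figure is not stated.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds threshold.

    activations: array of any shape (e.g. batch x sequence) holding one
    feature's activation at every sampled token position.
    """
    acts = np.asarray(activations, dtype=float)
    active = acts > threshold                  # boolean mask of firing positions
    return 100.0 * float(np.mean(active))      # fraction -> percentage


# Toy example: 7 active positions out of 1,000 tokens.
acts = np.zeros((4, 250))
acts[0, :7] = 1.3
print(f"{activation_density(acts):.3f}%")
```

    Under this reading, 0.702% corresponds to roughly 7 active positions per 1,000 tokens.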

    No Known Activations