INDEX
    Explanations

    statements made by individuals

    Negative Logits
    inary           -0.71
    rats            -0.64
    nerg            -0.62
     caliber        -0.62
    atible          -0.62
    Justice         -0.61
    OTUS            -0.61
     totality       -0.59
    swick           -0.59
     Organization   -0.57

    Positive Logits
    hiba            0.81
    ansky           0.80
    ynthesis        0.77
    ometimes        0.75
    advertisement   0.74
    omething        0.71
    kowski          0.68
     confidently    0.67
     rhet           0.67
    é¾              0.67
    Activation Density 0.043%

    No Known Activations