    Explanations

    statements that highlight misconceptions and assumptions about societal issues or beliefs

    Negative Logits
    Token              Logit
    IntoConstraints    -0.57
     Numerade          -0.57
     للاسماء           -0.57
                       -0.54
     EAT               -0.53
    ConstraintMaker    -0.52
     surla             -0.51
    IVEREF             -0.51
    onded              -0.50
    SharedCtor         -0.50
    Positive Logits
    Token            Logit
     vectorielle     0.46
     oculta          0.39
     gärna           0.38
     wrongly         0.36
     bolsillos       0.36
     falsos          0.36
    berdayakan       0.36
     simplesmente    0.35
     tatuajes        0.35
     simplement      0.35
    Activation Density: 0.402%

    No Known Activations