INDEX
    Explanations

    references to fraternal and sororal organizations

    Negative Logits
    ubu       -0.17
    yte       -0.15
    roc       -0.14
    ×↵↵       -0.14
    ube       -0.14
    atten     -0.14
    alace     -0.13
    ubes      -0.13
    IDTH      -0.13
    取り        -0.13
    Positive Logits
     Sigma    0.38
     Gamma    0.37
     Om       0.36
     Lambda   0.36
     Mu       0.34
     Delta    0.34
     Pi       0.33
     Phi      0.32
     Chi      0.32
     Theta    0.31
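Dashboards like this often derive the logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The source does not confirm that this is how these values were produced. The sketch below shows that projection in plain Python. `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical and are not taken from this dashboard.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> List[Tuple[str, float]]:
    """Score each token as the dot product of decoder_dir with its unembedding column.

    Assumed (hypothetical) layout: `unembed` has d_model rows and len(vocab) columns.
    """
    d_model = len(decoder_dir)
    scores = []
    for tok_idx, token in enumerate(vocab):
        # Dot product of the feature direction with this token's unembedding column.
        score = sum(decoder_dir[i] * unembed[i][tok_idx] for i in range(d_model))
        scores.append((token, score))
    return scores


def top_and_bottom(
    scores: List[Tuple[str, float]], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    positive = sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]
    negative = sorted(scores, key=lambda pair: pair[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy, illustrative numbers: 2-dimensional residual stream, 4-token vocabulary.
    vocab = [" Sigma", " Gamma", "ubu", "yte"]
    unembed = [
        [0.3, 0.2, -0.1, -0.2],
        [0.2, 0.1, 0.0, 0.1],
    ]
    decoder_dir = [1.0, 0.5]
    pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=2)
    print("positive:", pos)
    print("negative:", neg)
```

A real implementation would use a single matrix-vector product over the full vocabulary. The loop form only makes each token's dot product explicit.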
    Act Density 0.087%
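Act Density is read here as the share of sampled tokens on which the feature fires. At 0.087%, that is 0.00087 of tokens, or about 87 activating tokens per 100,000. The minimal sketch below assumes a feature counts as active when its activation is strictly above 0.0. Both that threshold and the function name `activation_density` are assumptions, not the dashboard's documented definition.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumes one activation value per sampled token. The 0.0 threshold is an
    assumption; a dashboard may use a different cutoff.
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one activation to compute a density")
    return active / total


if __name__ == "__main__":
    # Toy sample: 87 active tokens out of 100,000, i.e. a density of 0.087%.
    sample = [1.0] * 87 + [0.0] * (100_000 - 87)
    print(f"{activation_density(sample):.3%}")
```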

    No Known Activations