INDEX
Explanations
references to fraternal and sororal organizations
Negative Logits
ubu          -0.17
yte          -0.15
roc          -0.14
ÃĹ↵↵         -0.14
ube          -0.14
atten        -0.14
alace        -0.13
ubes         -0.13
IDTH         -0.13
åıĸãĤĬ       -0.13
Positive Logits
Sigma        0.38
Gamma        0.37
Om           0.36
Lambda       0.36
Mu           0.34
Delta        0.34
Pi           0.33
Phi          0.32
Chi          0.32
Theta        0.31
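A minimal sketch of how lists like the two above are commonly produced, assuming (this page does not say so) that each token's score is the dot product of the feature's decoder direction with that token's unembedding column, after which the scores are sorted and the k highest and lowest kept. The helper names and the toy two-dimensional vectors are hypothetical and are not real model weights.

```python
# Hypothetical sketch: rank a feature's per-token logit effects.
# ASSUMPTION: score(token) = dot(decoder_direction, unembedding column of token),
# a common SAE-dashboard convention that this page does not confirm.

def logit_effects(decoder_direction, unembedding_columns):
    """Return {token: score}, where score is the dot product of the
    feature's decoder direction with that token's unembedding column."""
    return {
        token: sum(d * u for d, u in zip(decoder_direction, column))
        for token, column in unembedding_columns.items()
    }

def top_and_bottom(effects, k):
    """Return (top, bottom): the k most positive (token, score) pairs,
    largest first, and the k most negative pairs, smallest first."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    bottom = ranked[:k]
    top = list(reversed(ranked[-k:]))
    return top, bottom

# Toy example with made-up vectors (not taken from any real model).
decoder = [1.0, 0.0]
columns = {
    "Sigma": [0.38, 0.0],
    "Gamma": [0.37, 1.0],
    "ubu": [-0.17, 2.0],
    "yte": [-0.15, 0.0],
}
top, bottom = top_and_bottom(logit_effects(decoder, columns), k=2)
print([t for t, _ in top], [t for t, _ in bottom])
```

Because only the first decoder coordinate is nonzero here, each toy score equals its column's first entry. The printed token order is therefore Sigma, Gamma for the top and ubu, yte for the bottom.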
Activation density: 0.087%
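Activation density is usually read as the fraction of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the threshold and the helper name are illustrative assumptions, not this dashboard's documented definition. At 0.087%, that works out to roughly 87 firing positions per 100,000 tokens.

```python
# Hypothetical sketch of an activation-density calculation.
# ASSUMPTION: a position counts as "firing" when its activation exceeds `threshold`.

def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# Toy data: 87 firing positions out of 100,000 (made up for illustration).
acts = [0.0] * 99_913 + [1.5] * 87
density = activation_density(acts)
print(f"{density:.3%}")
```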