INDEX
Explanations
phrases related to identity and self-worth
expressions of self-identity and personal struggle
Negative Logits
Token       Logit
pmwiki      -0.95
dispatch    -0.77
Locations   -0.75
osate       -0.75
refres      -0.74
Topics      -0.74
itures      -0.74
Utilities   -0.74
conclud     -0.73
rollout     -0.73
Positive Logits
Token       Logit
ashamed     1.19
born        1.13
afraid      1.09
gay         1.08
oppressed   1.07
alive       1.05
virtuous    1.03
homosexual  1.00
invincible  0.98
proud       0.98
Activations Density: 0.325%