INDEX
Explanations
alienation, anxiety, fragmentation, disorientation
Negative Logits
biased       0.44
Bias         0.41
unfair       0.41
cough        0.41
bias         0.40
꿋           0.39
biased       0.37
unfairly     0.37
persevere    0.37
bias         0.36
Positive Logits
alienation        1.02
atom              0.95
nihil             0.88
Atom              0.86
alienated         0.85
spiritual         0.83
disen             0.82
disorientation    0.80
fragmentation     0.80
individualism     0.80
Activations Density: 0.029%