INDEX
Explanations
words related to achieving a high status or rank
phrases associated with social commentary and critique
Negative Logits
Token      Logit
ANE        -0.81
7601       -0.75
AFB        -0.72
Rhodes     -0.72
$.         -0.71
uary       -0.69
kefeller   -0.68
$,         -0.68
Mike       -0.67
Pegasus    -0.66
Positive Logits
Token      Logit
âĢ         2.68
âĢ         1.82
â          1.45
**         1.42
âĸº        1.39
ãĢ         1.38
âĶ         1.37
âĢł        1.32
Å          1.32
¶          1.29
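
Most of the positive-logit tokens look like byte-level BPE token strings rather than ordinary text. The sketch below assumes the tokenizer uses the GPT-2 style byte-to-unicode mapping; the page does not name the model or tokenizer, so this is an assumption. The helper names `token_to_bytes` and `describe` are hypothetical and exist only for illustration. Under that assumption, each displayed character stands for one raw byte, and a token is either a complete UTF-8 character or an incomplete prefix of one.

```python
def bytes_to_unicode():
    # GPT-2 style table: printable bytes map to themselves, and the
    # remaining bytes are shifted to code points starting at U+0100.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> raw byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    # Hypothetical helper: recover the raw bytes behind a token string.
    return bytes(BYTE_DECODER[ch] for ch in token)


def describe(token: str) -> str:
    # Hypothetical helper: decode the token if its bytes form valid
    # UTF-8, otherwise report the raw bytes in hex.
    raw = token_to_bytes(token)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return "incomplete UTF-8: " + raw.hex(" ")


if __name__ == "__main__":
    for tok in ["âĢ", "â", "âĸº", "ãĢ", "âĶ", "âĢł", "Å", "¶"]:
        print(repr(tok), "->", describe(tok))
```

If the assumption holds, tokens such as `âĢ` are the first two bytes (`e2 80`) of a three-byte UTF-8 character, which would explain why the same display string can appear for more than one vocabulary entry only when trailing characters were lost in extraction.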
Activations Density: 3.393%