INDEX
Explanations
mentions of prestigious accolades or institutions
references to prestigious awards, institutions, or accomplishments
Negative Logits
Phones  -0.67
Twe     -0.65
Laws    -0.65
ter     -0.64
eper    -0.64
hy      -0.63
ogen    -0.62
alone   -0.62
Bundy   -0.62
harm    -0.62
Positive Logits
accol        0.97
prestigious  0.96
honors       0.90
awards       0.90
adolesc      0.83
mosqu        0.82
institution  0.80
citiz        0.78
wcs          0.76
satell       0.76
Activations Density 0.010%