INDEX
Explanations
mentions of "AA" (American Association) followed by a mix of different words
references to the American Association of University Professors (AAUP) and associated ratings
Negative Logits
Token      Logit
Jenner     -0.87
Canaver    -0.72
ledge      -0.71
lov        -0.69
theless    -0.68
ician      -0.67
polit      -0.67
pheus      -0.66
sticks     -0.65
ophers     -0.65
Positive Logits
Token      Logit
BILITIES   0.93
BILITY     0.92
zona       0.87
BIL        0.86
HHHH       0.85
BA         0.83
HR         0.83
VE         0.82
illac      0.82
HAHAHAHA   0.82
Activations Density: 0.013%