Explanations
mentions of universities, debates, and academic discussions
Negative Logits
EStream  -0.86
hower    -0.75
glers    -0.70
ptive    -0.69
mits     -0.67
hyde     -0.67
hound    -0.66
dylib    -0.64
pas      -0.63
metry    -0.63
Positive Logits
REAM     1.30
UFF      1.15
OCK      1.13
DERR     1.13
AGE      1.11
RE       1.05
ALK      1.03
ACK      1.00
AY       0.99
alker    0.98
Activations Density 0.014%