INDEX
Explanations
references to significant or notable individuals and their contributions
Negative Logits
Token   Logit
(       -0.33
...(    -0.30
....    -0.25
(...    -0.24
--      -0.24
(&      -0.24
---     -0.24
...     -0.24
(~      -0.23
-       -0.22
Positive Logits
Token        Logit
celebrity    0.42
Celebrity    0.40
Celebr       0.27
celebrities  0.25
cele         0.23
—            0.23
cele         0.23
Cele         0.22
”.↵          0.20
—↵           0.20
Activations Density: 0.016%