INDEX
Explanations
mentions of celebrities and notable figures in the context of events or collaborations
NEGATIVE LOGITS
Roose          -0.15
-ok            -0.14
Flynn          -0.14
_trajectory    -0.13
advanced       -0.13
lius           -0.13
izzer          -0.13
Freel          -0.13
Gutenberg      -0.13
gard           -0.13
POSITIVE LOGITS
legend         0.19
strup          0.17
legends        0.17
actor          0.17
actress        0.16
astronaut      0.16
lumin          0.16
ex             0.16
odo            0.15
Sir            0.15
Activations Density 0.285%