INDEX
Explanations
references to parody and humor related to entertainment
Negative Logits
Trim         -0.16
avou         -0.15
TL           -0.15
agas         -0.14
anten        -0.14
ensburg      -0.14
Forever      -0.14
agate        -0.14
Ris          -0.14
andan        -0.14
Positive Logits
847          0.15
(~(          0.15
ysi          0.15
aur          0.15
epar         0.15
851          0.14
enga         0.14
ock          0.14
ossa         0.14
Transparent  0.14
Activations Density: 0.311%