Explanations
mentions of fossil fuels
Negative Logits
outh       -0.18
kip        -0.16
iot        -0.15
Tyson      -0.14
Princip    -0.14
_IGNORE    -0.14
andy       -0.14
ugged      -0.13
erval      -0.13
ement      -0.13
Positive Logits
usic       0.18
omik       0.17
isten      0.16
ized       0.15
.bd        0.15
ICT        0.15
leta       0.14
regor      0.14
engu       0.14
_Panel     0.14
Activations Density: 0.002%