Explanations
references to fossil fuel and its implications
Negative Logits
Token   Logit
gross   -0.15
ings    -0.15
er      -0.15
nown    -0.15
.fhir   -0.15
ht      -0.14
atori   -0.14
tures   -0.14
ement   -0.14
innen   -0.14
Positive Logits
Token   Logit
fuels   0.39
fuel    0.38
fuel    0.32
Fuel    0.30
Fu      0.29
çĩĥ     0.29
Fuel    0.29
fueled  0.26
ifer    0.22
ized    0.20
Activations Density: 0.006%