INDEX
Explanations
references to clean energy or cleanliness
Negative Logits
334      -0.15
agon     -0.15
go       -0.15
gay      -0.15
ately    -0.15
amins    -0.15
746      -0.14
Moz      -0.14
cales    -0.14
ylko     -0.14
Positive Logits
slate        0.23
est          0.23
liness       0.23
-cut         0.22
(er          0.21
lier         0.18
-clean       0.17
liest        0.17
jh           0.17
conscience   0.17
Activations Density 0.018%