INDEX
Explanations
references to environmental concerns and their societal impacts
Negative Logits
alse       -0.20
hell       -0.16
arse       -0.15
çĵľ        -0.15
besides    -0.15
rase       -0.15
prov       -0.15
estruct    -0.14
heck       -0.14
ëłµ        -0.14
Positive Logits
specifically    0.26
Specifically    0.23
which           0.21
specific        0.17
which           0.16
ohl             0.14
WCHAR           0.14
avit            0.14
quel            0.14
since           0.14
Activations Density 0.336%