INDEX
Explanations
references to individuals or entities involved in environmental or social activism
Negative Logits
Token       Logit
INTON       -0.73
Ethics      -0.64
orically    -0.61
士          -0.59
Donation    -0.58
Sources     -0.58
INA         -0.58
enegger     -0.58
inately     -0.58
sources     -0.57
Positive Logits
Token       Logit
lihood      1.28
liest       1.23
lier        0.93
liness      0.85
hots        0.80
minded      0.78
creen       0.76
piring      0.74
paces       0.72
mith        0.72
Activations Density: 0.007%