INDEX
Explanations
Words related to political and environmental issues, especially those tied to specific locations
Negative Logits
staking    -0.77
anwhile    -0.66
avers      -0.64
perate     -0.63
iris       -0.62
iddles     -0.61
iless      -0.60
ammers     -0.60
erity      -0.60
aws        -0.60
Positive Logits
endeavor      0.81
environment   0.79
entity        0.79
adventure     0.79
institution   0.78
affair        0.75
phenomenon    0.74
event         0.73
outing        0.73
experience    0.72
Activations Density 0.741%