Explanations
words related to political or social movements and changes
Negative Logits
enegger     -0.78
agher       -0.74
Lear        -0.73
monds       -0.71
hest        -0.69
itness      -0.69
Witness     -0.68
ilan        -0.66
thouse      -0.65
ewitness    -0.65
Positive Logits
eering      0.87
process     0.79
ism         0.78
efforts     0.74
ISM         0.73
ist         0.72
processes   0.71
anism       0.70
xual        0.70
Phase       0.69
Activation Density: 0.074%