INDEX
Explanations
mentions of factories
Negative Logits
laus        -0.89
soever      -0.78
ĺħ          -0.74
ï¸          -0.74
lihood      -0.72
thood       -0.66
Liberties   -0.65
partisan    -0.63
theless     -0.63
venerable   -0.61
Positive Logits
factory     1.11
actory      1.00
orer        0.86
rador       0.79
worker      0.77
rats        0.76
Worker      0.76
arde        0.75
builder     0.75
factories   0.74
Activations Density 0.015%