INDEX
Explanations
references to the concept of freedom
Negative Logits
ergy       -0.87
IFIC       -0.78
liam       -0.73
eret       -0.72
nas        -0.71
VA         -0.67
sb         -0.67
Dynasty    -0.67
IENT       -0.65
PAR        -0.64
Positive Logits
freedom         1.27
freedoms        1.14
freedom         1.04
roam            0.89
communism       0.84
bies            0.83
liberty         0.83
emancipation    0.81
liberties       0.80
empowerment     0.78
Activations Density 0.017%
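
As a rough illustration of how a density figure like the one above could arise, the sketch below assumes that activation density means the fraction of token positions on which the feature fires (activation above zero). That definition, the function name `activation_density`, and the sample data are all assumptions made for illustration; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition: density = active positions / total positions.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 17 of 100,000 token positions.
sample = [1.0] * 17 + [0.0] * (100_000 - 17)
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.017%
```

Under this reading, a density of 0.017% corresponds to the feature firing on roughly 17 of every 100,000 tokens, which fits a feature tied to a narrow concept.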