INDEX
Explanations
words related to different aspects of freedom and rights
references to the concept of freedom and its various forms
Negative Logits
ergy          -0.79
ded           -0.76
itant         -0.74
Cosponsors    -0.73
mers          -0.69
eor           -0.68
liam          -0.68
assium        -0.68
ENTS          -0.67
itated        -0.67
Positive Logits
roam           0.93
bies           0.91
guaranteed     0.86
freedom        0.85
freedoms       0.82
freedom        0.81
unrestricted   0.81
captives       0.79
afforded       0.76
edom           0.75
Activations Density: 0.020%