Explanations
- phrases indicating the concept of freedom or the ability to choose actions
- free to do actions
Negative logits
Token          Logit
manera         -0.40
Quote          -0.38
accompaniment  -0.36
oar            -0.35
gost           -0.35
quote          -0.35
jsxFileName    -0.34
efficiency     -0.33
ibatkan        -0.33
AppComponent   -0.33
Positive logits
Token          Logit
free            0.71
freely          0.68
Free            0.67
szabad          0.67   (Hungarian: "free")
libertà         0.66   (Italian: "freedom")
Freedom         0.64
freedom         0.64
free            0.64
liberdade       0.62   (Portuguese: "freedom")
Freedom         0.62
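The positive and negative logit lists above name the output tokens this feature most promotes or suppresses. A common way such lists are produced, assumed here rather than confirmed by this page, is to multiply the feature's decoder direction by the model's unembedding matrix and sort the resulting per-token scores. The sketch below uses NumPy with a toy, hypothetical unembedding matrix and vocabulary; the function name `top_and_bottom_logits` is illustrative, and the sketch ignores any final layer norm.

```python
import numpy as np

def top_and_bottom_logits(w_dec_row, W_U, vocab, k=10):
    """Project one feature's decoder direction onto the vocabulary.

    Assumption: the logit effect is w_dec_row @ W_U (direct path through
    the unembedding only; final layer norm is ignored).
    """
    logits = w_dec_row @ W_U                 # one score per vocabulary token
    order = np.argsort(logits)               # indices sorted ascending
    bottom = [(vocab[i], float(logits[i])) for i in order[:k]]
    top = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return bottom, top

# Hypothetical toy model: 3-dim residual stream, 4-token vocabulary
vocab = ["free", "freely", "quote", "oar"]
W_U = np.array([[1.0, 0.5, -0.2, 0.0],
                [0.0, 0.5, -0.3, -0.4],
                [0.2, 0.0, 0.1, 0.1]])
w_dec_row = np.array([0.6, 0.2, 0.1])

bottom, top = top_and_bottom_logits(w_dec_row, W_U, vocab, k=2)
print("negative:", bottom)
print("positive:", top)
```

In this toy setup "free" and "freely" come out on top and "quote" and "oar" at the bottom, mirroring the shape of the lists on this page without reproducing its actual values.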
Activation density: 0.011%
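Activation density is the fraction of token positions on which the feature is active, so 0.011% corresponds to roughly 11 activations per 100,000 tokens. A minimal sketch, assuming "active" means an activation strictly above zero (the threshold actually used for this page is not stated); `activation_density` is a hypothetical helper name.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires above threshold."""
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))  # mean of a boolean mask

# Hypothetical sample: 100,000 token positions, the feature fires on 11
acts = np.zeros(100_000)
acts[:11] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.011%
```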