Explanations
rights, agency, and autonomy
Negative Logits
Token            Value
inhomogeneities  0.42
ေါ်              0.41
Neue             0.41
outcrops         0.41
lemmas           0.39
Zusammenhang     0.38
roids            0.38
awkward          0.38
నేపథ             0.38
projections      0.37
Positive Logits
Token            Value
granted          0.92
rights           0.89
permission       0.88
permissions      0.84
权限             0.84
freedom          0.82
свободы          0.82
granted          0.82
freedoms         0.80
allowed          0.80
Activations Density: 0.159%