INDEX
Explanations
references to the concept of norms and their implications in society
Negative Logits
entirety      -0.25
entire        -0.25
whole         -0.20
whole         -0.18
possibility   -0.18
Entire        -0.18
presence      -0.18
chance        -0.16
opportunity   -0.16
equivalent    -0.15
Positive Logits
early         0.19
earlier       0.18
newer         0.18
recent        0.17
earliest      0.16
major         0.16
many          0.16
later         0.16
oldem         0.15
many          0.15
Activations Density 0.138%