Explanations
words related to societal norms and cultural beliefs
references to societal norms and cultural practices
Negative Logits
Token         Logit
ded           -0.71
ש             -0.69
aman          -0.68
amen          -0.68
MER           -0.68
onement       -0.68
Interstitial  -0.67
amaz          -0.66
CLASSIFIED    -0.66
מ             -0.65
Positive Logits
Token         Logit
ystem         1.11
pace          1.02
pring         1.02
cape          1.01
omething      1.00
mith          0.99
affecting     0.96
hops          0.95
hip           0.92
influencing   0.89
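The logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up (positive) or down (negative). Below is a minimal sketch of how such a ranking could be produced, assuming (the page does not say so) that each value is the dot product of the feature's decoder direction with a token's unembedding column. The helpers `logit_effects` and `top_k` and all of the numbers are hypothetical toy values, not this feature's real weights.

```python
# Hypothetical sketch: a feature's logit effect on a token is ASSUMED to be
# dot(decoder_dir, unembedding column for that token). Toy numbers only.

def logit_effects(decoder_dir, unembed, vocab):
    """Map each token to dot(decoder_dir, unembed[:, j]).

    unembed is a list of rows (one per model dimension), each row holding
    one entry per vocabulary token.
    """
    effects = {}
    for j, token in enumerate(vocab):
        effects[token] = sum(d * row[j] for d, row in zip(decoder_dir, unembed))
    return effects


def top_k(effects, k, largest=True):
    """Return the k (token, effect) pairs with the largest effects,
    or the smallest (most negative) ones when largest=False."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=largest)
    return ranked[:k]


# Toy 2-dimensional model with a 3-token vocabulary (hypothetical values).
decoder_dir = [1.0, -0.5]
unembed = [
    [0.9, -0.2, 0.4],   # model dimension 0
    [-0.3, 0.8, 0.1],   # model dimension 1
]
vocab = ["ystem", "ded", "pace"]

effects = logit_effects(decoder_dir, unembed, vocab)
print("positive:", top_k(effects, 2))
print("negative:", top_k(effects, 1, largest=False))
```

A real dashboard would do the same projection over the full vocabulary (typically as one matrix-vector product) and keep only the top and bottom ten tokens, as in the lists above.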
Activations Density 0.276%
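Activation density is commonly read as the percentage of token positions on which the feature fires (activation above zero). Under that assumption, 0.276% means the feature is active on roughly 2.76 of every 1,000 tokens. The helper `activation_density` and its `threshold` parameter below are hypothetical, written only to illustrate that arithmetic.

```python
# Hypothetical sketch: density = percentage of token positions whose
# activation exceeds a threshold (0.0 by default). The definition is ASSUMED.

def activation_density(activations, threshold=0.0):
    """Percentage of positions where the activation is above threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# 1,000 toy token positions, 3 of which fire.
toy = [0.0] * 997 + [1.2, 0.5, 3.0]
print(f"{activation_density(toy):.3f}%")
```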