INDEX
Explanations
references to various aspects of culture, including traditions, practices, and societal norms
Negative Logits
ieth        -0.97
issan       -0.84
wered       -0.83
istant      -0.79
ishable     -0.79
deen        -0.79
iary        -0.77
agher       -0.76
iculty      -0.75
IELD        -0.74
Positive Logits
indo           0.83
clash          0.80
Culture        0.77
immersion      0.77
Appropri       0.77
Marxism        0.74
atically       0.71
Diversity      0.71
appropriation  0.70
wars           0.70
Activations Density: 10.167%