INDEX
Explanations
mentions of cultural topics
references to culture and its various implications and discussions
Negative Logits
ieth       -0.89
agher      -0.83
iary       -0.79
wered      -0.79
deen       -0.77
ishable    -0.77
issan      -0.77
istant     -0.72
engers     -0.70
IELD       -0.69
Positive Logits
Culture     0.83
culture     0.78
indo        0.77
immersion   0.77
Appropri    0.75
Diversity   0.74
diversity   0.74
Marxism     0.73
clash       0.73
ulture      0.72
Activations Density 0.023%