Explanations
references to Western culture and concepts
Negative Logits
Token           Logit
InSection       -0.59
braun           -0.56
قل              -0.52
iren            -0.49
budgeting       -0.49
หวัด            -0.49
ires            -0.49
transfieras     -0.49
Suara           -0.49
oportunidade    -0.48
Positive Logits
Token             Logit
ModelExpression   0.88
anglo             0.70
Anglo             0.68
ędzynarod         0.68
English           0.65
영어              0.64
ValueStyle        0.64
estero            0.63
AnchorStyles      0.63
Anglo             0.62
Activations Density: 0.294%