Explanations
references to ethnic identity and cultural practices
Negative Logits
akens         -0.18
exampleInput  -0.16
.ai           -0.15
incontri      -0.15
_AI           -0.14
ÙĦب           -0.14
ZX            -0.14
lrt           -0.14
Calibri       -0.14
utron         -0.14
Positive Logits
Rom           0.58
Roma          0.51
Rom           0.43
roma          0.42
ROM           0.41
rom           0.40
roma          0.34
rom           0.34
ROM           0.34
_ROM          0.31
Activations Density 0.008%