INDEX
Explanations
mentions of Filipino culture or identity
Negative Logits
Clair     -0.15
625       -0.15
750       -0.14
ÚĺÙĨ      -0.14
Outer     -0.14
Mall      -0.14
375       -0.14
æ¤        -0.14
geben     -0.13
avana     -0.13
Positive Logits
osa       0.15
opsy      0.15
ì²´       0.15
_ptr      0.14
hook      0.14
les       0.14
ipple     0.14
Revenue   0.14
ibble     0.14
OF        0.14
Activations Density 0.002%