INDEX
Explanations
occurrences of pronouns and identifiers that denote individuals or groups
Negative Logits
Token           Logit
asd             -0.16
orgia           -0.15
arn             -0.15
isti            -0.15
Convention      -0.15
jax             -0.15
POP             -0.15
(blank)         -0.15
::::::::        -0.14
Central         -0.14
Positive Logits
Token           Logit
.scalablytyped  0.17
/apis           0.15
کت              0.15
æļ              0.15
itan            0.15
sür             0.15
ани             0.15
лон             0.15
ilere           0.14
earer           0.14
Activations Density: 0.014%