INDEX
Explanations
references to traditional customs or cultural practices
Negative Logits
/we      -0.17
bras     -0.17
thing    -0.17
/he      -0.15
asons    -0.14
oll      -0.14
lict     -0.14
íıī      -0.14
ages     -0.13
rowse    -0.13
Positive Logits
ists         0.25
/current     0.20
ively        0.20
/original    0.20
ist          0.19
ised         0.19
ized         0.19
mente        0.18
itionally    0.18
ism          0.18
Activations Density: 0.028%