INDEX
Explanations
instances of specific foreign names or terms related to cultural references
Negative Logits
ERGY      -0.17
nger      -0.16
arf       -0.16
alem      -0.16
longer    -0.15
ware      -0.15
ivable    -0.15
olut      -0.14
stom      -0.14
matic     -0.14
Positive Logits
uvre      0.27
iou       0.27
bling     0.22
hle       0.19
cker      0.19
olian     0.18
ufs       0.17
ae        0.17
oe        0.17
iš        0.17
Activations Density 0.038%