Explanations
mentions of specific names and references in a text
Negative Logits
neau    -0.17
ament   -0.15
oks     -0.15
Olsen   -0.15
827     -0.14
896     -0.14
argo    -0.14
ernen   -0.14
923     -0.13
Willow  -0.13
Positive Logits
chied   0.16
ugar    0.16
ستگÛĮ   0.15
سب      0.15
elman   0.14
éѝ      0.14
оди     0.14
UED     0.14
é²ģ     0.14
ῦ       0.14
Activations Density: 0.021%