INDEX
Explanations
contractions and possessive forms in the text
Negative Logits
s        -0.22
Ùĩ       -0.17
tti      -0.16
ÑĮ       -0.16
ilden    -0.15
vido     -0.15
нг       -0.15
YNAM     -0.15
owski    -0.14
igner    -0.14
Positive Logits
richt    0.14
amp      0.14
ment     0.13
fluence  0.13
them     0.13
imately  0.13
mate     0.13
ãĥ¼ãĥĸ   0.13
Evelyn   0.13
ripp     0.13
Activations Density 0.025%