INDEX
Explanations
instances of the word "speaking" and related phrases
Negative Logits
ium          -0.14
millennium   -0.14
TL           -0.14
se           -0.13
ester        -0.13
iosis        -0.13
pmat         -0.13
人気         -0.13
Ctrls        -0.13
ike          -0.13
Positive Logits
enia         0.16
ην           0.14
CKET         0.14
BV           0.14
udiant       0.14
endet        0.13
odash        0.13
richt        0.13
dog          0.13
icina        0.13
Activations Density 0.009%