INDEX
Explanations
affirmative and confirming statements in dialogue
Negative Logits
ardon       -0.17
tera        -0.15
Principle   -0.15
principle   -0.15
bero        -0.14
Fast        -0.14
590         -0.14
Merk        -0.14
Tep         -0.14
ibel        -0.14
Positive Logits
Ki          0.15
ayi         0.14
enin        0.14
HOLDERS     0.14
định        0.14
kee         0.14
ester       0.14
ylim        0.13
peq         0.13
orm         0.13
Activations Density: 0.154%