Explanations
phrases that express curiosity or uncertainty
Negative Logits
Token    Logit
awe      -0.17
aina     -0.15
emon     -0.14
á¹       -0.14
ear      -0.14
ienia    -0.13
GIN      -0.13
_DR      -0.13
DEX      -0.13
igner    -0.13
Positive Logits
Token        Logit
æģ¯          0.16
ATEGORIES    0.14
upo          0.14
quete        0.14
ington       0.14
zeÅĦ         0.14
Tato         0.14
ckett        0.14
/tiny        0.13
æľīä»Ģä¹Ī     0.13
Activations Density 0.019%
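
Several token strings in the logit lists above (for example zeÅĦ or æľīä»Ģä¹Ī) look like the printable stand-ins that GPT-2-style byte-level BPE vocabularies use for raw bytes. The sketch below assumes that is the encoding in use here, which the page itself does not state. It rebuilds the standard GPT-2 byte-to-character table and maps a token string back to UTF-8 text. The names `bytes_to_unicode`, `UNICODE_TO_BYTE` and `decode_token` are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable character table.

    Assumption: the tokenizer behind this page uses the same table.
    """
    # Bytes that are already printable keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: printable stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Map a vocabulary token string back to text.

    Tokens that hold only part of a multi-byte character (such as a
    lone UTF-8 prefix) decode to the Unicode replacement character.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ington", "ze\u00c5\u0126", "\u00e6\u0123\u00af"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under this assumption, pure-ASCII tokens such as ington decode to themselves, while a token like á¹ holds only the first two bytes of a three-byte character and cannot be shown as text on its own.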