Explanations
phrases expressing thoughts or reflections, particularly those that are self-critical or clichéd
Negative Logits
Token     Logit
ntax      -0.16
ustos     -0.15
erah      -0.14
antz      -0.14
482       -0.14
виж       -0.14
PLUS      -0.13
egt       -0.13
MLS       -0.13
이스      -0.13
Positive Logits
Token     Logit
but       0.23
nhưng     0.18
but       0.18
pero      0.17
_but      0.17
，但      0.16
oeff      0.16
но        0.15
μισ       0.15
지만      0.15
Activations Density 0.089%
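
As a rough illustration of how a density figure like this can be read, the sketch below assumes that "activation density" means the fraction of token positions on which the feature has a nonzero activation. That definition, the function name `activation_density`, and the sample data are assumptions for illustration only and are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumed definition: density = (# active positions) / (# positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 89 of 100,000 token positions.
sample = [1.0] * 89 + [0.0] * 99_911
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.089%
```

Under this reading, a density of 0.089% would correspond to the feature firing on roughly 9 out of every 10,000 tokens.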