INDEX
Explanations
phrases indicating uncertainty or questioning existence and value
Negative Logits
benh  -0.17
/*\r\n  -0.16
ńst  -0.15
igo  -0.15
_ZERO  -0.14
idle  -0.14
リーズ  -0.14
elerik  -0.14
alia  -0.14
Idle  -0.14
Positive Logits
anymore  1.00
nữa  0.56
artık  0.42
longer  0.38
lagi  0.37
なくな  0.35
دیگر  0.33
no  0.32
再  0.31
again  0.30
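Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the k most positive and k most negative entries. The sketch below assumes that method; the names `w_dec` and `W_U`, the toy vocabulary, and the rescaling so the top logit reads 1.00 (suggested by "anymore 1.00" above, but not confirmed) are all hypothetical.

```python
import numpy as np

def top_bottom_logits(w_dec, W_U, vocab, k=10, normalize=True):
    """Return the k most positive and k most negative (token, logit) pairs.

    w_dec: (d_model,) decoder direction of one feature (hypothetical name).
    W_U:   (d_model, vocab_size) unembedding matrix (hypothetical name).
    vocab: list of vocab_size token strings.
    """
    logits = w_dec @ W_U  # direct effect of the feature on each output logit
    max_pos = logits.max()
    if normalize and max_pos > 0:
        # Assumption: rescale so the largest positive logit reads 1.00.
        logits = logits / max_pos
    order = np.argsort(logits)  # ascending: most negative first
    bottom = [(vocab[i], float(logits[i])) for i in order[:k]]
    top = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return bottom, top

# Toy example: 3-dimensional residual stream, 4-token vocabulary.
vocab = ["anymore", "again", "idle", "benh"]
W_U = np.array([[1, 0, 0, -1],
                [0, 1, 0, 0],
                [0, 0, 1, 0]])
w_dec = np.array([2.0, 1.0, 0.0])
bottom, top = top_bottom_logits(w_dec, W_U, vocab, k=2)
print("top:", top)
print("bottom:", bottom)
```

Because the ranking only depends on the sign and relative size of each entry of `w_dec @ W_U`, the optional rescaling changes the displayed numbers but not which tokens appear in either list.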
Activation Density: 0.381%
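Activation density is usually reported as the fraction of sampled tokens on which the feature's activation is nonzero, so 0.381% would mean the feature fires on roughly 3.8 of every 1,000 tokens. A minimal sketch, assuming that definition; the threshold of 0 and the toy activation array are hypothetical, since neither the sample nor the threshold is stated here.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())

# Toy sample: 1,000 token positions, the feature fires on 3 of them.
acts = np.zeros(1000)
acts[:4] = [0.3, 1.2, 0.0, 2.5]
density = activation_density(acts)
print(f"{density:.3%}")
```

Counting positions rather than summing activation magnitudes keeps density a pure measure of how often the feature fires, independent of how strongly it fires.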