Explanations
phrases suggesting urgency or advice to take action
Negative Logits
Token      Logit
è»Ł        -0.16
eger       -0.16
peare      -0.15
yal        -0.15
ìĤ¬ìĿ´     -0.15
arel       -0.14
ref        -0.14
itzer      -0.14
á»Ĩ        -0.14
atsby      -0.14
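A few tokens in the list above (`è»Ł`, `ìĤ¬ìĿ´`, `á»Ĩ`) look garbled. One plausible reading, assuming the underlying model uses a GPT-2 style byte-level BPE vocabulary (the dashboard itself does not say so), is that they are the tokenizer's printable stand-ins for raw UTF-8 bytes. The sketch below rebuilds that byte-to-character table and inverts it; `decode_token` is a hypothetical helper name, not part of any library.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style table mapping each byte to a printable character."""
    # Bytes that are already printable keep their own code point.
    byte_values = (
        list(range(0x21, 0x7F))    # '!' .. '~'
        + list(range(0xA1, 0xAD))  # U+00A1 .. U+00AC
        + list(range(0xAE, 0x100)) # U+00AE .. U+00FF
    )
    code_points = byte_values[:]
    offset = 0
    # Every remaining byte is shifted to an unused code point starting at 256.
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: printable stand-in character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a byte-level token string back to readable text (assumed encoding)."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # A token may end mid-character, so invalid UTF-8 is replaced, not raised.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["è»Ł", "ìĤ¬ìĿ´", "á»Ĩ", "eger"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption the three entries decode to `軟`, `사이`, and `Ệ`, while plain ASCII tokens such as `eger` pass through unchanged. A character outside the 256-entry table raises `KeyError`, which would indicate the vocabulary is not byte-level in this sense.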
Positive Logits
Token      Logit
ulp        0.17
Newman     0.14
Ax         0.14
GANG       0.13
Herm       0.13
isl        0.13
riet       0.13
uncomment  0.13
isco       0.13
spare      0.13
Activations Density: 0.029%
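The dashboard does not define how the density figure is computed. A common reading, taken here as an assumption, is the fraction of sampled tokens on which the feature's activation is above zero. The sketch below shows that calculation on synthetic values; `activation_density` and its `threshold` parameter are hypothetical names.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumed definition: the source only reports the resulting percentage.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Synthetic data: 3 active positions out of 10,000 sampled tokens.
    sample = [0.0] * 9_997 + [0.4, 1.2, 0.7]
    print(f"{activation_density(sample):.3%}")
```

On this reading, a density of 0.029% means the feature fires on roughly 3 of every 10,000 tokens, so it is a sparse feature.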