Explanations
phrases indicating emotional tension or personal conflict
Negative Logits
èĥŀ      -0.07
yt       -0.07
edis     -0.07
æŃ©      -0.07
ogui     -0.07
antz     -0.07
undry    -0.07
eldre    -0.07
utex     -0.07
unday    -0.07
Positive Logits
withholding    0.06
Meanwhile      0.06
learns         0.05
opers          0.05
Their          0.05
avel           0.05
discovers      0.05
Specs          0.05
um             0.05
vy             0.05
Activations Density: 0.019%
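
Two of the negative-logit tokens (èĥŀ and æŃ©) look like encoding debris, but they may simply be how a byte-level BPE vocabulary displays multi-byte UTF-8 characters. The sketch below assumes a GPT-2-style byte-to-character table, which this page does not confirm. The helper names `bytes_to_unicode` and `decode_token` are hypothetical, chosen for illustration. Under that assumption, each displayed character maps back to one raw byte, and the resulting bytes decode as UTF-8.

```python
def bytes_to_unicode():
    """Byte -> display-character table in the style of GPT-2's byte-level BPE.

    Assumption: the model behind this listing uses this mapping.
    """
    # Bytes that are shown as themselves (printable, non-space characters).
    printable = (
        list(range(0x21, 0x7F))     # '!' .. '~'
        + list(range(0xA1, 0xAD))   # '¡' .. '¬'
        + list(range(0xAE, 0x100))  # '®' .. 'ÿ'
    )
    table = {b: chr(b) for b in printable}
    # Every remaining byte is shifted to a code point at 256 and above.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Inverse table: display character -> raw byte.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map each display character back to its byte, then decode as UTF-8."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


# The two unusual tokens from the negative-logit list, written as escapes.
for shown in ("\u00e8\u0125\u0140", "\u00e6\u0143\u00a9"):
    decoded = decode_token(shown)
    print(repr(shown), "->", repr(decoded), [hex(ord(c)) for c in decoded])
```

Plain ASCII tokens such as `withholding` pass through this mapping unchanged, so only tokens containing non-ASCII bytes look unusual in the listing.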