INDEX
Explanations
emphasized expressions of truth and authenticity
Negative Logits
965       -0.18
赤        -0.15
itol      -0.15
å¾Ģ       -0.15
¡         -0.14
vanced    -0.14
ç°        -0.14
/info     -0.14
å¼±       -0.13
rous      -0.13
Positive Logits
-blue     0.17
eya       0.15
adle      0.14
believer  0.14
pleasure  0.14
nda       0.14
izoph     0.14
apon      0.14
bis       0.14
HEME      0.13
Activations Density 0.011%