INDEX
Explanations
being attacked or overwhelmed
Negative Logits
。    -2.25
酲    -2.16
.    -2.02
馐    -2.02
na    -2.02
as    -2.02
本章    -2.00
谞    -1.99
va    -1.98
şi    -1.97
Positive Logits
}    2.34
↵    2.22
the    2.06
嚀    2.05
无比    2.00
(no visible token)    1.97
it    1.83
霂    1.76
respald    1.74
因为他    1.73
Activations Density 0.004%