NEGATIVE LOGITS
defenses    0.54
+           0.52
verhindert  0.52
against     0.51
pfl         0.50
to          0.50
behandelt   0.50
privilegi   0.50
膑          0.49
verm        0.49
POSITIVE LOGITS
定義        0.94
Defining    0.92
Define      0.91
Defined     0.84
defined     0.83
Defines     0.83
Defining    0.81
define      0.80
정의        0.79
defines     0.77
Activation density: 0.023%
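The dashboard does not say how it derives these figures. The sketch below is one plausible reading and makes two assumptions. First, the positive and negative lists are the k highest- and k lowest-scoring entries of a per-token "logit effect" mapping. Second, activation density is the percentage of tokens on which the feature's activation is above zero. The helper names `top_and_bottom_logits` and `activation_density` are hypothetical, and the example values are toy numbers, not data from this feature.

```python
from typing import Dict, List, Sequence, Tuple


def top_and_bottom_logits(
    logit_effects: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring and k lowest-scoring tokens.

    Hypothetical helper: assumes `logit_effects` maps each vocabulary
    token to the feature's effect on that token's logit.
    """
    items = list(logit_effects.items())
    # Largest effects first: the "positive logits" list.
    positive = sorted(items, key=lambda it: it[1], reverse=True)[:k]
    # Smallest (most negative) effects first: the "negative logits" list.
    negative = sorted(items, key=lambda it: it[1])[:k]
    return positive, negative


def activation_density(activations: Sequence[float]) -> float:
    """Percentage of tokens on which the feature fires (activation > 0).

    Assumed definition; a dashboard may use a different threshold.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Toy values for illustration only.
    pos, neg = top_and_bottom_logits({"a": 0.9, "b": 0.1, "c": -0.4, "d": -0.7}, k=2)
    print(pos)  # [('a', 0.9), ('b', 0.1)]
    print(neg)  # [('d', -0.7), ('c', -0.4)]
    print(activation_density([0.0, 1.2, 0.0, 0.0]))  # 25.0
```

Under this definition, a density of 0.023% means the feature fires on roughly 23 of every 100,000 tokens.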