INDEX
NEGATIVE LOGITS
Temper        -0.08
hältnis       -0.08
square        -0.07
              -0.07
daughters     -0.07
relax         -0.07
temper        -0.07
wür           -0.07
ca            -0.07
territori     -0.07
POSITIVE LOGITS
selective     0.14
selet         0.12
Selective     0.12
selectively   0.11
Filtering     0.11
Filter        0.11
.Filter       0.11
(Filter       0.11
过滤          0.10
filtro        0.10
Activations Density 0.011%