INDEX
Explanations
phrases indicating additional quantity or items
Negative Logits
Token     Logit
peg       -0.14
prop      -0.14
oshi      -0.14
coon      -0.14
aklı      -0.13
iram      -0.13
maktan    -0.13
base      -0.13
tread     -0.13
victim    -0.13
Positive Logits
Token     Logit
(extra    0.23
extra     0.23
-extra    0.23
added     0.20
EXTRA     0.20
extra     0.20
/add      0.20
Added     0.19
-added    0.19
/new      0.17
Activations Density 0.169%
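
The density figure above is not defined on this page. A minimal sketch of one common reading follows, assuming it means the percentage of sampled tokens on which the feature fires at all. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and chosen only for illustration; they are not taken from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    (above-threshold) activation. The page itself does not define it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 1000 tokens, of which 3 activate the feature.
sample = [0.0] * 997 + [0.4, 1.2, 0.7]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.169% would mean the feature is active on roughly 17 of every 10,000 tokens, which would fit a feature that responds to a narrow family of tokens such as "extra" and "added".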