Explanations
elements related to commentary and opinion sections of content
Negative Logits
Token      Logit
rok        -0.16
emouth     -0.16
combe      -0.16
upos       -0.15
onta       -0.15
üst        -0.14
olas       -0.14
ubat       -0.14
vd         -0.14
ched       -0.14
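The page does not say how these scores are produced. A common approach for sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix W_U, then keep the k lowest-scoring tokens for the negative list and the k highest for the positive list. The sketch below assumes that method. The function name `top_logit_tokens`, the toy vocabulary, and all numbers are hypothetical, and plain Python lists stand in for real weight matrices.

```python
def top_logit_tokens(decoder_dir, unembed, vocab, k):
    """Rank vocabulary tokens by how strongly a feature direction promotes them.

    decoder_dir: d_model floats, the feature's (hypothetical) decoder row.
    unembed: d_model rows of len(vocab) floats, i.e. a W_U matrix.
    Returns (most_negative, most_positive), each a list of (token, score).
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per decoder dimension")
    # Score for token j is the dot product decoder_dir . W_U[:, j].
    scores = [
        sum(d * row[j] for d, row in zip(decoder_dir, unembed))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    # Lowest scores first for the negative list, highest first for the positive list.
    return ranked[:k], ranked[::-1][:k]


vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]  # toy vocabulary
decoder_dir = [1.0, 2.0]
unembed = [[1.0, 0.0, -1.0, 0.5],
           [0.0, 1.0, 0.0, -1.0]]
neg, pos = top_logit_tokens(decoder_dir, unembed, vocab, k=2)
print(neg)  # [('tok_d', -1.5), ('tok_c', -1.0)]
print(pos)  # [('tok_b', 2.0), ('tok_a', 1.0)]
```

Sorting once and slicing from both ends produces both lists in a single pass over the scores.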
Positive Logits
Token          Logit
aires          0.24
aries          0.23
ary            0.20
eting          0.18
ative          0.18
ators          0.18
ghan           0.18
арі            0.16
ariat          0.16
/Instruction   0.16
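The Cyrillic entry in the positive list appears in the raw page data as `аÑĢÑĸ`, which is a byte-level BPE spelling of `арі`. The characters `Ģ` and `ĸ` suggest a GPT-2-style byte-level tokenizer, and this sketch assumes one. It rebuilds GPT-2's byte-to-character table and then inverts it. The helper name `decode_token` is hypothetical. In the raw form, the leading Cyrillic `а` already appears decoded. Its byte-level spelling would be `Ð°`.

```python
def bytes_to_unicode():
    """GPT-2-style map from each byte value (0-255) to a printable character."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 or above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


def decode_token(token):
    """Turn a byte-level BPE token string back into readable UTF-8 text."""
    byte_decoder = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(byte_decoder[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ÑĢÑĸ"))    # рі
print(decode_token("Ð°ÑĢÑĸ"))  # арі
```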
Activation Density: 0.031%
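This sketch assumes activation density means the percentage of sampled token positions on which the feature's activation is positive. Under that reading, 0.031% works out to roughly 31 of every 100,000 tokens. The function name and the toy activations are hypothetical.

```python
def activation_density(activations):
    """Percentage of token positions where the feature fires (activation > 0).

    `activations` is a hypothetical list of per-token feature activations.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


# Toy input: the feature fires on 1 of 4 tokens.
print(activation_density([0.0, 0.0, 1.2, 0.0]))  # 25.0

# At 0.031%, roughly 31 of every 100,000 tokens would activate the feature.
print(round(0.031 / 100 * 100_000))  # 31
```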