INDEX
Explanations
phrases related to conflicts of interest in various contexts
Negative Logits
adar     -0.17
zimmer   -0.16
vette    -0.15
ivre     -0.15
lettes   -0.15
_WM      -0.15
lsa      -0.14
ELY      -0.14
ktion    -0.14
APPER    -0.14
Positive Logits
703      0.17
killer   0.15
agon     0.15
327      0.15
atas     0.15
MC       0.14
o        0.14
MEM      0.14
399      0.14
astr     0.14
Activations Density: 0.084%