INDEX
Explanations
words related to critique or negative assessments
Negative Logits
Token      Logit
ernet      -0.16
sj         -0.15
ë§ĪíĬ¸     -0.15
ubl        -0.15
roker      -0.14
cq         -0.14
/stdc      -0.14
egas       -0.14
elib       -0.14
dens       -0.13
Positive Logits
Token          Logit
present        0.34
search         0.33
volution       0.32
presentation   0.30
fer            0.30
stricted       0.28
lation         0.28
levant         0.28
commended      0.27
v              0.27
Activations Density 0.012%
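
As a rough illustration of how to read the density figure: assuming it is the fraction of sampled tokens on which the feature has a nonzero activation (an assumption, since the page does not define it), 0.012% corresponds to about 12 tokens in every 100,000. The sketch below uses a hypothetical helper `activation_density` and made-up activation values, not data from this feature.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 12 active tokens out of 100,000.
acts = [0.0] * 99_988 + [1.5] * 12
density = activation_density(acts)
print(f"{density:.3%}")
```

A feature this sparse fires on very few tokens, so its top-activating examples are drawn from a small slice of the sampled text.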