INDEX
Explanations
expressions of deep reflection and critical thinking
Negative Logits
ised        -0.17
/tcp        -0.16
redient     -0.16
noop        -0.16
igham       -0.15
sher        -0.15
erre        -0.15
hed         -0.15
hoff        -0.15
pone        -0.14
Positive Logits
fulness     0.41
fully       0.41
lessly      0.28
ful         0.28
-pro        0.26
processes   0.25
prov        0.24
FUL         0.24
crime       0.23
leaders     0.23
Activations Density: 0.041%