INDEX
Explanations
phrases indicating decision-making or conclusion
phrases indicating the action of presenting or putting something forward for consideration
Negative Logits
pregnancies    -0.68
artifacts      -0.64
holder         -0.63
withdrawals    -0.61
notations      -0.61
empt           -0.59
scl            -0.59
important      -0.58
hops           -0.56
employed       -0.56
Positive Logits
lling          0.86
lled           0.86
ggles          0.81
wered          0.80
pload          0.77
fend           0.77
pless          0.77
blame          0.71
othy           0.69
OOL            0.69
Activations Density 0.098%