INDEX
Explanations
phrases prompting or advising against a certain action
imperatives and negative commands or suggestions
Negative Logits
ELD            -0.76
Redd           -0.70
liner          -0.68
dimension      -0.63
prof           -0.63
ħĭ             -0.62
established    -0.61
upon           -0.61
milo           -0.61
ilage          -0.61
Positive Logits
underestimate   0.89
hesitate        0.86
theless         0.86
ndum            0.82
heny            0.78
kidding         0.76
ardless         0.74
worry           0.74
omsday          0.74
ations          0.72
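The page does not say how the two logit lists are produced. One common approach for sparse-autoencoder features, assumed here and not confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and keep the most negative and most positive entries. The function name `top_bottom_logits` and its parameters are hypothetical, and the toy matrices stand in for real model weights.

```python
import numpy as np

def top_bottom_logits(w_dec_row, w_unembed, vocab, k=10):
    """Hypothetical sketch: rank vocabulary tokens by one feature's logit effect.

    w_dec_row: decoder direction for the feature, shape (d_model,)
    w_unembed: unembedding matrix, shape (d_model, vocab_size)
    vocab:     list of token strings, length vocab_size
    """
    # Logit effect of the feature direction on every vocabulary token.
    effects = w_dec_row @ w_unembed
    # Indices sorted from most negative to most positive effect.
    order = np.argsort(effects)
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
w_dec_row = np.array([1.0, 0.0])
w_unembed = np.array([[0.5, -0.2, 0.9, -0.7],
                      [0.1, 0.1, 0.1, 0.1]])
neg, pos = top_bottom_logits(w_dec_row, w_unembed, ["a", "b", "c", "d"], k=2)
print(neg)  # [('d', -0.7), ('b', -0.2)]
print(pos)  # [('c', 0.9), ('a', 0.5)]
```

Under this reading, the positive list above (e.g. "hesitate", "worry", "underestimate") consists of tokens whose logits the feature pushes up, which fits the "don't hesitate / don't worry / don't underestimate" pattern in the explanations.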
Activation Density: 0.057%
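Activation density is usually read as the fraction of token positions on which the feature is active. The sketch below assumes "active" means an activation strictly above zero; the dashboard's exact threshold and sample size are not stated, and `activation_density` is a hypothetical helper. At 0.057%, the feature would fire on roughly 570 of every million tokens.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold.

    The threshold of 0.0 is an assumption, not taken from the dashboard.
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Four token positions, one of which activates the feature.
print(activation_density([0.0, 0.0, 1.2, 0.0]))  # 0.25

# Converting the reported density into an expected count per million tokens.
density = 0.057 / 100
print(round(density * 1_000_000))  # 570
```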