INDEX
Explanations
phrases related to advice or considerations to be made
phrases that encourage awareness and consideration of important factors or advice
Negative Logits
pathetic      -0.65
caricature    -0.64
CHAT          -0.63
pretended     -0.60
CHA           -0.59
promise       -0.57
occupancy     -0.56
idth          -0.55
tricked       -0.55
Claim         -0.55
Positive Logits
ASAP          1.07
whenever      1.04
when          1.00
considering   0.94
before        0.94
if            0.91
lest          0.91
BEFORE        0.87
during        0.84
besides       0.82
Activations Density 0.217%