INDEX
Explanations
phrases indicating actions or imperatives
Negative Logits
hent        -0.81
existent    -0.66
pes         -0.66
ector       -0.66
RF          -0.65
person      -0.63
paralle     -0.62
hemat       -0.62
Provided    -0.61
meant       -0.61
Positive Logits
revisit         1.34
retire          1.10
celebrate       1.08
rethink         1.08
rejoice         1.07
reconsider      1.06
congratulate    0.95
recomm          0.94
introduce       0.93
apologize       0.92
Activations Density: 0.064%