INDEX
Explanations
suggestions or actions that can be taken
phrases related to actions or measures being taken
Negative Logits
fre       -0.77
antha     -0.70
inately   -0.68
Faul      -0.66
phe       -0.65
reused    -0.65
Aless     -0.64
space     -0.64
Phant     -0.64
ench      -0.63
Positive Logits
corrective   1.02
ACTIONS      0.98
actions      0.93
steps        0.93
inaction     0.87
concerted    0.85
decisively   0.85
GBT          0.84
toward       0.82
Steps        0.81
Activations Density: 0.230%