INDEX
Explanations
targets related to specific goals or objectives
phrases describing objectives or goals
Negative Logits
Guard        -0.78
note         -0.74
minus        -0.73
acted        -0.72
guards       -0.70
shit         -0.64
outside      -0.64
chapter      -0.63
hot          -0.62
part         -0.62
Positive Logits
maximizing    0.91
perfection    0.90
maximize      0.88
achieving     0.84
achieve       0.84
emulate       0.84
replicate     0.82
minimize      0.79
improving     0.79
recreate      0.77
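The two lists rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). A minimal sketch of how such lists can be produced, assuming the common convention of projecting the feature's decoder direction through the model's unembedding matrix and ignoring any final layer norm. The helper names `logit_effects` and `top_bottom_tokens` and the toy data are hypothetical, not the dashboard's actual code.

```python
from typing import Sequence


def logit_effects(decoder_dir: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> list[float]:
    # Assumption: W_U has d_model rows and d_vocab columns, so the effect on
    # token j is the dot product of the decoder direction with column j.
    d_vocab = len(W_U[0])
    return [sum(d * row[j] for d, row in zip(decoder_dir, W_U))
            for j in range(d_vocab)]


def top_bottom_tokens(effects: Sequence[float], vocab: Sequence[str], k: int = 10):
    # Sort token indices by effect, ascending.
    order = sorted(range(len(effects)), key=lambda j: effects[j])
    # Most negative effects first.
    negative = [(vocab[j], effects[j]) for j in order[:k]]
    # Most positive effects first.
    positive = [(vocab[j], effects[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["achieve", "guard", "note", "maximize"]
decoder_dir = [1.0, -0.5]
W_U = [[0.8, -0.6, -0.3, 0.9],
       [0.0, 0.4, 0.8, -0.2]]
neg, pos = top_bottom_tokens(logit_effects(decoder_dir, W_U), vocab, k=2)
print(neg)
print(pos)
```

In this toy setup "guard" and "note" land in the negative list and "maximize" and "achieve" in the positive one, mirroring the shape of the lists above; real values depend entirely on the model's weights.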
Activations Density 0.164%
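An activation density of 0.164% means the feature fires on roughly 164 of every 100,000 tokens, so it is quite sparse. A minimal sketch of the computation, assuming density is the share of tokens whose activation exceeds zero; the helper `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    # Percentage of tokens on which the feature's activation exceeds the threshold.
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 2 firing tokens out of 1,000 gives a density of 0.2%.
density = activation_density([0.0] * 998 + [1.2, 0.5])
print(density)
```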