INDEX
Explanations
phrases related to planning, discussion, and reflection
sentences expressing collective thoughts or plans
Negative Logits
lation       -0.70
Kills        -0.68
reality      -0.67
srfAttach    -0.65
ardless      -0.62
Nah          -0.62
Sheen        -0.62
itary        -0.61
Fill         -0.60
Cros         -0.60
Positive Logits
wish          0.99
disliked      0.92
wished        0.91
wanted        0.90
regret        0.89
dislike       0.89
forgot        0.88
overlooked    0.85
learned       0.85
learnt        0.84
Activations Density 0.169%