INDEX
Explanations
phrases related to the concept of "Never"
the repetition of the word "Never."
Negative Logits
Token      Logit
Els        -0.78
CRIP       -0.77
ipation    -0.69
ivities    -0.68
åĤ         -0.67
lass       -0.66
å½         -0.66
ifts       -0.66
ELF        -0.65
ÙĪ         -0.64
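Entries such as `åĤ`, `å½`, and `ÙĪ` look like encoding damage, but they are consistent with how byte-level BPE vocabularies display raw bytes. The sketch below assumes the model uses a GPT-2 style byte-to-character table, which this page does not state, and maps each displayed token back to its underlying bytes. The helper names (`gpt2_byte_table`, `token_to_bytes`, `describe`) are hypothetical and chosen for illustration only.

```python
def gpt2_byte_table():
    """Rebuild the GPT-2 style byte -> printable-character table (assumed)."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(0x21, 0x7F))     # '!' .. '~'
        + list(range(0xA1, 0xAD))   # U+00A1 .. U+00AC
        + list(range(0xAE, 0x100))  # U+00AE .. U+00FF
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in gpt2_byte_table().items()}


def token_to_bytes(token: str) -> bytes:
    """Convert a displayed vocabulary string back to its raw bytes."""
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


def describe(token: str) -> str:
    """Return the decoded text, or a note if the bytes are a UTF-8 fragment."""
    raw = token_to_bytes(token)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return f"<partial UTF-8: {raw.hex()}>"


if __name__ == "__main__":
    for tok in ["Els", "åĤ", "å½", "ÙĪ"]:
        print(repr(tok), "->", describe(tok))
```

Under that assumption, `ÙĪ` corresponds to the bytes `d9 88`, which is the complete UTF-8 encoding of the Arabic letter waw (U+0648), while `åĤ` and `å½` are the first two bytes of three-byte sequences and only become readable characters when combined with a following token.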
Positive Logits
Token          Logit
theless        1.32
entimes        0.94
underestimate  0.89
Never          0.82
forgot         0.81
cared          0.78
dreamed        0.78
forgotten      0.78
never          0.78
soever         0.77
Activations Density 0.006%