INDEX
Explanations
the phrase "the only thing" followed by additional context
repetitions of the phrase "the only thing" in various contexts
Negative Logits
onz         -0.75
ゴン        -0.75
perse       -0.71
lance       -0.69
enture      -0.69
baugh       -0.69
soon        -0.68
inate       -0.67
fixme       -0.67
analysis    -0.67
Positive Logits
missing     0.95
separating  0.90
bothering   0.88
happening   0.88
that        0.87
we          0.86
keeping     0.83
preventing  0.82
stopping    0.82
you         0.81
Activations Density: 0.064%