Explanations
No Explanations Found
Negative Logits
<bos>    3.00
the      2.34
and      2.31
with     2.12
the      2.11
is       2.10
both     2.08
'        2.03
were     2.02
a        2.02
Positive Logits
slightest        1.89
midst            1.80
outermost        1.77
Diffuse          1.72
outskirts        1.67
purest           1.65
coldest          1.65
Hydrochloride    1.64
nascent          1.63
lowest           1.59
Activations Density: 0.670%