Explanations
references or citations to other content
Negative Logits
Token     Logit
ness      -0.19
soever    -0.18
land      -0.18
ly        -0.17
l         -0.17
nya       -0.17
like      -0.17
wide      -0.17
self      -0.16
most      -0.16
Positive Logits
Token     Logit
below     0.29
also      0.28
-through  0.25
/he       0.23
Also      0.22
ley       0.22
also      0.21
beck      0.21
LEY       0.21
Also      0.20
Activations Density: 0.031%