INDEX
Explanations
instances of the word "in."
Negative Logits
Token        Logit
mazon        -0.16
ertools      -0.16
asm          -0.15
accordance   -0.15
bac          -0.15
rám          -0.14
trag         -0.14
spite        -0.14
contri       -0.14
tlement      -0.14
Positive Logits
Token        Logit
turn         0.29
verts        0.28
itself       0.28
ients        0.27
fact         0.26
izes         0.25
iates        0.25
question     0.25
ched         0.24
-turn        0.24
Activations Density 0.170%