Explanations
phrases that instruct or suggest actions
instances of the word "to"
Negative Logits
lav       -0.69
eryl      -0.66
cephal    -0.66
ordon     -0.63
rell      -0.61
ifer      -0.59
bridge    -0.58
ery       -0.57
rys       -0.56
encamp    -0.56
Positive Logits
TO        3.01
TO        1.80
INTO      1.64
FOR       1.60
FROM      1.60
OF        1.59
ON        1.57
ABOUT     1.56
IN        1.49
BY        1.49
Activations Density: 0.024%