Explanations
instances of the word "this" and phrases expressing emphasis or significance
Negative Logits
wiki       -0.17
elson      -0.15
odd        -0.15
ι          -0.15
ons        -0.14
nt         -0.14
id         -0.14
wonders    -0.14
orthand    -0.14
iler       -0.13
Positive Logits
is         0.27
是我       0.20
was        0.19
isn        0.19
morning    0.18
entire     0.17
wasn       0.17
whole      0.17
thing      0.16
timeofday  0.16
Activations Density 0.182%