Explanations
significant emphasis on the word "this" in various contexts
Negative Logits
Token      Logit
ial        -0.16
eday       -0.15
uts        -0.15
tr         -0.15
reasons    -0.14
that       -0.14
bug        -0.14
raison     -0.13
the        -0.13
itoris     -0.13
Positive Logits
Token        Logit
particular   0.41
/th          0.33
entire       0.27
guy          0.27
PARTICULAR   0.26
/her         0.25
whole        0.23
zelf         0.22
exact        0.22
latest       0.21
Activations Density: 0.387%