Explanations
Occurrences of the word "define" and its variants, indicating a focus on definitions and clarifications.
Negative Logits
Token     Logit
eros      -0.18
ero       -0.16
eras      -0.14
465       -0.14
anna      -0.14
113       -0.14
cribe     -0.14
finally   -0.13
rema      -0.13
iffin     -0.13
Positive Logits
Token     Logit
636       0.17
-ÑĤо      0.16
nable     0.15
/method   0.15
arton     0.15
undef     0.14
adamente  0.14
_HAVE     0.14
purpose   0.14
IGN       0.14
Activations Density: 0.075%