INDEX
Explanations
the word "definition"
definitions or explanations within a text
instances of the word "definition" used in various contexts
Negative Logits
ili        -0.77
rop        -0.70
roph       -0.70
itch       -0.68
ublic      -0.67
estern     -0.66
ocamp      -0.66
orld       -0.66
Pradesh    -0.64
rentice    -0.64
Positive Logits
definitions    1.01
inition        0.97
defines        0.93
definition     0.93
initions       0.92
REDACTED       0.84
Definition     0.76
witz           0.75
definition     0.74
terday         0.73
Activations Density: 0.012%