INDEX
Explanations
phrases related to emphasizing or pointing out specific words or phrases
references to articles and their content
Negative Logits
ordable      -0.83
habi         -0.83
ornings      -0.79
urnal        -0.78
adle         -0.76
elaide       -0.76
soever       -0.76
leground     -0.75
thood        -0.73
entimes      -0.69
Positive Logits
implication  1.57
analogy      1.48
gist         1.45
wording      1.36
distinction  1.33
argument     1.30
assumption   1.27
reasoning    1.25
inference    1.24
difference   1.22
Activations Density: 0.523%