INDEX
Explanations
comparisons or contrasts between different entities or concepts
instances of the word "that" used to introduce clauses or emphasize comparisons
Negative Logits
natureconservancy   -0.69
downs               -0.59
Dialogue            -0.59
Luck                -0.57
english             -0.56
ysis                -0.56
Offline             -0.55
Laughs              -0.55
berries             -0.54
cycles              -0.54
Positive Logits
fateful             0.77
iago                0.72
same                0.71
occurred            0.69
cher                0.69
atism               0.69
icter               0.67
occurs              0.67
resulted            0.67
mattered            0.63
Activations Density: 0.336%