INDEX
Explanations
phrases indicating importance, such as "the fate of" and "the history of"
repeated phrases built mainly around the word "of"
Negative Logits
!.          -0.77
,...        -0.76
.''.        -0.65
tackle      -0.65
.","        -0.64
!,          -0.64
.........   -0.63
".[         -0.63
.[          -0.62
»           -0.62

Positive Logits
pires        0.73
these        0.70
varies       0.62
this         0.61
was          0.60
hasn         0.60
allows       0.58
translates   0.57
bothers      0.56
pired        0.56
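The negative and positive logit lists appear to rank vocabulary tokens by how strongly this feature pushes their logits down or up. One common way SAE dashboards build such lists is to project the feature's decoder direction through the model's unembedding matrix and keep the k most negative and k most positive entries. That derivation is an assumption here; the source does not state it. The sketch below takes hypothetical numpy inputs `w_dec_row` (shape `(d_model,)`) and `w_unembed` (shape `(d_model, vocab_size)`), and the helper name `top_logit_effects` is illustrative only.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, effect) pairs.

    Assumption: a feature's effect on each token's logit is the dot product of
    its decoder direction with the corresponding unembedding column.
    """
    effects = np.asarray(w_dec_row) @ np.asarray(w_unembed)  # shape: (vocab_size,)
    order = np.argsort(effects)  # indices from most negative to most positive
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy model: d_model = 2, four-token vocabulary (all values made up).
w_dec_row = np.array([1.0, 0.0])
w_unembed = np.array([[0.5, -0.2, 0.9, -0.7],
                      [0.0, 0.0, 0.0, 0.0]])
neg, pos = top_logit_effects(w_dec_row, w_unembed, ["a", "b", "c", "d"], k=2)
print(neg)  # [('d', -0.7), ('b', -0.2)]
print(pos)  # [('c', 0.9), ('a', 0.5)]
```

A real dashboard may also apply normalization (for example, a final layer-norm scale) before ranking. This sketch omits that step, so its values would not necessarily match the numbers listed for this feature.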
Activations Density 0.440%
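Activation density is usually read as the share of tokens in a sample on which the feature fires. Under that reading, 0.440% corresponds to about 44 activating tokens per 10,000. The minimal sketch below assumes that "fires" means an activation strictly above a threshold of zero and that `activations` holds one value per token. The helper `activation_density` and its `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    acts = np.asarray(activations, dtype=float)
    if acts.size == 0:
        raise ValueError("need at least one token activation")
    # Count tokens where the feature is active, then scale to a percentage.
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy example: 2 of 8 tokens fire, so the density is 25%.
print(activation_density([0, 0, 0.3, 0, 1.2, 0, 0, 0]))  # 25.0
```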