Explanations
sentences ending with a period followed by a high activation value, potentially indicating the end of a statement or thought
the presence of sentences that end with a period
Negative Logits
Token         Logit
"$:/          -0.60
irlf          -0.60
izont         -0.57
itially       -0.56
escription    -0.55
ogun          -0.54
ividual       -0.53
uese          -0.52
uilt          -0.50
udicrous      -0.50
Positive Logits
Token         Logit
.             1.55
.(            1.10
.</           1.06
.*            1.05
.]            1.02
!.            1.00
.<            1.00
.:            0.98
._            0.95
.[            0.94
Activations Density: 1.433%