INDEX
Explanations
declarative statements or assertions with strong opinions
various sentence endings or punctuation marks
Negative Logits
Token           Logit
',"             -0.67
!'"             -0.64
atility         -0.57
Cup             -0.56
inguishable     -0.55
Thumbnail       -0.54
,'"             -0.54
'."             -0.53
owan            -0.53
teammate        -0.52
Positive Logits
Token           Logit
↵Âł             1.38
Âł              1.35
³³              1.30
Âł Âł           1.29
³³              1.25
]               1.06
Âł              1.05
)               1.02
):              1.02
)]              0.99
Activation Density: 0.477%