INDEX
Explanations
proper nouns, particularly names of people
proper nouns, likely related to people's names or entities
Negative Logits
pired          -0.67
existed        -0.63
exists         -0.59
compromises    -0.57
subs           -0.57
pires          -0.57
finds          -0.57
hath           -0.56
destroy        -0.56
doesnt         -0.56
Positive Logits
.              1.03
sarcast        0.89
rhet           0.77
.</            0.76
_.             0.75
.[             0.74
.(             0.72
.<             0.71
."             0.71
via            0.69
Activations Density 0.177%
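
To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes that "activations density" means the fraction of sampled tokens on which the feature has a non-zero activation; the page itself does not state the definition. The function name, the threshold parameter, and the sample data are all hypothetical and chosen only to reproduce the 0.177% figure.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (tokens where the feature fires) / (all tokens).
    """
    if not activations:
        raise ValueError("activations must not be empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the feature fires on 177 of 100,000 tokens.
sample = [1.0] * 177 + [0.0] * (100_000 - 177)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.177% corresponds to the feature firing on roughly 1.8 tokens per thousand.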