INDEX
Explanations
specific locations or proper nouns
Negative Logits
Lv          -0.67
Dialog      -0.66
_(          -0.64
Scient      -0.63
Siber       -0.61
Quantity    -0.58
arbon       -0.58
?,          -0.57
Redditor    -0.57
Izan        -0.55
Positive Logits
have        0.98
were        0.98
are         0.96
rejoice     0.95
reacted     0.94
contend     0.92
insist      0.91
argue       0.89
weren       0.83
fared       0.82
Activations Density 0.421%