Explanations
the end of sentences
periods at the end of sentences or phrases
Negative Logits
tarian        -0.78
conservatism  -0.75
arily         -0.71
planners      -0.70
superpower    -0.67
currents      -0.66
scaling       -0.65
cloning       -0.65
chained       -0.65
drinkers      -0.64
Positive Logits
Hence         1.07
Amen          1.01
Pour          1.01
See           1.00
Thus          0.98
Regist        0.97
Whereas       0.97
Therefore     0.97
Please        0.95
Literally     0.95
Activations Density: 0.049%