INDEX
Explanations
Phrases that repeat the same word, particularly for emphasis or comparison
Negative Logits
UME         -0.76
Reviewer    -0.74
AMI         -0.72
ERAL        -0.70
ogi         -0.70
andr        -0.70
ettle       -0.67
orge        -0.67
CHAT        -0.65
oes         -0.65
Positive Logits
consecut    0.89
theless     0.89
apiece      0.72
fold        0.72
dipping     0.69
dozen       0.69
thirds      0.65
Eleven      0.64
entimes     0.63
halves      0.62
Activations Density: 0.018%