INDEX
Explanations
phrases related to comparison and contrast
instances of punctuation, specifically commas, often indicating lists or separation in thoughts
Negative Logits
zos     -0.82
sers    -0.65
tv      -0.62
oir     -0.62
hn      -0.61
zers    -0.61
lvl     -0.61
aron    -0.61
lees    -0.60
aris    -0.60
Positive Logits
respectively    0.89
curfew          0.70
depending       0.69
depending       0.65
iffe            0.57
disclaim        0.56
dispos          0.55
whichever       0.55
premature       0.55
ilation         0.54
Activations Density: 0.237%