INDEX
Explanations
punctuation marks
repeated phrases or statements indicating negation or contradiction
Negative Logits
gypt          -0.74
feasibility   -0.66
ounded        -0.65
chains        -0.64
tro           -0.62
herds         -0.61
bount         -0.59
bounty        -0.58
mud           -0.58
trave         -0.56
Positive Logits
sir           0.94
whatsoever    0.87
except        0.79
nor           0.78
thank         0.76
onsense       0.71
uh            0.70
please        0.68
Shift         0.67
Mistress      0.67
Activations Density 0.056%