INDEX
Explanations
numbers at the beginning of sentences and the symbol ':' in a text
punctuation and formatting symbols, particularly colons
Negative Logits
etheless   -0.71
disliked   -0.71
referees   -0.69
receipt    -0.66
ibur       -0.65
evapor     -0.65
brewed     -0.63
stagn      -0.63
pads       -0.62
diver      -0.62
Positive Logits
Exactly    0.93
Tonight    0.86
Yeah       0.82
Well       0.80
Correct    0.78
Wow        0.76
Alright    0.75
Explain    0.75
Yeah       0.74
Thirty     0.74
Activations Density 0.043%