Explanations
instances of sentence-ending punctuation
Negative Logits
Token         Logit
"$:/          -0.63
eject         -0.61
rup           -0.60
crash         -0.59
endeavour     -0.59
compulsion    -0.59
IGH           -0.59
neutron       -0.58
punishment    -0.58
excuse        -0.58
Positive Logits
Token         Logit
Already       0.75
giving        0.66
Quite         0.65
Already       0.64
wr            0.61
Karin         0.61
Presumably    0.59
Cyn           0.59
Plot          0.59
Points        0.58
Activations Density: 0.063%