INDEX
Explanations
phrases indicating a requirement, prohibition, or consideration of opinions or actions
the word "don’t"
Negative Logits
cised        -0.65
Completed    -0.64
afore        -0.63
Casting      -0.63
Dise         -0.62
ipel         -0.61
Learning     -0.61
milo         -0.60
vanquished   -0.58
Semi         -0.57
Positive Logits
't           1.43
ned          1.13
ates         0.92
ning         0.91
atives       0.90
nell         0.85
nels         0.84
etsk         0.83
uts          0.82
kie          0.81
Activations Density: 0.112%
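
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and used only for illustration:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all. The source page does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1,000 token positions, exactly one of which activates.
sample = [0.0] * 999 + [2.5]
print(f"{activation_density(sample):.3%}")  # prints 0.100%
```

Under this reading, a density of 0.112% would mean the feature fires on roughly 1 in every 900 sampled tokens.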