INDEX
Explanations
conjunctions used to provide contrast or introduce a different perspective
conditional phrases and expressions of uncertainty
Negative Logits
ocument      -0.68
identified   -0.66
reported     -0.65
ewitness     -0.65
extensively  -0.64
eteen        -0.64
tained       -0.64
irteen       -0.63
ONSORED      -0.63
endar        -0.62
Positive Logits
yeah         0.92
thats        0.85
maybe        0.84
maybe        0.82
devs         0.81
beware       0.79
hopefully    0.79
hey          0.77
Goku         0.77
dont         0.76
Activations Density: 0.618%