INDEX
Explanations
the word "won"
negations or instances of "won't."
Negative Logits
Token          Logit
gypt           -0.71
eki            -0.63
illin          -0.62
OTOS           -0.61
Factor         -0.60
periphery      -0.60
Traps          -0.59
bian           -0.58
compr          -0.57
constrained    -0.57
Positive Logits
Token          Logit
't             1.43
itive          1.08
cest           0.95
now            0.94
iors           0.85
geon           0.83
ced            0.82
ests           0.81
ipeg           0.81
cing           0.81
Activations Density: 0.036%