INDEX
Explanations
self-referential words indicating action or change happening
Negative Logits
SPONSORED       -0.76
ITNESS          -0.65
elected         -0.65
interstitial    -0.63
ifted           -0.62
aug             -0.60
Married         -0.60
assadors        -0.60
ilingual        -0.59
onnaissance     -0.59
Positive Logits
balance         0.85
fortunes        0.85
unnecessarily   0.84
altogether      0.84
prematurely     0.84
momentum        0.83
inhib           0.83
entire          0.82
reins           0.82
boundaries      0.81
Activations Density: 7.666%