INDEX
Explanations
phrases related to advocating for change or improvement
Negative Logits
pains          -0.71
tones          -0.67
stad           -0.66
significance   -0.63
particulars    -0.62
enhagen        -0.59
cern           -0.57
floats         -0.57
udeau          -0.56
plots          -0.56
Positive Logits
through        0.83
through        0.78
probably       0.77
undoubtedly    0.76
indeed         0.74
definitely     0.74
simply         0.73
ALWAYS         0.72
surely         0.71
\\\\\\\\       0.70
Activations Density 0.076%