INDEX
Explanations
phrases denoting importance, significance, or emphasis
phrases that emphasize particular importance or significance
Negative Logits
\\\\\\\\      -0.74
cli           -0.73
guiActive     -0.71
essentially   -0.71
only          -0.70
ensibly       -0.69
merely        -0.68
both          -0.68
ences         -0.67
ylon          -0.67
Positive Logits
egregious     1.22
noteworthy    1.04
poignant      0.99
troublesome   0.97
gall          0.95
vulnerable    0.94
suited        0.93
acute         0.91
notable       0.90
worrisome     0.90
Activations Density: 0.040%