Explanations
comparisons or contrasts
Negative Logits
xtap      -0.74
agra      -0.73
oes       -0.70
plings    -0.70
ipment    -0.69
https     -0.68
itudes    -0.66
atan      -0.65
uum       -0.64
oak       -0.64
Positive Logits
importantly    0.98
than           0.97
important      0.95
interesting    0.91
informative    0.89
challeng       0.89
likely         0.84
worrisome      0.84
realistic      0.84
salient        0.82
Activations Density: 0.022%