INDEX
Explanations
adjectives describing emotions or qualities
expressions of personal opinion or evaluations
Negative Logits
rontal    -0.67
hemor     -0.67
coli      -0.64
downed    -0.63
mop       -0.63
boarded   -0.62
eties     -0.60
mercial   -0.59
raph      -0.58
ideon     -0.57
Positive Logits
SPONSORED       0.86
?:              0.83
varies          0.79
differs         0.78
distinguishes   0.72
illustrates     0.70
Story           0.67
differed        0.66
requires        0.66
boils           0.66
Activations Density 0.286%