Explanations
phrases starting with a dash followed by a number
negative sentiments or critiques
Negative Logits
irlf      -0.79
ecause    -0.77
ometimes  -0.74
ividual   -0.74
withd     -0.73
lished    -0.71
ancial    -0.66
ashtra    -0.66
fman      -0.64
ij士      -0.62
Positive Logits
-         2.02
âĢij      1.40
âĢIJ      1.35
-,        1.14
-[        1.12
-'        1.12
-$        1.09
"-        1.02
'-        0.91
-.        0.89
Activations Density 0.390%