Explanations
Instances of significant nouns and phrases that denote impact or importance.
Negative Logits
opportunity   -0.19
entire        -0.17
likes         -0.17
stuff         -0.17
avenue        -0.16
idea          -0.15
stuff         -0.15
一次          -0.15  (Chinese, "once")
approach      -0.14
hint          -0.14
Positive Logits
few           0.41
few           0.34
many          0.32
Few           0.30
Few           0.29
many          0.29
几个          0.25  (Chinese, "several")
several       0.24
ways          0.24
MANY          0.24
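The page does not say how the logit lists above are produced. A common approach for sparse-autoencoder features, assumed here and not confirmed by this page, is to multiply the feature's decoder direction by the model's unembedding matrix. The resulting per-token scores are then ranked, and the two ends of the ranking are reported. The function names `logit_effects` and `top_and_bottom` and the toy numbers are hypothetical. This is a pure-Python sketch of that assumed procedure:

```python
from typing import Sequence


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> list[float]:
    """Hypothetical helper: score every vocabulary token for one feature.

    decoder_dir has length d_model; unembed has d_model rows and
    vocab_size columns. Each score is the dot product of the decoder
    direction with that token's unembedding column (assumed method).
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], tokens: Sequence[str], k: int):
    """Hypothetical helper: return the k highest- and k lowest-scoring tokens."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    negative = ranked[:k]            # most negative score first
    positive = ranked[::-1][:k]      # most positive score first
    return positive, negative
```

As a toy usage, call `logit_effects([1.0, 0.5], [[0.3, 0.2, -0.1, -0.2], [0.2, 0.2, 0.0, 0.1]])` with tokens `["few", "many", "idea", "stuff"]`. This scores "few" highest and "stuff" lowest. A real model would use its full unembedding matrix, and repeated surface forms such as "few" can appear more than once if the tokenizer has several distinct tokens that print the same way.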
Activation Density: 0.181%
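The page does not define activation density. A common reading, assumed here, is the fraction of sampled token positions on which the feature's activation is above zero. Under that reading, 0.181% corresponds to roughly 181 firing tokens per 100,000. The name `activation_density` and its `threshold` parameter are hypothetical. A minimal sketch:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of token positions where a feature fires.

    A position counts as firing when its activation is strictly greater
    than `threshold` (0.0 by default, an assumption). Multiply the result
    by 100 to express it as a percentage.
    """
    if len(activations) == 0:
        raise ValueError("activation_density needs at least one token")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)
```

For example, a feature that fires on 2 of 1,000 tokens has a density of 0.002, or 0.2%.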