Explanations
mentions of various perspectives and viewpoints
Negative Logits
token    logit
lum      -0.16
dex      -0.16
nan      -0.16
dy       -0.16
bert     -0.15
asset    -0.15
nd       -0.15
dir      -0.15
nie      -0.15
lis      -0.15
Positive Logits
token      logit
view       0.23
-shift     0.20
-view      0.20
pective    0.20
views      0.20
view       0.18
-taking    0.18
pectives   0.17
shift      0.17
Shift      0.17
Activations Density: 0.031%