INDEX
Explanations
mentions of specific names or terms
Negative Logits
umer        -0.71
esis        -0.68
reconc      -0.66
iments      -0.66
ervation    -0.65
sacrific    -0.63
occup       -0.63
eering      -0.63
emort       -0.63
terior      -0.63
Positive Logits
forth       0.91
Berry       0.90
weed        0.88
lon         0.83
words       0.83
buck        0.82
mere        0.81
pee         0.79
down        0.79
adow        0.78
Activations Density: 1.641%