INDEX
Explanations
words related to past events or situations
references to specific groups of people or entities
Negative Logits
........                          -0.74
................                  -0.74
........................          -0.64
.............                     -0.64
................................  -0.63
.........                         -0.63
Apart                             -0.63
Volcano                           -0.60
whichever                         -0.59
435                               -0.58
Positive Logits
survived      0.94
ppers         0.91
participated  0.90
ever          0.86
interacted    0.84
oped          0.84
frequ         0.82
disliked      0.81
cared         0.80
ventured      0.80
Activations Density: 0.162%