INDEX
Explanations
phrases related to someone or something becoming notable or significant
articles and descriptors related to significant entities or concepts
Negative Logits
goodness     -0.71
abilities    -0.69
!/           -0.69
erity        -0.67
agree        -0.66
packs        -0.65
icity        -0.65
panel        -0.64
books        -0.62
achu         -0.62
Positive Logits
scapego      0.84
fodder       0.84
irl          0.77
permanent    0.77
martyr       0.76
unwitting    0.75
casualty     0.73
focal        0.73
hemer        0.73
unstoppable  0.72
Activations Density: 0.128%