INDEX
Explanations
phrases that highlight significant societal changes or accomplishments
Negative Logits
duk        -0.15
ensen      -0.15
sert       -0.14
anded      -0.14
immel      -0.13
emen       -0.13
orph       -0.13
[missing]  -0.13
ENSE       -0.13
keyst      -0.13
Positive Logits
happening     0.23
happens       0.20
happened      0.19
happen        0.18
burger        0.17
done          0.17
aconte        0.17
accomplished  0.17
Done          0.16
vang          0.16
Activations Density: 0.091%