Explanations
action words indicating progression or movement
Negative Logits
esser    -0.17
ides     -0.17
esti     -0.15
reff     -0.15
eneg     -0.15
lette    -0.15
ooter    -0.14
eturn    -0.14
rides    -0.14
iped     -0.14
Positive Logits
tém          0.15
olit         0.15
ew           0.15
Brotherhood  0.15
383          0.14
Dawn         0.14
480          0.14
ourage       0.14
Watt         0.14
à¥ģष         0.14
Activations Density: 0.002%