INDEX
Explanations
variations of the word "victor" or "victim"
Negative Logits
¤           -0.18
otion       -0.16
osc         -0.15
205         -0.15
fter        -0.14
ITIES       -0.14
ishment     -0.14
seau        -0.14
head        -0.14
Indexed     -0.14
Positive Logits
ims         0.28
orious      0.28
orian       0.27
imizer      0.26
oriously    0.26
imized      0.25
ory         0.25
oire        0.25
ories       0.25
imization   0.24
Activations Density 0.005%