Explanations
Variations of the letter "w" and certain sentence structures
Negative Logits
hang           -0.18
Levine         -0.18
puck           -0.17
empt           -0.16
leted          -0.15
les            -0.15
lob            -0.15
AINED          -0.15
legg           -0.15
lets           -0.15

Positive Logits
hat             0.21
irtschaft       0.20
rote            0.20
issenschaft     0.20
anj             0.19
istar           0.19
ishes           0.19
tf              0.18
hy              0.18
ipro            0.17
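The page lists logit values without saying how they were produced. One common way to get such lists for a learned feature (for example, a sparse-autoencoder latent) is to project the feature's decoder direction through the model's unembedding matrix and read off the most negative and most positive entries. The sketch below assumes exactly that method; `logit_effects`, `top_and_bottom`, and all toy numbers are hypothetical and do not come from this dashboard.

```python
from typing import List, Sequence, Tuple


# Hypothetical helper: ASSUMES logit effects are computed as direction @ W_U.
def logit_effects(direction: Sequence[float], w_u: Sequence[Sequence[float]]) -> List[float]:
    """Return one logit effect per vocabulary token.

    w_u is indexed as [d_model][vocab], so entry t is the dot product of
    `direction` with column t of the unembedding matrix.
    """
    if len(direction) != len(w_u):
        raise ValueError("direction length must match the d_model rows of w_u")
    vocab_size = len(w_u[0])
    return [
        sum(direction[i] * w_u[i][t] for i in range(len(direction)))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    effects: Sequence[float], tokens: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Split tokens into the k most negative and the k most positive effects."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]            # ascending order: most negative first
    positive = ranked[::-1][:k]      # descending order: most positive first
    return negative, positive


# Toy data (hypothetical): a 2-dimensional residual stream and a 4-token vocabulary.
tokens = ["hang", "hat", "les", "tf"]
direction = [1.0, -0.5]
w_u = [
    [0.1, 0.3, -0.2, 0.05],
    [0.4, 0.0, -0.1, 0.2],
]
negative, positive = top_and_bottom(logit_effects(direction, w_u), tokens, k=2)
print([t for t, _ in negative])  # ['les', 'hang']
print([t for t, _ in positive])  # ['hat', 'tf']
```

Sorting the whole vocabulary keeps the toy example simple; for a real vocabulary of tens of thousands of tokens, `heapq.nsmallest` and `heapq.nlargest` return the top k without a full sort.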

Activation density: 0.142%
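The density figure is a percentage. Assuming it means the share of tokens on which the feature's activation is above zero, it can be reproduced as in the sketch below. The `activation_density` helper and its `threshold` parameter are hypothetical, and the token counts are illustrative only, not data from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    if len(activations) == 0:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data (hypothetical): 142 active tokens out of 100,000 gives 0.142%.
acts = [1.0] * 142 + [0.0] * (100_000 - 142)
print(f"{activation_density(acts):.3f}%")  # 0.142%
```

Under this definition, a density of 0.142% means the feature fires on roughly 1 token in 700.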