Explanations
references to specific events or actions in a narrative context
Negative Logits
bilt        -0.17
swick       -0.17
plain       -0.16
efeller     -0.15
vale        -0.15
itler       -0.15
tainment    -0.15
rais        -0.14
ulous       -0.14
works       -0.14
Positive Logits
ness        0.20
Ù           0.19
NESS        0.18
naments     0.18
latter      0.16
nown        0.16
plevel      0.15
alien       0.14
-être       0.14
/not        0.14
Activations Density: 0.445%