INDEX
Explanations
repeated instances of the word "was" in various contexts
Negative Logits
iem            -0.17
hiba           -0.15
nal            -0.15
vailability    -0.14
etter          -0.14
bservable      -0.14
oen            -0.14
iom            -0.13
adele          -0.13
now            -0.13
Positive Logits
nt             0.21
originally     0.19
/is            0.18
origin         0.17
ı              0.15
abi            0.15
orderid        0.14
ÙĨ             0.14
ãĤĬ            0.14
enville        0.14
Activations Density 0.301%