INDEX
Explanations
words related to cutting or separating something into parts
repeated patterns or sequences in the text
NEGATIVE LOGITS
SHIP            -0.69
prisoners       -0.64
inmates         -0.61
embodiments     -0.61
brate           -0.58
labour          -0.58
phyl            -0.57
reconciliation  -0.57
theless         -0.57
gulf            -0.56
POSITIVE LOGITS
ops             1.26
yright          1.01
heet            0.99
icle            0.91
oppers          0.91
imus            0.90
ilon            0.90
icles           0.89
hel             0.88
iate            0.88
Activations Density: 0.011%