Explanations
instances of the word "was"
Negative Logits
acers    -0.20
eec      -0.16
antly    -0.15
olson    -0.15
flap     -0.15
eus      -0.15
Lod      -0.15
sx       -0.14
acias    -0.14
bart     -0.14
Positive Logits
abi      0.27
illa     0.22
abis     0.22
htub     0.22
atch     0.20
ps       0.19
ILLA     0.18
ABI      0.18
abe      0.17
ist      0.17
Activations Density 0.043%