Explanations
mentions of fur and related terms
Negative Logits
Token      Logit
chk        -0.17
es         -0.16
century    -0.16
aal        -0.16
listed     -0.16
ez         -0.16
lifting    -0.15
egasus     -0.15
y          -0.15
Century    -0.15
Positive Logits
Token      Logit
thest      0.26
ioso       0.23
iously     0.23
iosa       0.22
thers      0.21
rier       0.21
riers      0.20
fur        0.18
phy        0.18
fur        0.18
Activations Density: 0.006%