INDEX
Explanations
repeated instances of the word "simple" in various contexts
Negative Logits
litt     -0.15
ξε       -0.15
um       -0.15
ONA      -0.15
ET       -0.14
uste     -0.14
inf      -0.14
rix      -0.14
Sou      -0.14
meer     -0.14
Positive Logits
°}       0.15
oyer     0.15
#        0.14
gend     0.14
vir      0.14
celik    0.14
@js      0.14
catalog  0.14
cul      0.14
泳       0.14
Activations Density 0.009%