Explanations
phrases and words related to repetition
Negative Logits
akin        -0.17
(missing)   -0.17
smith       -0.16
../         -0.16
bell        -0.16
quier       -0.15
còn         -0.15
rey         -0.15
er          -0.15
gren        -0.14
Positive Logits
ively       0.20
itious      0.20
able        0.19
itive       0.19
offenders   0.18
pattern     0.17
patterns    0.17
sclerosis   0.17
patterns    0.17
pattern     0.17
Activations Density 0.032%
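
The density figure is most plausibly the fraction of token positions on which this feature fires at all. The page does not state its definition, so the sketch below is an assumption: the function name, the zero threshold, and the sample data are all hypothetical and are not taken from the tool that produced this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition: a position counts as "active" when its activation
    is strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 32 of which are active.
sample = [0.0] * 99_968 + [1.0] * 32
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.032%
```

Under this reading, a density of 0.032% means the feature is active on roughly 3 out of every 10,000 tokens, so it is a sparse feature.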