Explanations
instances of the word "the" in the text
assignments and values
Negative Logits
GenerationType    -0.62
AnchorStyles      -0.54
Audrey            -0.50
helicópter        -0.50
Badger            -0.49
mujer             -0.49
ihnach            -0.47
niña              -0.47
hloromethane      -0.47
Butterfly         -0.46
Positive Logits
=                 0.94
)=                0.82
>=</              0.79
]=                0.79
))=               0.76
})=               0.74
\}=               0.73
|=                0.73
]]=               0.72
}=                0.72
Activations Density 0.001%