INDEX
Explanations
the word "one" with a high activation level
the repetition of the word "one."
Negative Logits
Token         Logit
ooks          -0.68
lov           -0.66
actionGroup   -0.64
inas          -0.60
Available     -0.59
Ec            -0.58
cores         -0.58
Frames        -0.58
gif           -0.58
inders        -0.57
Positive Logits
Token         Logit
Hundred       0.84
dimensional   0.79
rency         0.79
hundred       0.79
thing         0.73
oxide         0.71
Thousand      0.69
thousand      0.68
person        0.67
ones          0.66
Activations Density 0.129%