INDEX
Explanations
explaining concepts or questions
Negative Logits
Seeing     1.45
Seeing     1.37
However    1.29
That       1.25
seeing     1.24
Looking    1.24
Hearing    1.24
However    1.23
Hearing    1.23
That       1.20
POSITIVE LOGITS
something  0.98
things     0.87
thing      0.86
something  0.81
cosa       0.79
there      0.78
coisa      0.74
algo       0.71
things     0.66
openide    0.64
Activations Density 0.002%