INDEX
Explanations
instances of the word "find" across various contexts
Negative Logits
assisted    -0.75
ansky       -0.73
agn         -0.71
stroke      -0.67
inion       -0.67
forming     -0.65
jab         -0.65
awar        -0.64
haw         -0.63
idium       -0.62
Positive Logits
plenty      1.06
yourself    0.87
yourselves  0.86
lots        0.82
ample       0.81
references  0.79
traces      0.78
them        0.77
fewer       0.75
ourselves   0.74
Activations Density: 0.029%