INDEX
Explanations
instances of the word "refer" and its variations
Negative Logits
cffff       -0.68
herical     -0.65
oppable     -0.64
urity       -0.63
whiff       -0.60
stride      -0.60
foothold    -0.59
alach       -0.57
stacked     -0.56
anch        -0.56
Positive Logits
rers        0.80
entious     0.80
irect       0.77
ãĥĥ         0.72
itatively   0.70
queries     0.68
thereto     0.68
ragon       0.67
geon        0.66
questions   0.65
Activations Density 0.014%