INDEX
Explanations
phrases with the word "hint"
phrases indicating subtle hints or implications
Negative Logits
frey        -0.71
artney      -0.67
nea         -0.67
bred        -0.67
lux         -0.67
ccording    -0.65
cus         -0.65
CENT        -0.64
fare        -0.63
portion     -0.63

Positive Logits
hint         1.52
hints        1.32
clue         0.89
hinted       0.88
glimps       0.83
wink         0.82
warning      0.81
clues        0.80
intim        0.72
ibly         0.71
Activations Density 0.021%