INDEX
Explanations
hints or suggestions
various forms of the word "hint" indicating suggestions or implications
Negative Logits
Token        Logit
frey         -0.74
ccording     -0.73
nea          -0.71
ocker        -0.70
artney       -0.69
bred         -0.68
vict         -0.67
Cross        -0.65
vez          -0.64
Kinnikuman   -0.63
Positive Logits
Token        Logit
hint         1.49
hints        1.36
clue         0.88
clues        0.85
glimps       0.82
hinted       0.82
wink         0.80
llor         0.77
warning      0.73
itives       0.73
Activations Density: 0.016%