INDEX
Explanations
words or phrases related to uncovering or discovering information
references to clues or hints related to mysteries or investigations
Negative Logits
lav        -0.74
rifice     -0.74
rior       -0.71
kus        -0.69
sburgh     -0.68
ategory    -0.65
Turks      -0.64
rik        -0.64
rie        -0.64
roc        -0.63
Positive Logits
clue       1.12
hint       0.95
clues      0.90
glean      0.78
hints      0.72
hole       0.71
hig        0.69
wcs        0.67
detector   0.67
hooting    0.66
Activations Density: 0.022%