INDEX
Explanations
words related to detecting or finding clues
Negative Logits
ategory      -0.83
oslav        -0.70
rior         -0.68
atism        -0.68
fare         -0.66
kus          -0.64
rifice       -0.62
neighb       -0.62
È            -0.61
lav          -0.60
Positive Logits
clue          1.16
hint          1.14
clues         1.10
hints         0.98
glean         0.95
tale          0.71
illuminate    0.70
pointing      0.70
glimps        0.69
ibly          0.69
Activations Density: 0.033%