Explanations
words related to uncovering or unveiling information
occurrences of the word "reveals."
Negative Logits
peaceful     -0.64
hare         -0.62
Handle       -0.62
safe         -0.60
automatic    -0.60
reserve      -0.60
aterasu      -0.60
retired      -0.59
handle       -0.57
blanket      -0.57
Positive Logits
reveals         3.13
confirms        2.06
exposes         1.71
shows           1.68
demonstrates    1.66
tells           1.63
suggests        1.62
discl           1.61
proves          1.60
explains        1.59
Activations Density 0.020%