INDEX
Explanations
occurrences of the word "reveal" and its derivatives
Negative Logits
anim       -0.16
iff        -0.15
755        -0.15
gelmiş     -0.15
iffin      -0.15
erre       -0.15
ify        -0.14
chu        -0.14
::$        -0.14
avanaugh   -0.14
Positive Logits
aled       0.33
aling      0.23
olution    0.23
als        0.22
ILLE       0.20
ュー       0.20
ille       0.19
aler       0.18
IEWS       0.18
aged       0.17
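The two lists above rank tokens by how strongly this feature pushes their output logits down or up. The dashboard does not say how they were computed. A common approach, assumed here and not confirmed by the source, projects the feature's decoder direction onto the model's unembedding matrix and sorts the scores. The minimal pure-Python sketch below uses hypothetical names (`decoder_dir`, `unembed`, `logit_effects`, `top_k`) and toy values only:

```python
# Assumption: logit lists = decoder direction projected onto the unembedding.
# All names here are hypothetical; no real library API is implied.

def logit_effects(decoder_dir, unembed):
    """Score each vocab token t as dot(decoder_dir, unembed[:, t]).

    unembed is laid out as d_model rows by vocab_size columns.
    """
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_k(scores, tokens, k, largest=True):
    """Return the k (token, score) pairs with the largest (or smallest) scores."""
    order = sorted(range(len(scores)), key=lambda t: scores[t], reverse=largest)
    return [(tokens[t], round(scores[t], 2)) for t in order[:k]]


if __name__ == "__main__":
    # Toy example: 2-dimensional residual stream, 3-token vocabulary.
    tokens = ["tok_a", "tok_b", "tok_c"]
    scores = logit_effects([1.0, 0.0], [[0.3, -0.2, 0.1], [5.0, 5.0, 5.0]])
    print("positive:", top_k(scores, tokens, 2))
    print("negative:", top_k(scores, tokens, 1, largest=False))
```

With real model weights, the same two steps would produce the "Positive Logits" and "Negative Logits" lists, truncated to the top 10 in each direction.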
Activations Density 0.003%
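Activation density is the fraction of tokens on which the feature fires. If "fires" means an activation strictly above zero (the dashboard does not state its threshold, so this is an assumption), then 0.003% works out to about 3 tokens in every 100,000. The minimal sketch below uses a hypothetical helper, `activation_density`:

```python
# Hypothetical helper; the zero threshold is an assumption, not the dashboard's documented rule.

def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds the threshold."""
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


if __name__ == "__main__":
    # Toy data: 3 nonzero activations among 100,000 token positions.
    acts = [0.0] * 99_997 + [1.2, 0.5, 2.0]
    density = activation_density(acts)
    print(f"{density * 100:.3f}%")
```

A density this low means the feature is very sparse. This fits an explanation tied to a single word family such as "reveal".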