Explanations
the word "anyone" with a high activation value
mention of the word "anyone."
Negative Logits
Nanto        -0.66
Barg         -0.62
Congo        -0.61
Cast         -0.61
efficient    -0.60
ritz         -0.59
pa           -0.59
Maze         -0.59
Hound        -0.59
Labor        -0.58
Positive Logits
else          1.51
THING         1.24
Else          1.10
omever        0.99
Else          0.99
soever        0.97
else          0.95
ĪĴ            0.88
zzle          0.87
imaginable    0.83
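The two logit lists above rank vocabulary tokens by how strongly the feature pushes their output logits down or up. A minimal sketch of how such a ranking can be produced, assuming (this is an assumption, not something this page states) that each token's score is the dot product of the feature's decoder direction with that token's unembedding column, ignoring the final layer norm. The function name `top_logit_effects` and the toy arrays are hypothetical.

```python
import numpy as np


def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Rank tokens by a feature's direct effect on the output logits.

    Assumption (hypothetical method): effect = decoder_dir @ unembed, where
    decoder_dir has shape [d_model] and unembed has shape [d_model, vocab_size].
    Layer norm before the unembedding is ignored.
    """
    effects = decoder_dir @ unembed          # one score per vocabulary token
    order = np.argsort(effects)              # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with d_model = 2 and a 4-token vocabulary (made-up values).
decoder_dir = np.array([1.0, 0.0])
unembed = np.array([[0.5, -0.2, 1.5, -0.6],
                    [1.0, 1.0, 1.0, 1.0]])
vocab = ["a", "b", "c", "d"]
neg, pos = top_logit_effects(decoder_dir, unembed, vocab, k=2)
print(neg)  # [('d', -0.6), ('b', -0.2)]
print(pos)  # [('c', 1.5), ('a', 0.5)]
```

Sorting once with `argsort` and reading both ends gives the negative and positive lists from a single pass over the scores.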
Activation Density: 0.016%
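An activation density of 0.016% means the feature fires on roughly 16 of every 100,000 tokens. A minimal sketch of the calculation, assuming density is the share of token positions whose activation is strictly above zero; the function name `activation_density` and the synthetic activations are hypothetical.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold.

    Assumption: "density" counts tokens with activation > threshold (default 0).
    """
    acts = np.asarray(activations)
    return np.count_nonzero(acts > threshold) / acts.size


# Synthetic data: 16 firing positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:16] = 3.2
density = activation_density(acts)
print(f"{density:.3%}")  # 0.016%
```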