INDEX
Explanations
phrases related to causality or influence
instances of the word "to," indicating a focus on expressions of purpose or intention
Negative Logits
contrace     -0.70
tucked       -0.70
handled      -0.68
Pixel        -0.67
geared       -0.67
toured       -0.67
todd         -0.66
cared        -0.65
headlined    -0.65
touched      -0.65
Positive Logits
icial          0.82
ãĥĨãĤ£         0.73
extinction     0.70
ãĤ´ãĥ³         0.66
minist         0.66
ãĥĩãĤ£         0.66
ym             0.66
breakthrough   0.66
obin           0.65
ournal         0.65
Activations Density 0.088%
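
Three of the positive-logit tokens (ãĥĨãĤ£, ãĤ´ãĥ³, ãĥĩãĤ£) look like byte-level BPE renderings rather than ordinary text. The sketch below assumes the model uses a GPT-2 style byte-to-unicode table, which the page does not state; under that assumption the tokens can be mapped back to raw bytes and decoded as UTF-8. The helper names are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build the GPT-2 style byte -> printable-character table (assumed here)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a byte-level token string back to bytes and decode it as UTF-8."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥĨãĤ£", "ãĤ´ãĥ³", "ãĥĩãĤ£", "extinction"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

If the assumption holds, the three tokens decode to the katakana fragments ティ, ゴン, and ディ, while plain ASCII tokens such as extinction pass through unchanged.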