INDEX
Explanations
positive anecdotes or sentiments
sentences ending in a period
Negative Logits
Token          Logit
uly            -0.69
thal           -0.68
arov           -0.67
helper         -0.65
correct        -0.64
apan           -0.64
ãĥķãĤ¡         -0.64
©¶æ            -0.63
dimensional    -0.62
defe           -0.62
Positive Logits
Token          Logit
Occasionally   1.05
Worse          1.04
Visitors       1.01
But            0.98
Amid           0.98
Among          0.96
Whereas        0.96
Their          0.95
Fortunately    0.95
Those          0.94
Activations Density: 0.934%