INDEX
Explanations
phrases that introduce or explain visual content
phrases suggesting the reader's perspective or engagement in an explanation
Negative Logits
emate      -0.64
Ducks      -0.62
Deng       -0.61
lasting    -0.61
rang       -0.60
Pistons    -0.60
enberg     -0.59
wr         -0.59
wcs        -0.59
Penguins   -0.59
Positive Logits
guessed      0.76
Know         0.72
yourselves   0.70
realise      0.67
Figure       0.64
paraph       0.64
wont         0.64
:{           0.63
Know         0.63
insure       0.60
Activations Density 0.060%