Explanations
personal reflection or commentary within sentences
phrases expressing concern or caution
Negative Logits
targ       -0.80
覚醒       -0.79
ipers      -0.78
area       -0.71
rir        -0.68
artney     -0.66
raq        -0.66
explan     -0.64
faintly    -0.63
battle     -0.63
Positive Logits
somew      0.69
Penguin    0.69
none       0.66
neither    0.65
chery      0.65
ignorance  0.62
nob        0.61
imaru      0.61
Meow       0.61
injuries   0.59
Activations Density: 0.132%