Explanations
negations, particularly the word "isn't."
Negative Logits
Slf         -0.72
does        -0.70
-           -0.65
does        -0.65
did         -0.64
i           -0.61
EdgeInsets  -0.61
DES         -0.60
dos         -0.60
\           -0.60
Positive Logits
raiſ         1.10
Anſ          0.99
itſelf       0.96
ſever        0.95
...');       0.93
ſind         0.92
Eſ           0.92
iſt          0.87
faſt         0.87
myſelf       0.86
Activation Density: 0.032%