INDEX
Explanations
the phrase "After all"
phrases expressing conclusions or summaries
Negative Logits
lav       -0.71
Kamp      -0.63
cept      -0.62
nom       -0.60
Schwe     -0.58
fman      -0.58
utsche    -0.57
eele      -0.57
grad      -0.57
ritic     -0.56
Positive Logits
ocating   1.06
igator    1.00
igators   0.97
iances    0.93
kinds     0.90
iance     0.89
sorts     0.87
uding     0.84
udes      0.83
owing     0.80
Activations Density: 0.140%