INDEX
Explanations
phrases related to legal and medical terms
occurrences of the word "the"
Negative Logits
thood      -0.78
Ò          -0.77
because    -0.68
bg         -0.68
.          -0.68
ornings    -0.68
icia       -0.67
!!!        -0.67
xt         -0.66
verage     -0.66
Positive Logits
slightest  1.39
latter     1.30
majority   1.20
vast       1.11
entire     1.08
remainder  1.08
heaviest   1.05
same       1.05
biggest    1.04
greatest   1.03
Activations Density: 0.424%