INDEX
Explanations
proper nouns and names
references to identification or personal details
Negative Logits
naissance      -0.62
Annotations    -0.61
PDATE          -0.61
atten          -0.59
prising        -0.57
senal          -0.56
151            -0.55
cffff          -0.54
attering       -0.54
whichever      -0.53
POSITIVE LOGITS
is      1.33
is      1.15
IS      1.02
Is      0.99
isn     0.95
are     0.93
was     0.91
are     0.80
isin    0.77
Is      0.77
Activations Density 0.241%