INDEX
Explanations
proper nouns and verbs indicating actions or accomplishments
verbs indicating actions or states of being
Negative Logits
Token      Logit
olini      -0.64
rick       -0.64
lis        -0.63
PDATE      -0.62
liner      -0.61
zel        -0.59
zan        -0.57
Enlarge    -0.57
SON        -0.55
ston       -0.55
Positive Logits
Token      Logit
a          1.10
a          1.08
A          0.88
A          0.79
an         0.74
aa         0.71
ai         0.70
Ãł         0.69
Ãł         0.67
ae         0.66
Activations Density: 0.205%