Explanations
articles and indefinite pronouns
Negative Logits
Token        Logit
nuts         -0.80
auts         -0.75
fires        -0.73
dayName      -0.73
Orn          -0.67
marches      -0.67
chiefs       -0.66
scraps       -0.66
absor        -0.63
attachments  -0.63
Positive Logits
Token        Logit
ural         0.84
eson         0.80
uras         0.79
uster        0.77
urt          0.75
ë            0.74
ria          0.74
versive      0.74
endum        0.74
ether        0.73
Activations Density: 0.051%