INDEX
Explanations
Proper nouns, specifically names and locations
Negative Logits
informée        -0.59
purpoſe         -0.54
pleaſure        -0.51
inappropriés    -0.50
itſelf          -0.49
tığı            -0.48
ſche            -0.48
༘               -0.48
Normdatei       -0.47
Bibliografía    -0.47
Positive Logits
ned             0.65
ning            0.60
elli            0.57
nnnn            0.57
nn              0.54
ny              0.52
rin             0.50
nnn             0.49
nin             0.49
ners            0.47
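On SAE dashboards of this kind, the positive and negative logit lists are typically produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the extremes. The sketch below assumes that recipe, because the page itself does not state its method. `top_logit_effects`, `decoder_dir` and `unembed` are hypothetical names, and the toy vectors are illustrative rather than real model weights.

```python
from typing import Dict, List, Tuple


def top_logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Score every token by the dot product of the feature's decoder
    direction with that token's unembedding column, then return the
    k most negative and k most positive (token, score) pairs.

    ASSUMPTION: this mirrors the common "decoder direction @ W_U"
    dashboard recipe; it is not confirmed as this page's method.
    """
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy 2-dimensional unembedding (hypothetical values, not real weights).
toy_unembed = {"ned": [0.9, 0.1], "ning": [0.8, 0.0], "itſelf": [-0.7, 0.2]}
neg, pos = top_logit_effects([1.0, 0.5], toy_unembed, k=1)
print([t for t, _ in neg], [t for t, _ in pos])  # prints ['itſelf'] ['ned']
```

Sorting once and slicing from both ends keeps the two lists consistent with each other. With a real vocabulary of tens of thousands of tokens, a vectorized matrix product and a partial sort would be the practical choice.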
Activation Density 0.553%
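Activation density is usually read as the share of token positions on which the feature fires. Below is a minimal sketch, assuming that "fires" means an activation strictly above zero, as in a ReLU SAE. `activation_density` and its `threshold` parameter are hypothetical, and the toy activations are illustrative.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds threshold.

    ASSUMPTION: a feature counts as active when its activation is > threshold
    (default 0.0); the page does not define its own criterion.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


# Toy data: 5 active positions out of 1,000 (hypothetical values).
toy = [0.0] * 995 + [1.2] * 5
print(f"{activation_density(toy):.3%}")  # prints 0.500%
```

With 5 active positions out of 1,000, the density is 0.500%. A reading of 0.553% would correspond to roughly 553 active positions per 100,000 tokens.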