INDEX
Explanations
occurrences of specific characters or punctuation marks, particularly apostrophes
Negative Logits
wallets      -0.68
Wash         -0.68
peanuts      -0.65
othy         -0.60
Morse        -0.60
stacks       -0.59
Livingston   -0.59
Lars         -0.59
Avery        -0.58
Crosby       -0.58
Positive Logits
oeuv         1.02
ét           0.97
eros         0.89
euro         0.88
avez         0.88
ava          0.79
ê            0.79
hist         0.78
esp          0.78
hab          0.77
Activations Density: 0.006%