Explanations
references to names and titles
instances of the apostrophe
Negative Logits
Token       Logit
ħĭ          -0.79
ĸļ          -0.72
eclipse     -0.69
etheless    -0.68
isphere     -0.67
nown        -0.65
icka        -0.65
swer        -0.64
sted        -0.64
referen     -0.63
Positive Logits
Token       Logit
gall        0.75
esp         0.74
Allah       0.72
orange      0.72
alla        0.71
yer         0.70
morrow      0.69
hon         0.69
hair        0.69
expl        0.68
Activations Density: 0.011%