INDEX
Explanations
words denoting certainty or emphasis
modal and auxiliary verbs
Negative Logits
inav     -0.76
ares     -0.72
cember   -0.71
hops     -0.68
iku      -0.67
eper     -0.66
VILLE    -0.66
haul     -0.66
stalls   -0.65
unin     -0.65
Positive Logits
doi            0.67
latter         0.66
Stein          0.63
Forensic       0.61
nt             0.60
Teresa         0.59
iosyncr        0.59
stricter       0.58
commentators   0.58
sei            0.58
Activations Density 0.086%