INDEX
Explanations
phrases indicating importance or necessity
the word "that" in various contexts throughout the text
Negative Logits
Nay          -0.65
Centers      -0.64
Pass         -0.64
IVERS        -0.62
Directions   -0.59
Occup        -0.58
ucc          -0.58
Gender       -0.58
throp        -0.57
mate         -0.57
Positive Logits
soever       0.92
fateful      0.86
carbohyd     0.85
pesky        0.83
mattered     0.81
iago         0.78
surrounds    0.78
ched         0.76
ÃĥÃĤ         0.75
eatures      0.75
Activations Density: 0.443%