INDEX
Explanations
instances of the word "intro" in various contexts
Negative Logits
Millennium    -0.76
Ivory         -0.65
Sutherland    -0.63
Liang         -0.62
Shogun        -0.60
Pru           -0.58
Franks        -0.58
Laurent       -0.58
precincts     -0.57
Goods         -0.57
Positive Logits
spection      1.80
spective      1.66
verted        1.59
spect         1.23
verts         1.23
vert          1.21
version       1.14
verting       1.06
jection       1.04
hibited       0.94
Activations Density 0.004%