INDEX
Explanations
names, especially ones that are repeated multiple times
proper nouns, particularly names
Negative Logits
Token         Logit
ivals         -0.86
atchewan      -0.81
ainment       -0.75
orters        -0.74
[+            -0.73
ãĥķãĤ©        -0.72
urgical       -0.71
ItemTracker   -0.71
urgy          -0.71
ablishment    -0.68
Positive Logits
Token         Logit
Dee           1.41
zie           0.88
ples          0.85
Reeves        0.83
leigh         0.80
pling         0.80
pee           0.76
Dodd          0.72
ffe           0.72
ble           0.72
Activations Density 0.007%
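
As a rough illustration of what a density figure of this size means, the sketch below assumes that activation density is the percentage of sampled tokens on which the feature fires (activation above zero). The source does not state the exact definition, and the function name `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero activation.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 7 of 100,000 tokens.
sample = [0.0] * 99_993 + [1.2, 0.4, 2.1, 0.9, 3.3, 0.7, 1.5]
print(f"{activation_density(sample):.3f}%")  # prints "0.007%"
```

Under this assumption, a density of 0.007% corresponds to roughly 7 active tokens per 100,000, which fits a feature that responds to a narrow class of tokens such as names.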