INDEX
Explanations
the word "ink" followed by a number
Negative Logits
ャ              -0.81
覚醒            -0.78
ccording        -0.74
blance          -0.69
ODUCT           -0.65
ppa             -0.64
goose           -0.63
APH             -0.61
ignty           -0.60
Interstitial    -0.59
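Some tokens in the raw export (for example ãĥ£ and è¦ļéĨĴ) look like mojibake because they are stored in GPT-2-style byte-level BPE form, where every byte is mapped to a printable character. The page does not name the model, so a GPT-2-style tokenizer is an assumption. Under that assumption, the sketch below rebuilds the byte-to-character table from GPT-2's `encoder.py` (`bytes_to_unicode`), inverts it, and decodes a displayed token back to UTF-8. The helper name `decode_token` is hypothetical.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild GPT-2's byte -> printable-character table."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted into the range starting at U+0100.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: printable character -> original byte.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a displayed byte-level token into readable text."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


for tok in ["ãĥ£", "è¦ļéĨĴ", "ccording"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

With this table, ãĥ£ is the byte sequence E3 83 A3 and decodes to ャ. Likewise, è¦ļéĨĴ decodes to 覚醒. Plain ASCII subwords such as `ccording` pass through unchanged.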
Positive Logits
erman           1.10
ery             1.06
ering           1.03
edin            0.98
ansen           0.97
wink            0.95
ert             0.90
erm             0.89
enstein         0.87
owski           0.86
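The page does not say how the positive and negative logit lists were computed. One common approach for sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and take the top-k and bottom-k tokens. The sketch below assumes exactly that; `d_dec`, `W_U`, and `logit_lists` are hypothetical names, and real implementations may apply additional scaling. The toy inputs are made up purely to exercise the code and are not the values behind this page.

```python
import numpy as np


def logit_lists(d_dec, W_U, vocab, k=10):
    """Return the k most negative and k most positive direct logit effects.

    d_dec: (d_model,) decoder direction of the feature (assumed input)
    W_U:   (d_model, n_vocab) unembedding matrix (assumed input)
    vocab: list of n_vocab token strings
    """
    logits = d_dec @ W_U                    # one logit contribution per vocab entry
    order = np.argsort(logits)              # indices sorted most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: hypothetical 2-dimensional model with a 4-token vocabulary.
d_dec = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 1.1, -0.8],
                [0.3, 0.9, -0.4, 0.1]])
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
neg, pos = logit_lists(d_dec, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Both lists come from a single matrix-vector product followed by one sort. This is why the dashboard can show the most suppressed and most promoted tokens side by side.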
Activations Density 0.023%
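Activation density is typically the fraction of sampled tokens on which the feature is active. A density of 0.023% works out to roughly 23 tokens in every 100,000. The minimal sketch below makes two assumptions that the page does not state: "active" means an activation above zero, and the activations arrive as a flat per-token array. The helper `activation_density` is a hypothetical name, and the activations are synthetic, chosen only to reproduce the displayed figure.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the activation exceeds threshold."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))


# Synthetic activations: the feature fires on 23 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:23] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.023%
```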