INDEX
Explanations
abbreviations for various items or categories
references to common items or categories indicated by "etc."
Negative Logits
adows         -0.66
Lords         -0.61
mp            -0.61
skies         -0.61
shadow        -0.61
gd            -0.60
parliament    -0.58
poke          -0.58
backyard      -0.57
irlfriend     -0.57
Positive Logits
etc           1.27
etc           1.08
gow           0.91
.?            0.88
indo          0.84
icter         0.84
imei          0.84
eter          0.83
.,            0.83
querque       0.83
Activations Density 0.023%