INDEX
Explanations
proper nouns or names of places, products, and people
references to specific people, organizations, or concepts
Negative Logits
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯  -0.66
©¶æ               -0.63
)]                -0.60
said              -0.59
Topics            -0.59
)]                -0.59
upon              -0.56
Registered        -0.56
IMAGES            -0.55
please            -0.55
Positive Logits
itself            0.71
vertisement       0.69
*.                0.66
!!!!!!!!          0.65
interstitial      0.64
oooooooo          0.60
!:                0.56
.                 0.56
().               0.54
Incarn            0.54
Activations Density: 0.946%