INDEX
Explanations
names of specific people
references to specific individuals, particularly celebrities and historical figures
Negative Logits
etheless    -0.60
atform      -0.55
sclerosis   -0.55
cytok       -0.54
podcast     -0.54
archived    -0.54
screenshot  -0.52
referen     -0.51
imentary    -0.51
FANTASY     -0.51
Positive Logits
Jr          0.68
aka         0.68
Fountain    0.58
dan         0.57
loe         0.57
iani        0.56
aux         0.56
idas        0.55
hoe         0.55
Corona      0.54
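The dashboard does not say how the two logit lists were produced. One common approach for sparse-autoencoder features is to project the feature's decoder direction through the model's unembedding matrix and rank every vocabulary token by the result. The sketch below assumes that approach, ignores any final layer norm, stores the unembedding as a plain `[d_model][vocab_size]` nested list, and uses hypothetical names (`logit_effects`, `top_and_bottom`) and toy data; it is not the dashboard's own code.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> List[Tuple[str, float]]:
    """Return (token, effect) pairs, where effect = decoder_dir @ unembed.

    Assumption: unembed is laid out as [d_model][vocab_size] and the
    final layer norm is ignored (a simplification).
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per decoder dimension")
    effects = [0.0] * len(vocab)
    for i, weight in enumerate(decoder_dir):
        row = unembed[i]
        for t in range(len(vocab)):
            effects[t] += weight * row[t]
    return list(zip(vocab, effects))


def top_and_bottom(effects: List[Tuple[str, float]], k: int):
    """Split effects into the k most negative and k most positive tokens."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy usage with hypothetical tokens and weights.
effects = logit_effects([1.0, -0.5],
                        [[0.2, 0.0, -0.4], [0.0, 0.6, 0.2]],
                        ["tok_a", "tok_b", "tok_c"])
negative, positive = top_and_bottom(effects, 1)
print(negative, positive)
```

Taking both ends of a single sorted list keeps the negative and positive rankings consistent with each other; a real model would use a matrix library rather than nested loops over the vocabulary.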
Activations Density 0.660%
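Activation density is usually the percentage of sampled tokens on which the feature's activation is nonzero, so 0.660% would mean the feature fires on roughly 1 token in 150. A minimal sketch, assuming that definition and a threshold of zero; `activation_density` is a hypothetical helper, not part of any dashboard API.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above threshold."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# 33 firing tokens out of 5,000 sampled gives a density of 0.66%.
density = activation_density([1.0] * 33 + [0.0] * 4967)
print(f"{density:.3f}%")
```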