INDEX
Explanations
names or keywords related to specific entities, possibly with a focus on names that are not common nouns
proper nouns and specific names
Negative Logits
STATS           -0.68
innocence       -0.60
Cinderella      -0.60
meditation      -0.60
Reviewer        -0.59
Ou              -0.59
Constructed     -0.58
LSD             -0.58
attendant       -0.58
preservation    -0.58
Positive Logits
edo             1.08
oshenko         1.04
obl             0.87
nel             0.87
asury           0.86
opot            0.80
hiba            0.80
arkin           0.79
oren            0.79
flix            0.78
Activations Density 0.097%