INDEX
Explanations
words related to specific names or identifiers, particularly related to people or potentially sensitive situations
Negative Logits
Token     Logit
fare      -0.73
hawks     -0.69
cens      -0.64
pter      -0.63
mosp      -0.63
pron      -0.61
strings   -0.61
Elf       -0.61
toile     -0.61
CES       -0.61
Positive Logits
Token     Logit
agate     3.12
angan     2.44
olen      1.60
asta      1.59
amon      1.35
olini     1.34
astern    1.29
oso       1.24
arella    1.17
atto      1.13
Activations Density: 0.043%