INDEX
Explanations
proper nouns related to names or locations
mentions of the word "worth."
Negative Logits
PDATE             -0.71
Nou               -0.66
Ars               -0.64
++++++++++++++++  -0.64
âĸ¬âĸ¬            -0.61
verbs             -0.61
heartbeat         -0.60
iolet             -0.59
Quit              -0.58
Verge             -0.58
Positive Logits
sworth            1.08
sung              0.97
umes              0.93
borough           0.91
worth             0.89
ume               0.89
nesses            0.88
endor             0.87
iness             0.86
ages              0.85
Activations Density: 0.018%