INDEX
Explanations
proper nouns
names and mentions of notable individuals
NEGATIVE LOGITS
lords         -0.90
meal          -0.78
Hussein       -0.75
Demand        -0.71
manship       -0.70
lord          -0.68
Declaration   -0.64
Clockwork     -0.63
Luxem         -0.63
mann          -0.62
POSITIVE LOGITS
rique         0.92
rics          0.89
tered         0.84
icol          0.83
rika          0.77
avid          0.76
icle          0.75
agraph        0.75
icles         0.74
ersen         0.74
Activations Density: 0.015%