INDEX
Explanations
references to historical figures and their roles
Negative Logits
[token missing]   -0.59
GIVEREF           -0.58
boyfriend         -0.52
jeans             -0.50
InSection         -0.47
curé              -0.47
Supermarket       -0.45
WriteTagHelper    -0.45
girlfriend        -0.45
θρω               -0.45
Positive Logits
palace       1.07
Palace       0.93
imperial     0.89
court        0.87
royal        0.84
Palace       0.83
palace       0.79
Imperial     0.76
courtier     0.75
courtiers    0.73
Activations Density 0.295%