Explanations
proper nouns related to military or historical figures
references to specific characters and themes related to monks
Negative Logits
alez        -1.02
horizont    -0.85
IUM         -0.80
イ          -0.79
ocial       -0.73
ortmund     -0.72
acio        -0.71
ators       -0.70
itional     -0.68
itaire      -0.67
Positive Logits
ucket        0.93
letcher      0.81
aughs        0.75
mare         0.73
mares        0.72
ucky         0.72
Phant        0.68
OWN          0.68
orld         0.68
Fletcher     0.68
Activation Density: 0.064%