Explanations
Phrases related to historical figures or religious saints
Negative logits
PT       -0.79
HO       -0.71
razil    -0.68
ORN      -0.64
itsch    -0.64
PN       -0.64
BIP      -0.63
clips    -0.62
OTH      -0.62
raltar   -0.61
Positive logits
Laurent      1.05
Clair        0.95
Louis        0.95
Augustine    0.94
Lucia        0.93
Petersburg   0.90
Francis      0.88
clair        0.86
Jude         0.81
onew         0.81
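The two lists above rank tokens by how strongly the feature pushes their logits down or up. Dashboards of this kind commonly obtain such a ranking by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k most negative and k most positive entries. This page does not state its exact method, so treat the following as a minimal sketch under that assumption. The decoder vector, the toy unembedding columns, and the helper names `logit_effects` and `top_bottom_k` are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Hypothetical helper: dot the feature's decoder direction with each
    token's unembedding column to get a per-token logit effect."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}

def top_bottom_k(effects: Dict[str, float],
                 k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, k most negative) tokens, strongest first."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]          # most negative effect first
    top = ranked[::-1][:k]       # most positive effect first
    return top, bottom

# Toy 2-dimensional example; these numbers are invented for illustration.
decoder_dir = [1.0, 0.5]
unembed = {
    "Laurent": [0.8, 0.4],
    "Louis": [0.6, 0.2],
    "PT": [-0.5, -0.2],
    "HO": [-0.4, -0.1],
}
top, bottom = top_bottom_k(logit_effects(decoder_dir, unembed), k=2)
```

Here `top` lists Laurent and then Louis, and `bottom` lists PT and then HO. This mirrors the ordering used in the positive and negative lists, but the values are not the dashboard's.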
Activation density: 0.019%
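Activation density is usually read as the percentage of sampled tokens on which the feature fires. Under that reading, 0.019% means roughly 19 of every 100,000 tokens, which makes this a very sparse feature. The sketch below assumes "fires" means an activation strictly above zero. The threshold and the function name `activation_density` are hypothetical, not taken from this page.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# 19 firing tokens out of 100,000 gives a density of about 0.019%.
sample = [0.0] * 99_981 + [1.0] * 19
density = activation_density(sample)
```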