INDEX
Explanations
references to religious concepts and figures
Negative Logits
僕は    -0.47
僕が    -0.44
dios    -0.42
god    -0.40
anda    -0.39
僕の    -0.38
僕    -0.38
deines    -0.37
dieu    -0.35
僕も    -0.35
Positive Logits
His    2.47
His    1.98
Его    1.64
Himself    1.63
祂    1.59
He    1.59
Him    1.49
Jego    1.34
Zijn    1.29
He    1.16
Activations Density 0.460%