INDEX
Explanations
Roman numerals
references to specific historical or cultural contexts and their implications
Negative Logits
!".       -0.66
torpedo   -0.64
Rhodes    -0.64
Newark    -0.62
$.        -0.62
yacht     -0.62
.")       -0.60
."        -0.59
Jonah     -0.58
'.        -0.56
Positive Logits
âĢ        1.72
âĢ        1.40
âĢł       1.21
âĶ        1.11
âī        0.96
â         0.95
ãĢ        0.93
�         0.90
âģ        0.89
Â         0.88
Activations Density 0.766%