INDEX
Explanations
references to Roman cultural elements and terms
Negative Logits
Efq              -0.95
JoJo             -0.94
Monfieur         -0.90
Gerd             -0.89
CreateTagHelper  -0.88
Volga            -0.87
pleaſure         -0.87
Psyche           -0.86
Keim             -0.86
otheses          -0.84
Positive Logits
rom      0.86
Rom      0.82
rom      0.79
Rom      0.79
ROM      0.78
anz      0.77
Romero   0.73
𝟱        0.73
ROM      0.70
Roman    0.68
Activations Density: 0.525%