Explanations
mentions or references to someone named "Rom" with varying contexts
references to the city of Rome or related terms
Negative Logits
insula      -0.76
gettable    -0.75
pressed     -0.68
attendant   -0.68
IELD        -0.67
Ͻ           -0.67
graded      -0.67
ctic        -0.63
weights     -0.62
eyed        -0.62
Positive Logits
antics      1.04
ulus        1.03
anes        1.03
ijn         1.02
aji         1.01
atic        0.95
antic       0.94
antically   0.93
ano         0.92
ace         0.91
Activations Density: 0.017%