INDEX
Explanations
references to palaces or royal residences
palace and mansion
Negative Logits
RN      -0.49
RN      -0.45
Shar    -0.40
USN     -0.40
RW      -0.40
RTE     -0.40
Indian  -0.39
Indian  -0.38
R       -0.38
Tm      -0.37
Positive Logits
Palace    1.23
palace    1.17
Palace    1.16
palace    1.10
palaces   1.08
palacio   0.88
Palacios  0.85
Palacio   0.82
Palais    0.81
palais    0.76
Activations Density: 0.004%