INDEX
Explanations
mentions of the word "rab" or variations of it
references to specific individuals or entities associated with political discourse
Negative Logits
éĹĺ            -0.69
Tigers         -0.69
Ceres          -0.67
Resurrection   -0.66
Occupations    -0.65
Swordsman      -0.64
appropriation  -0.64
Spartan        -0.64
Tsukuyomi      -0.63
Spartans       -0.63
Positive Logits
ble            1.00
inson          0.98
bing           0.96
iotics         0.93
ozo            0.89
anus           0.88
ody            0.87
orough         0.83
owitz          0.80
rab            0.79
Activations Density 0.016%