Explanations
references to locations or nationalities
Negative Logits
bound           -0.61
ez              -0.61
ALT             -0.61
birth           -0.59
ept             -0.59
drawn           -0.57
blood           -0.57
ceptive         -0.57
compassionate   -0.56
States          -0.55
Positive Logits
oglu            0.88
theless         0.82
owsky           0.79
;;;;;;;;;;;;    0.77
ueller          0.77
aja             0.77
sov             0.76
ulhu            0.75
nikov           0.75
oulos           0.75
Activations Density 0.065%