Explanations
Terms related to concentration camps, with a focus on specific phrases associated with them
References to concentration camps
Negative Logits
ĪĴ  -0.80
aired  -0.75
facing  -0.71
Logged  -0.71
mic  -0.71
posted  -0.70
ambassadors  -0.70
\\\\\\\\  -0.70
named  -0.68
anne  -0.67
Positive Logits
concentration  0.98
Concent  0.80
itated  0.80
emetery  0.76
Pru  0.76
itate  0.75
concentrated  0.74
itating  0.73
arnaev  0.72
inct  0.72
Activation Density: 0.051%