INDEX
Explanations
mentions of historical events or discussions surrounding Jewish identity and persecution
Negative Logits
Token     Logit
was       -0.28
Was       -0.25
wasn      -0.25
Was       -0.24
_was      -0.23
isnt      -0.22
was       -0.21
isn       -0.21
Isn       -0.18
conver    -0.18
Positive Logits
Token     Logit
are       0.83
aren      0.62
_are      0.55
were      0.54
ARE       0.53
Are       0.52
Are       0.50
are       0.48
são       0.46
.are      0.45
Activations Density 3.793%
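
For readers unfamiliar with the density figure, the following is a minimal sketch of how such a number is commonly computed, assuming it means the fraction of sampled token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" counts a position as active when its activation
    is strictly greater than the threshold (zero by default).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: 3 of the 8 positions have a non-zero activation.
sample = [0.0, 0.0, 1.2, 0.0, 0.4, 0.0, 0.0, 2.1]
print(f"{activation_density(sample):.3%}")  # prints 37.500%
```

Under this reading, a density of 3.793% would mean the feature fires on roughly 38 of every 1,000 sampled token positions.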