INDEX
Explanations
references to the intelligence agency "Shin Bet."
references to the Shin Bet, the Israeli Security Agency
Negative Logits
utic            -0.76
ALS             -0.64
utics           -0.63
getic           -0.62
Interstitial    -0.61
Cue             -0.61
halves          -0.60
breast          -0.59
mileage         -0.59
correctness     -0.59
Positive Logits
ichi            1.16
pei             1.09
obi             1.00
hei             0.96
omi             0.94
etsu            0.93
jin             0.86
ji              0.84
atsu            0.84
Tok             0.83
Activations Density: 0.041%