Explanations
references to the intelligence agency "Shin Bet"
references to the organization Shin Bet
Negative Logits
anwhile   -0.86
ENCE      -0.76
ENC       -0.74
utics     -0.73
enance    -0.72
rency     -0.71
theless   -0.69
gently    -0.69
utic      -0.68
nces      -0.67
Positive Logits
obi       1.01
omi       0.92
ning      0.86
Shin      0.84
eless     0.84
jin       0.82
ikuman    0.81
mare      0.77
wa        0.77
ook       0.75
Activations Density 0.026%