Explanations
references to a specific organization called "Shin Bet."
references to the Shin Bet
Negative Logits
Token     Logit
anwhile   -0.82
utics     -0.76
aneers    -0.73
rency     -0.71
utic      -0.70
gently    -0.70
ENCE      -0.68
enance    -0.68
ENC       -0.67
nces      -0.67
Positive Logits
Token     Logit
obi        1.02
eless      0.88
ook        0.87
omi        0.86
ning       0.85
wa         0.83
azi        0.82
Shin       0.81
aman       0.80
ichi       0.80
Activation density: 0.017%
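The sketch below shows one way numbers like the ones above could be produced. It assumes, without confirmation from this page, that the logit lists rank vocabulary tokens by the feature's decoder direction multiplied by the model's unembedding matrix, and that activation density is the fraction of tokens on which the feature's activation is positive. The function names `top_logit_effects` and `activation_density` and all toy inputs are hypothetical.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by the feature's direct effect on the logits.

    Assumption: effect = decoder direction (d_model,) @ unembedding (d_model, vocab_size).
    Returns (top_k_positive, top_k_negative) as lists of (token, value).
    """
    effects = decoder_dir @ W_U              # one value per vocabulary token
    order = np.argsort(effects)              # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

def activation_density(acts):
    """Fraction of tokens where the feature fires (activation > 0), by assumption."""
    return float(np.mean(np.asarray(acts) > 0))

# Toy usage with hypothetical data: 17 firing tokens out of 100,000.
acts = np.zeros(100_000)
acts[:17] = 1.0
print(f"{activation_density(acts):.3%}")
```

With a density this low, a feature fires on roughly 1 token in 6,000, so its logit effects only matter in the narrow contexts where it is active.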