Explanations
clauses introducing contrast
Negative Logits
שא    0.52
薷    0.50
ס    0.49
eresa    0.48
ENSION    0.48
ומים    0.47
ותו    0.47
ד    0.47
cháu    0.46
Subscription    0.45
Positive Logits
tinge    0.47
decoy    0.47
sites    0.46
roadblock    0.46
cross    0.46
mass    0.45
yni    0.45
lead    0.45
;    0.44
فقط    0.44
Activations Density: 0.003%