INDEX
Explanations
proper nouns related to persons or places
references to "Islam" and related terms
Negative Logits
bats          -0.76
Zot           -0.60
corrid        -0.58
session       -0.56
Rockies       -0.55
ACTIONS       -0.54
Bloody        -0.54
succeeding    -0.54
incent        -0.54
lodge         -0.53
Positive Logits
ibo           0.87
orio          0.86
iaz           0.81
uable         0.76
omal          0.74
auri          0.71
hya           0.70
STEM          0.70
wer           0.69
eni           0.69
Activations Density 0.066%
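
For readers unfamiliar with the density figure, the following is a minimal sketch of one common reading: the fraction of sampled token positions on which the feature's activation is above zero. This page does not state how the number was computed, so the definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 66 of which fire.
sample = [1.5] * 66 + [0.0] * (100_000 - 66)
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.066%"
```

Under this reading, a density of 0.066% would mean the feature fires on roughly 66 of every 100,000 tokens, i.e. it is sparse.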