Explanations
proper nouns and complex phrases that aren't part of everyday conversation
Negative Logits
Neutral        -0.67
ashtra         -0.63
azeera         -0.63
Leban          -0.62
Nort           -0.59
outhern        -0.59
guiActiveUn    -0.56
Moroc          -0.55
advisory       -0.55
ettlement      -0.55
Positive Logits
ifies          1.59
izes           1.52
uates          1.36
ulates         1.34
itates         1.29
tains          1.28
ends           1.26
iates          1.25
ses            1.23
ates           1.22
Activations Density 0.322%