INDEX
Explanations
proper nouns related to people or fictional characters
names and familial relationships
Negative Logits
clarity         -0.74
understandable  -0.73
ibaba           -0.72
contrast        -0.68
umph            -0.66
azeera          -0.66
reassuring      -0.66
Pwr             -0.65
emphasizing     -0.63
olulu           -0.63
Positive Logits
belonged    1.28
existed     1.20
had         1.15
hadn        1.10
resided     1.10
exists      1.07
belongs     1.06
was         1.03
originated  1.02
perished    0.99
Activations Density 0.543%