INDEX
Explanations
names of people, likely related to news or criminal activities
proper nouns, specifically names of individuals
Negative Logits
-0.66  ================================================================
-0.66  insightful
-0.63  illuminating
-0.63  cz
-0.63  innov
-0.62  Introdu
-0.62  ularity
-0.61  infographic
-0.61  philosophers
-0.60  broaden
Positive Logits
1.17  pleaded
1.10  surrendered
1.06  died
1.00  Jr
0.96  suffered
0.95  disappeared
0.93  stabbed
0.93  vanished
0.93  texted
0.91  underwent
Activations Density 0.279%