INDEX
Explanations
proper nouns associated with sports or competitive activities
quotation marks and their associated contents in the text
Negative Logits
hattan      -0.80
alysis      -0.78
coincide    -0.77
acebook     -0.76
icides      -0.76
aneously    -0.73
awaru       -0.73
ilater      -0.73
incial      -0.71
netflix     -0.70
Positive Logits
Sem         0.82
vik         0.79
Ha          0.77
burn        0.76
ham         0.75
Thor        0.75
Gam         0.73
Wilson      0.73
zek         0.73
felt        0.71
Activations Density: 0.074%