INDEX
Explanations
headlines or titles containing strong and impactful words
proper nouns and names
Negative Logits
Token         Logit
Franks        -0.69
Maker         -0.68
icans         -0.62
naires        -0.61
fox           -0.61
makers        -0.60
Pike          -0.60
ponies        -0.60
Maker         -0.58
coats         -0.58
Positive Logits
Token         Logit
nown          0.89
entimes       0.86
clair         0.85
withstanding  0.84
¥ŀ            0.83
stantial      0.76
rely          0.75
specified     0.73
lihood        0.72
ntil          0.71
Activations Density 0.288%