INDEX
Explanations
entities related to specific people like "Monroe" and "Lyon"
proper nouns, particularly names of people and places
Negative Logits
arded      -0.78
unks       -0.74
arding     -0.74
abor       -0.73
reddits    -0.72
rets       -0.70
gravity    -0.70
rogen      -0.70
uden       -0.69
abytes     -0.68
Positive Logits
street     0.91
alties     0.91
Monroe     0.87
Lyon       0.82
court      0.80
selves     0.79
hurst      0.76
Superior   0.75
ville      0.71
ette       0.70
Activations Density: 0.038%