INDEX
Explanations
proper nouns related to various locations and people
references to individuals and their actions or roles
Negative Logits
Token         Logit
Unle          -0.68
scription     -0.67
raft          -0.65
Speedway      -0.64
1954          -0.64
1904          -0.64
yss           -0.62
ongh          -0.62
Fargo         -0.61
Prompt        -0.61
Positive Logits
Token         Logit
resided       0.87
'd            0.78
penetrated    0.77
boarded       0.75
reside        0.74
invaded       0.73
interacted    0.73
lived         0.72
resides       0.72
poked         0.71
Activations Density: 0.163%