Explanations
specific words or phrases related to historical events and political discussions
Negative Logits
Token     Logit
ramids    -0.71
uku       -0.68
eer       -0.65
any       -0.61
mill      -0.61
jury      -0.60
bill      -0.59
case      -0.59
Family    -0.59
ione      -0.59
Positive Logits
Token          Logit
mattered       0.97
prompted       0.87
distinguishes  0.87
bothers        0.86
determines     0.85
rouse          0.83
drew           0.80
bothered       0.78
inspires       0.76
prompts        0.75
Activations Density: 0.088%