INDEX
Explanations
instances where something is being emphasized or pointed out
phrases that suggest causality or connection between concepts
Negative Logits
ramids    -0.68
uku       -0.66
bill      -0.65
mill      -0.65
eer       -0.61
Cub       -0.60
jury      -0.59
Family    -0.59
ione      -0.59
case      -0.58
Positive Logits
mattered         0.91
rouse            0.89
distinguishes    0.85
bothers          0.83
prompted         0.80
determines       0.79
drew             0.75
attracts         0.75
inspires         0.74
ozy              0.74
Activations Density: 0.119%