INDEX
Explanations
names of individuals
punctuation, particularly commas
Negative Logits
olves        -0.77
adh          -0.67
outputs      -0.64
animate      -0.61
versive      -0.60
¥µ           -0.60
appropriate  -0.59
ole          -0.59
FIX          -0.58
overs        -0.57
Positive Logits
meanwhile    1.22
however      1.15
flanked      1.05
enegger      1.02
who          0.94
moreover     0.94
nicknamed    0.92
along        0.92
whose        0.90
pictured     0.90
Activations Density: 0.125%