INDEX
Explanations
names of specific individuals
names of individuals and their associated actions or statements
Negative Logits
ataka      -0.71
ipedia     -0.68
umbers     -0.68
onto       -0.67
uminium    -0.65
enegger    -0.62
ould       -0.62
isure      -0.61
ickr       -0.61
gorithm    -0.61

Positive Logits
She         0.79
Quan        0.78
Born        0.77
Vent        0.72
Bott        0.71
Gou         0.71
Moor        0.71
Pt          0.71
Flav        0.70
Schwarz     0.70
Activations Density: 1.392%