Explanations
names of persons
references to specific individuals and their connections
Negative Logits
Token    Logit
iku      -0.93
ector    -0.78
sha      -0.73
idth     -0.68
anka     -0.64
silver   -0.62
unes     -0.62
akery    -0.62
awar     -0.60
rets     -0.60
Positive Logits
Token    Logit
More     1.68
more     1.59
more     1.54
More     1.51
less     1.41
Less     1.36
MORE     1.36
fewer    1.30
Less     1.20
MORE     1.10
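The two lists above read like a logit-lens view of the feature: how strongly its direction pushes each vocabulary token up (positive) or down (negative). A minimal sketch of how such a ranking could be produced, assuming the feature has a decoder vector that is projected through the model's unembedding matrix. The names `decoder_direction`, `W_U`, `vocab`, and `top_logit_effects` are hypothetical, and the toy numbers are illustrative, not taken from this feature.

```python
import numpy as np


def top_logit_effects(decoder_direction, W_U, vocab, k=10):
    """Rank vocabulary tokens by the logit change a feature direction induces.

    Assumptions (hypothetical, not from the dashboard itself):
      decoder_direction: shape (d_model,), the feature's decoder vector.
      W_U: shape (d_model, vocab_size), the model's unembedding matrix.
      vocab: list of token strings of length vocab_size.
    Returns (negative, positive): the k most negative and k most positive
    (token, logit) pairs, each list ordered from most extreme inward.
    """
    logits = decoder_direction @ W_U  # shape (vocab_size,)
    order = np.argsort(logits)        # token indices, ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with a 2-dimensional model and a 4-token vocabulary.
vocab = ["more", "less", "iku", "silver"]
W_U = np.array([[2.0, 1.0, -1.0, 0.0],
                [0.0, 0.5, 0.0, -0.5]])
direction = np.array([1.0, 1.0])
neg, pos = top_logit_effects(direction, W_U, vocab, k=2)
print(neg)  # [('iku', -1.0), ('silver', -0.5)]
print(pos)  # [('more', 2.0), ('less', 1.5)]
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with a single set of computed logits.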
Activation Density: 0.169%
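Activation density is commonly read as the share of sampled tokens on which the feature fires. Assuming that definition (a token counts when its activation is above a threshold of 0) and a hypothetical helper `activation_density`, a density of 0.169% corresponds to roughly 169 firing tokens per 100,000 sampled.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumes `activations` holds one activation value per sampled token.
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Illustrative sample: 169 active tokens out of 100,000.
acts = np.zeros(100_000)
acts[:169] = 1.0
print(activation_density(acts))  # 0.169
```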