Explanations
names, particularly those associated with political figures and events
Negative Logits
Token       Logit
iances      -0.70
icals       -0.64
photon      -0.64
Covenant    -0.63
ãĥĩãĤ£      -0.60
variance    -0.60
Lantern     -0.58
oubted      -0.58
iqueness    -0.57
surn        -0.57
Positive Logits
Token       Logit
lette       1.15
au          0.96
ff          0.94
vous        0.92
lett        0.87
lete        0.86
grad        0.85
lect        0.85
tti         0.80
ugal        0.80
Activations Density: 0.006%