Explanations
proper nouns related to names
names or terms associated with individuals involved in significant contexts
Negative Logits
Token           Logit
Rated           -0.79
andals          -0.70
rations         -0.70
urgy            -0.68
ittal           -0.68
Interstitial    -0.67
apping          -0.65
uers            -0.65
Balt            -0.64
apped           -0.63
Positive Logits
Token           Logit
hee             1.43
gee             1.38
ffe             0.96
zee             0.88
zza             0.81
ze              0.78
jee             0.78
Wee             0.78
erness          0.77
zeb             0.77
Activations Density: 0.011%