Explanations
words related to institutions, organizations, and proper nouns
words related to specific names or titles, particularly those associated with Bantam publications
Negative Logits
Token        Logit
holes        -0.83
ionage       -0.72
wich         -0.72
sclerosis    -0.68
hole         -0.68
ubiqu        -0.65
Mandela      -0.63
umpy         -0.62
happiest     -0.62
embr         -0.62
Positive Logits
Token        Logit
avan         0.92
antam        0.92
ュ           0.90
weights      0.84
ategory      0.83
chio         0.81
agh          0.81
ック         0.80
ャ           0.79
pole         0.77
Activations Density 0.034%