INDEX
Explanations
phrases mentioning specific names of individuals or locations
names of people, countries, and places
Negative Logits
»Ĵ          -0.89
Norn        -0.74
Marginal    -0.67
STATS       -0.67
IMAGES      -0.67
Archdemon   -0.66
ãĤ£         -0.66
Duffy       -0.66
ãĤµ         -0.64
ãĤ¤ãĥĪ      -0.62
POSITIVE LOGITS
iest        0.75
ÃŃs         0.72
's          0.71
il          0.71
exclusive   0.67
lez         0.64
kind        0.63
ieri        0.62
ottest      0.61
/           0.60
Activations Density 0.228%
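
Several tokens in the logit lists (for example `ãĤ£`, `ãĤ¤ãĥĪ`, and `ÃŃs`) look like raw byte-level BPE vocabulary strings rather than readable text. The sketch below assumes the underlying tokenizer uses the GPT-2 style byte-to-unicode table, which this page does not confirm; `gpt2_byte_decoder` and `decode_token` are hypothetical helper names introduced only for illustration.

```python
def gpt2_byte_decoder():
    """Build a char -> byte map, assuming the GPT-2 style byte-to-unicode table."""
    # Bytes that the table maps to themselves (printable, non-space characters).
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    extra = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + extra)
            extra += 1
    return {chr(c): b for b, c in zip(byte_values, code_points)}


def decode_token(token):
    """Map a vocabulary string back to bytes, then decode those bytes as UTF-8."""
    table = gpt2_byte_decoder()
    # Raises KeyError for characters outside the table (e.g. a literal space).
    raw = bytes(table[ch] for ch in token)
    # Partial multi-byte sequences become U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĤ£", "ãĤµ", "ãĤ¤ãĥĪ", "ÃŃs", "Norn"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, `ãĤ£` decodes to ィ, `ãĤµ` to サ, `ãĤ¤ãĥĪ` to イト, and `ÃŃs` to ís. The token `»Ĵ` maps to two UTF-8 continuation bytes, so it is not valid text on its own and is best read as a fragment of a longer character.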