INDEX
Explanations
names of people or places
mentions of specific individuals or characters
Negative Logits
xual          -0.79
geist         -0.70
indicative    -0.68
cemic         -0.65
comparative   -0.65
crawl         -0.65
ergy          -0.63
HCR           -0.63
prevailing    -0.62
dose          -0.62
Positive Logits
ril           1.13
rolet         0.93
ota           0.88
arna          0.88
aji           0.87
rons          0.87
lov           0.84
atars         0.84
anas          0.82
iva           0.81
Activations Density: 0.008%