Explanations
proper names or specific entities, such as names of people or places
names and specific terms associated with locations and characters
Negative Logits
Token     Logit
dan       -0.72
mits      -0.70
rien      -0.70
nan       -0.70
making    -0.70
RESULTS   -0.70
NOR       -0.69
cru       -0.69
Cru       -0.69
MIT       -0.68
Positive Logits
Token     Logit
usk        0.91
wana       0.91
ulu        0.90
iple       0.82
zar        0.80
wark       0.76
aiman      0.76
elta       0.76
arde       0.75
Nadu       0.74
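The logit lists rank output tokens by how strongly this feature pushes their logits down (negative) or up (positive). A minimal sketch of how such lists could be produced follows. It assumes each token's score is the feature's decoder direction multiplied by the model's unembedding matrix. That assumption is not confirmed by this page. The names `decoder_dir`, `W_U`, and `logit_effects` are hypothetical, and the weights are random toy data, not this feature's.

```python
import numpy as np

def logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct effect on the logits.

    Assumption (hypothetical): effect per token = decoder_dir @ W_U.
    Returns the k most negative and k most positive (token, score) pairs.
    """
    scores = decoder_dir @ W_U           # one score per vocabulary token
    order = np.argsort(scores)           # ascending order of scores
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with random weights and hypothetical sizes
rng = np.random.default_rng(0)
d_model, vocab_size = 16, 50
vocab = [f"tok{i}" for i in range(vocab_size)]
decoder_dir = rng.normal(size=d_model)
W_U = rng.normal(size=(d_model, vocab_size))

neg, pos = logit_effects(decoder_dir, W_U, vocab, k=5)
print(neg)
print(pos)
```

Real dashboards may normalize or rescale these scores before display, so the sketch shows only the ranking step.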
Activation Density: 0.008%
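Activation density is commonly read as the fraction of token positions on which the feature is active. A minimal sketch under that assumption follows. The function name `activation_density` and the zero threshold are hypothetical choices. The toy data is sized so that the result matches the 0.008% shown here.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Toy example: 8 active positions out of 100,000 tokens
acts = np.zeros(100_000)
acts[:8] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")
```

A density this low means the feature fires on roughly 8 of every 100,000 tokens.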