INDEX
Explanations
phrases related to different types of people
terms related to various categories of people or groups
Negative Logits
SourceFile  -0.92
£ı          -0.67
yss         -0.67
Priest      -0.62
Butt        -0.62
cknow       -0.61
Buk         -0.58
achu        -0.57
Samar       -0.57
?????-      -0.57
Positive Logits
hip         1.09
ourcing     0.85
ourced      0.83
pace        0.83
hips        0.78
umers       0.77
'           0.74
']          0.72
/$          0.71
ele         0.71
Activations Density 0.291%