INDEX
Explanations
Words related to humans organized into groups, such as ethnic groups, political organizations, or medical patients.
Negative Logits
ProtoMessage    -0.84
INSEE           -0.83
Theſe           -0.81
IVEREF          -0.79
Välislingid     -0.76
lenker          -0.75
HasFactory      -0.75
utafitiHapana   -0.73
pinulongan      -0.73
שוליים          -0.72
Positive Logits
<eos>       0.64
that        0.54
feared      0.50
you         0.47
"           0.46
becoming    0.45
名は        0.45
werden      0.44
望          0.44
↵↵          0.43
Activations Density: 3.850%