INDEX
Explanations
references to human-related concepts or attributes
references to human beings and their behaviors
Negative Logits
arella        -0.86
abb           -0.73
anwhile       -0.71
REP           -0.71
ère           -0.70
rypt          -0.70
..........    -0.70
forth         -0.69
illary        -0.69
kick          -0.69
Positive Logits
beings          1.25
readable        1.03
oids            1.02
embryonic       0.91
civilization    0.88
itar            0.82
oid             0.82
fingert         0.82
zee             0.81
itarian         0.81
Activations Density: 0.024%