Explanations
phrases describing actions or states of an individual
information about a person's early life and personal history
Negative Logits
Token           Logit
emale           -0.75
Composite       -0.65
selves          -0.65
ogether         -0.65
discrep         -0.63
anwhile         -0.58
nesday          -0.58
[*              -0.56
respectively    -0.56
aminer          -0.54
Positive Logits
Token           Logit
himself         0.61
Minecraft       0.55
cffffcc         0.54
apolog          0.53
solo            0.51
Nasa            0.51
Annotations     0.49
CVE             0.49
zbollah         0.49
ihad            0.48
Activations Density: 0.575%