INDEX
Explanations
descriptions of social status and relationships
Negative Logits
INTERRU
-0.14
ента
-0.14
mort
-0.14
enstein
-0.14
unaware
-0.14
ParseException
-0.13
-0.13
人気
-0.13
Zwe
-0.13
Lesser
-0.13
Positive Logits
smoker
0.18
gentleman
0.18
.seek
0.17
singles
0.17
Smoke
0.16
ingles
0.16
ozy
0.16
honest
0.16
discrete
0.16
smoke
0.15
Activations Density 0.174%