INDEX
Explanations
negative references or sentiments towards familial relationships
brother-in-law / daughter-in-law
Negative Logits
.              -0.40
หวัด           -0.36
Competition    -0.34
Occupation     -0.34
The            -0.34
Portail        -0.33
HideFlags      -0.32
ceci           -0.32
explotación    -0.31
abestanden     -0.31
Positive Logits
ſch            0.71
ſta            0.69
ſind           0.66
Conſ           0.66
ſelves         0.63
pleaſure       0.63
Inſ            0.63
ſte            0.62
purpoſe        0.62
edipus         0.61
Activations Density: 0.010%