INDEX
Explanations
references to the speaker or writer
references to personal experiences and identity
Negative Logits
Beir          -0.78
Equality      -0.74
HRC           -0.70
Finish        -0.70
recess        -0.63
Parenthood    -0.63
Mb            -0.62
Doors         -0.59
vacancy       -0.59
Rape          -0.58
Positive Logits
imei          1.05
adow          1.04
andering      1.01
adows         0.97
aning         0.95
hers          0.94
atics         0.94
atic          0.94
ubi           0.89
anders        0.88
Activations Density: 0.114%