INDEX
Explanations
general statements or facts related to people
references to the general perceptions or experiences of people
Negative Logits
Token         Logit
ĪĴ            -0.75
blackmail     -0.63
unification   -0.61
Lithuania     -0.61
hua           -0.60
teammate      -0.60
enegger       -0.59
uniqueness    -0.58
destiny       -0.58
timely        -0.57
Positive Logits
Token         Logit
imaginable    0.95
EVER          0.79
except        0.77
Cent          0.72
ergic         0.71
alike         0.70
immune        0.70
WER           0.69
dom           0.68
revolves      0.68
Activations Density: 0.202%