Explanations
expressions of hope or uncertainty regarding future developments or plans
Negative Logits
dehuman             -0.67
pearl               -0.61
odied               -0.61
misogyn             -0.60
discriminated       -0.60
indistinguishable   -0.58
Godd                -0.57
seless              -0.56
falsely             -0.56
vulgar              -0.55

Positive Logits
timetable            0.95
2019                 0.89
TBD                  0.86
meantime             0.86
catentry             0.83
hopefully            0.82
Hopefully            0.81
deadlines            0.80
deadline             0.80
TBA                  0.77

Activation Density: 0.665%