Explanations
actions or decisions that are seen as mistakes or are associated with wrongdoing
actions or decisions that indicate wrongdoing or mistakes
Negative Logits
periodic    -0.68
assian      -0.65
presses     -0.62
enser       -0.62
ageing      -0.60
traveller   -0.59
rouse       -0.58
ilus        -0.57
resumes     -0.56
refill      -0.56
Positive Logits
yesterday   1.25
last        1.15
earlier     1.01
LAST        0.94
ago         0.90
wrong       0.87
didn        0.86
terday      0.85
wrong       0.84
last        0.82
Activation Density: 0.548%