INDEX
Explanations
phrases indicating ineffectiveness or lack of impact
phrases indicating a lack of effectiveness or minimal impact
Negative Logits
Fever     -0.67
illi      -0.64
Sah       -0.63
icas      -0.63
eez       -0.62
owsky     -0.62
issued    -0.61
uman      -0.61
racuse    -0.61
gypt      -0.61
Positive Logits
harm          1.13
damage        0.97
outreach      0.88
wrong         0.84
homework      0.84
damage        0.81
mischief      0.80
wrong         0.80
groundwork    0.75
research      0.72
Activations Density: 0.060%