Explanations
phrases related to seriousness or severity
instances of the word "serious."
Negative Logits
Token      Logit
atu        -0.87
wright     -0.83
enaries    -0.80
tein       -0.79
ucky       -0.77
eez        -0.76
av         -0.71
via        -0.69
orius      -0.69
iver       -0.67
Positive Logits
Token          Logit
lly            0.92
serious        0.88
serious        0.87
enough         0.81
dent           0.80
consideration  0.77
mond           0.76
seriousness    0.74
enough         0.73
contender      0.72
Activations Density: 0.029%