Explanations
instances of the word "exception" or its variations in text
references to exceptions and deviations from rules or norms
Negative Logits
Token     Logit
destro    -0.70
rall      -0.62
goodbye   -0.60
ching     -0.60
pestic    -0.60
irrad     -0.59
roying    -0.58
raz       -0.57
sonian    -0.57
Beet      -0.57
Positive Logits
Token        Logit
arily        0.88
ional        0.84
perty        0.82
ality        0.82
ĸļ           0.81
als          0.79
rules        0.79
izzle        0.78
alties       0.76
exceptions   0.72
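The two logit lists are commonly read as the feature's direct effect on the model's output vocabulary. The sketch below is minimal and rests on an assumption that this page does not state: each score is the dot product of the feature's decoder vector with the model's unembedding matrix, with no final layer norm applied. The function name, its arguments, and the toy vocabulary are all hypothetical.

```python
import numpy as np


def top_logit_effects(decoder_vec, unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    Assumption (hypothetical): score = decoder_vec @ unembed, with no layer norm.
    decoder_vec has shape (d_model,); unembed has shape (d_model, vocab_size).
    """
    scores = decoder_vec @ unembed   # one score per vocabulary token
    order = np.argsort(scores)       # ascending order: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: a hypothetical 2-dimensional model with a 3-token vocabulary
vocab = ["tok_a", "tok_b", "tok_c"]
decoder_vec = np.array([1.0, 0.0])
unembed = np.array([[0.5, -0.3, 0.8],
                    [0.2, 0.1, -0.4]])
neg, pos = top_logit_effects(decoder_vec, unembed, vocab, k=2)
print(neg)  # [('tok_b', -0.3), ('tok_a', 0.5)]
print(pos)  # [('tok_c', 0.8), ('tok_a', 0.5)]
```

Only the ranking carries meaning here. The toy values are invented and do not reproduce the scores listed above.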
Activation Density: 0.034%
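Activation density is usually understood as the share of sampled tokens on which the feature fires. At 0.034%, this feature fires on roughly 34 of every 100,000 tokens. The sketch below is minimal and assumes that "fires" means an activation strictly above zero. The page gives neither the threshold nor the sample size, and the function name and synthetic sample are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption (hypothetical): a feature "fires" when its activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Synthetic sample: 34 firing tokens out of 100,000
sample = [0.0] * 99_966 + [1.0] * 34
density = activation_density(sample)
print(f"{density:.3%}")  # 0.034%
```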