INDEX
Explanations
words related to being an exceptional case or not following a general rule
phrases related to exceptions to rules or generalizations
Negative Logits
destro      -0.72
DCS         -0.71
espie       -0.61
goodbye     -0.60
juven       -0.60
Progress    -0.58
tyr         -0.58
ching       -0.57
yang        -0.57
ched        -0.56
Positive Logits
arily         0.89
ĸļ            0.88
ality         0.87
ional         0.84
als           0.80
exceptions    0.80
abl           0.79
aux           0.75
exception     0.72
izzle         0.71
Activations Density 0.051%