INDEX
Explanations
instances where an expectation is surpassed or not met
instances of the word "expected" and its variations
Negative Logits
tex         -0.73
nan         -0.71
neck        -0.71
manship     -0.71
tha         -0.70
fighting    -0.69
reen        -0.69
vet         -0.69
agra        -0.68
below       -0.68
Positive Logits
ORY         0.75
FontSize    0.74
laughter    0.68
ICAL        0.68
unexpected  0.64
IAL         0.64
OSH         0.63
ROR         0.62
spont       0.61
ICAN        0.60
Activations Density 0.031%