INDEX
Explanations
statements mentioning actions or events as reported or believed to have occurred
phrases indicating possession or existence
Negative Logits
fortunately    -0.74
ppings         -0.73
Timbers        -0.71
atters         -0.70
Reviewer       -0.67
ortunately     -0.66
Bucs           -0.65
Bundy          -0.64
clip           -0.63
Tacoma         -0.63
Positive Logits
leeve          0.73
categor        0.73
âĢİ            0.72
influenced     0.68
accounted      0.67
igious         0.67
exempt         0.67
operated       0.65
composed       0.65
driven         0.64
Activations Density 0.075%
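
As a rough illustration of what a density figure like the one above could mean, here is a minimal sketch. It assumes density is the fraction of token positions where the feature's activation is above zero; the source does not state this definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    This is an illustrative definition, not one taken from the source.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 3 of 4000 token positions.
sample = [0.0] * 3997 + [1.2, 0.4, 2.5]
print(f"{activation_density(sample):.3%}")  # prints 0.075%
```

Under this reading, a density of 0.075% corresponds to the feature firing on about 3 of every 4000 tokens.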