INDEX
Explanations
references to breaks or interruptions in various contexts
instances of the word "break."
Negative Logits
itatively     -0.77
ifice         -0.76
Reviewer      -0.70
ãĥĺãĥ©        -0.68
Cosponsors    -0.65
IFIED         -0.65
obser         -0.62
oka           -0.60
alez          -0.59
portrayal     -0.58
Positive Logits
away          1.19
neck          1.10
fast          1.01
points        0.93
water         0.91
down          0.89
downs         0.89
aways         0.88
robe          0.87
breakers      0.83
Activations Density 0.036%