Explanations
instances of the word "break" and its variations
Negative Logits
Token    Logit
eid      -0.18
BER      -0.17
anges    -0.16
jar      -0.16
irth     -0.15
aut      -0.15
bers     -0.15
ói       -0.15
Ant      -0.14
imos     -0.14
Positive Logits
Token    Logit
away     0.31
fast     0.30
neck     0.27
FAST     0.26
through  0.26
aways    0.25
ranks    0.22
bulk     0.22
Away     0.21
-even    0.21
Activations Density 0.014%