Explanations
occurrences of the word "break" in various forms
Negative Logits
irth     -0.20
idi      -0.16
ensen    -0.15
amus     -0.15
id       -0.15
raz      -0.14
idl      -0.14
Sdk      -0.14
IDI      -0.14
uz       -0.14
Positive Logits
away      0.25
ranks     0.25
neck      0.24
barriers  0.23
records   0.22
apart     0.21
down      0.21
away      0.20
-news     0.20
barrier   0.20
Activations Density 0.016%