INDEX
Explanations
instances of the word "break" and its various forms
Negative Logits
Token      Logit
eid        -0.16
amet       -0.15
opoulos    -0.15
phy        -0.15
irth       -0.15
دÙĪ        -0.15
aura       -0.15
ikan       -0.14
imos       -0.14
rr         -0.14
Positive Logits
Token      Logit
fast       0.36
age        0.34
away       0.33
neck       0.32
water      0.28
FAST       0.28
ages       0.27
dance      0.25
through    0.24
aways      0.24
Activations Density 0.040%