Explanations
phrases related to breaking laws or rules
instances of the word "breaking" in various contexts
Negative Logits
imental    -0.72
ulp        -0.69
uther      -0.68
gel        -0.67
oft        -0.66
minist     -0.65
apixel     -0.65
oka        -0.65
ateur      -0.64
affer      -0.63

Positive Logits
neck        0.92
breakers    0.89
breaks      0.87
break       0.83
break       0.80
necks       0.79
staff       0.79
broke       0.76
breaking    0.76
breaks      0.75
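
The negative and positive logit lists rank vocabulary tokens by how strongly this feature pushes each token's output logit down or up. The sketch below shows one way such lists can be produced. It is an assumption, not this dashboard's confirmed method: it takes the common SAE-dashboard approach of projecting the feature's decoder direction through the model's unembedding matrix and then sorting. The function names `logit_effects` and `top_logits` and the toy vocabulary are hypothetical. Repeated entries such as "break" and "breaks" are likely distinct vocabulary tokens (for example, with and without a leading space) that render identically.

```python
def logit_effects(decoder_dir, unembed):
    """Project a feature's decoder direction through the unembedding.

    Assumption: decoder_dir is the feature's decoder row (d_model floats)
    and unembed is the d_model x vocab unembedding matrix (list of rows).
    Returns one direct logit effect per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_logits(effects, vocab, k=10):
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative effects first
    positive = ranked[::-1][:k]  # most positive effects first
    return positive, negative


# Toy example with a hypothetical 4-token vocabulary and d_model = 2.
vocab = ["break", "neck", "gel", "oft"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.8, 0.9, -0.6, -0.5],
    [0.2, 0.1, -0.2, 0.0],
]
pos, neg = top_logits(logit_effects(decoder_dir, unembed), vocab, k=2)
print([t for t, _ in pos])  # ['neck', 'break']
print([t for t, _ in neg])  # ['gel', 'oft']
```

Sorting once and slicing from both ends yields both lists in a single pass over the vocabulary; a real model would use a vectorized matrix product instead of nested Python loops.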

Activation Density: 0.026%
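
An activation density of 0.026% means the feature is active on roughly 26 of every 100,000 tokens sampled, so it fires very sparsely. A minimal sketch of the calculation, assuming (as is typical for ReLU-based sparse autoencoder features) that "active" means an activation strictly above zero; the function name `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is > 0.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 26 active tokens out of 100,000.
acts = [0.0] * 99_974 + [1.3] * 26
print(f"{activation_density(acts):.3f}%")  # 0.026%
```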