INDEX
Explanations
phrases related to accidental actions
occurrences of the word "accidentally" or similar terms indicating unintentional actions
Negative Logits
ciating    -0.79
thood      -0.73
tier       -0.72
Options    -0.70
nosis      -0.68
rates      -0.67
levels     -0.67
riors      -0.67
esian      -0.66
layer      -0.66
Positive Logits
accidentally       0.87
inadvertently      0.86
unintentionally    0.85
misc               0.82
unknow             0.82
inadvert           0.79
identally          0.76
idental            0.76
stumbled           0.74
unwittingly        0.72
Activations Density 0.028%
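
As a rough illustration of what a density figure like this usually means, the sketch below treats activation density as the fraction of sampled token positions on which the feature's activation is above zero. This definition is an assumption, not something the page states. The function name `activation_density`, the `threshold` parameter, and the sample counts (28 active positions out of 100,000) are hypothetical, chosen only so the arithmetic lands on the same percentage.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumed definition for illustration; the dashboard's exact method
    is not given in the source.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 28 of which fire the feature.
sample = [0.0] * 99_972 + [1.0] * 28
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

A density this low would mean the feature fires on only a few tokens per ten thousand, which fits a feature keyed to a narrow set of words such as "accidentally" and "inadvertently".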