INDEX
Explanations
instances of accidental actions or incidents
Negative Logits
Token       Logit
UVWXYZ      -0.50
tament      -0.48
règne       -0.48
érica       -0.47
chêne       -0.47
])]         -0.47
radikal     -0.47
Emanuel     -0.46
Moritz      -0.45
            -0.45
Positive Logits
Token         Logit
tripped       1.06
accidentally  0.95
tripping      0.94
bumped        0.93
accident      0.84
fell          0.83
spilled       0.83
stepped       0.82
accident      0.81
bumping       0.80
Activations Density: 0.432%
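As a rough illustration of how a density figure like the one above could be read, the sketch below treats it as the fraction of token positions on which the feature's activation is nonzero. That reading is an assumption, not something the page states; the function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations over 1000 token positions: 5 are nonzero.
sample = [0.0] * 995 + [1.2, 0.4, 3.1, 0.7, 2.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.500%
```

Under this reading, a density of 0.432% would mean the feature is active on roughly 4 of every 1000 tokens sampled.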