Explanations
sentences with medical or violent content
statements involving incidents and outcomes, particularly injuries or damages
Negative Logits
Token          Logit
anting         -0.85
userc          -0.79
isphere        -0.77
itaire         -0.75
itating        -0.70
ogl            -0.69
forwarding     -0.69
ensibly        -0.69
anted          -0.68
intermediate   -0.67
Positive Logits
Token          Logit
However        1.06
Additionally   1.05
Also           0.97
Photograph     0.95
Meanwhile      0.94
Nevertheless   0.92
Alternatively  0.86
Nonetheless    0.86
Furthermore    0.85
Else           0.85
Activations Density: 0.563%