Explanations
the name "Gail"
references to legal or bodily harm
Negative Logits
perature    -0.62
aido        -0.61
isolate     -0.60
WHO         -0.59
CPS         -0.57
sensitive   -0.56
basic       -0.56
ngth        -0.56
neutral     -0.56
opausal     -0.56
Positive Logits
ail         1.17
ails        1.02
runner      0.91
Runner      0.83
abe         0.80
ily         0.79
hander      0.78
usterity    0.78
ouri        0.72
runners     0.71
Activations Density 0.005%