INDEX
Explanations
phrases emphasizing the significance of roles or functions in various contexts
Negative Logits
arus      -0.68
False     -0.67
agle      -0.64
incinn    -0.63
atri      -0.63
Lev       -0.62
bones     -0.60
jaws      -0.59
fts       -0.59
ilings    -0.58
Positive Logits
role            0.83
helping         0.81
playing         0.79
assisting       0.72
facilitating    0.72
mitigating      0.72
roles           0.69
supporting      0.68
influencing     0.67
toward          0.67
Activations Density 0.022%