Explanations
verbs related to taking action or making decisions
actions and decisions related to help and intervention
Negative Logits
ove          -0.61
ann          -0.60
joining      -0.60
omach        -0.59
covered      -0.59
rafted       -0.59
ln           -0.58
ーティ       -0.58
rive         -0.57
GN           -0.57
Positive Logits
anymore      0.78
。           0.75
¯            0.74
?:           0.69
defensively  0.67
loud         0.66
!,           0.66
,''          0.64
Afgh         0.64
ㅋㅋ         0.63
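Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix. The result is the feature's direct effect on each token's logit, and the lists show the most negative and most positive entries. The sketch below assumes that convention, and the real dashboard may normalize or post-process the scores differently. The function name `top_logits` and the toy vocabulary and weights are hypothetical and are not taken from this feature.

```python
from typing import List, Sequence, Tuple


def top_logits(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # assumed shape: [d_model][vocab_size]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs.

    Each token's score is the dot product of the feature's decoder
    direction with that token's column of the unembedding matrix.
    """
    d_model = len(decoder_dir)
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative score first
    positive = ranked[::-1][:k]  # most positive score first
    return negative, positive


# Toy example with made-up numbers (d_model = 2, vocabulary of 4 tokens).
vocab = ["a", "b", "c", "d"]
decoder_dir = [1.0, -0.5]
unembed = [
    [0.6, 0.4, -0.3, -0.2],
    [-0.2, -0.4, 0.6, 0.5],
]
negative, positive = top_logits(decoder_dir, unembed, vocab, k=2)
print(negative)  # c first (about -0.6), then d (about -0.45)
print(positive)  # a first (about 0.7), then b (about 0.6)
```

Because the scores come from a fixed projection rather than from running the model on text, they describe what the feature pushes toward or away from directly. They do not describe the contexts in which it fires.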
Activation Density: 0.541%
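Activation density is commonly reported as the percentage of sampled tokens on which the feature's activation is strictly positive. Under that assumption, 0.541% means the feature fires on roughly 54 of every 10,000 tokens. A minimal sketch of that calculation follows. The function name `activation_density` and its `threshold` parameter are hypothetical, and the dashboard's actual sampling and threshold may differ.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:  # count the token as "active" only above the threshold
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Toy sample: the feature fires on 5 of 1,000 tokens.
sample = [0.0] * 995 + [1.2] * 5
print(activation_density(sample))  # 0.5
```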