INDEX
Explanations
statements indicating capability or action in the context of various scenarios
occurrences of the word "able."
Negative Logits
Parents       -0.69
Concern       -0.64
Sins          -0.64
Gone          -0.63
Nose          -0.60
Caval         -0.59
Yose          -0.59
Bots          -0.59
alien         -0.58
Generation    -0.58
Positive Logits
bodied         0.88
't             0.86
ioned          0.85
reys           0.83
Reviewer       0.80
untarily       0.78
successfully   0.78
uate           0.72
uced           0.72
ittees         0.72
Activations Density: 0.033%