INDEX
Explanations
instances of words related to taking specific actions or measures
references to actions or measures taken
Negative Logits
Token     Logit
inately   -0.84
ILLE      -0.69
olls      -0.69
orf       -0.68
bid       -0.67
gdala     -0.66
raid      -0.65
stown     -0.65
ews       -0.64
inite     -0.64
Positive Logits
Token     Logit
iblings   1.02
steps     0.97
hooting   0.94
Steps     0.87
hops      0.86
hent      0.81
steps     0.78
forward   0.78
isters    0.76
toward    0.76
Activations Density 0.036%
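
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes density means the fraction of sampled token positions on which the feature's activation is above zero; the page itself does not state its definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from this feature.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 4 of which activate the feature.
sample = [0.0] * 9996 + [0.5, 1.2, 0.3, 2.0]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.040%"
```

A density this low would mean the feature fires on only a small share of tokens, which fits a narrowly scoped explanation such as the ones listed above.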