Explanations
words related to causing or triggering a specific outcome or state
terms associated with the concept of induction or causing an effect
Negative Logits
achu                -0.70
apest               -0.69
asketball           -0.69
iffin               -0.67
lain                -0.65
=-=-=-=-=-=-=-=-    -0.65
roud                -0.65
phan                -0.64
andon               -0.64
mis                 -0.64
Positive Logits
induced             1.14
induce              1.00
inducing            0.94
induction           0.92
induces             0.87
induced             0.87
analges             0.85
uced                0.85
uces                0.84
untarily            0.84
Activations Density 0.012%
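
The density figure above can be illustrated with a minimal sketch. It assumes that "activations density" means the fraction of sampled tokens on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density is the share of sampled tokens on which the
    feature fires at all (activation strictly greater than `threshold`).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 3 of 25,000 sampled tokens.
sample = [0.0] * 24_997 + [0.4, 1.3, 2.1]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.012%
```

Under this reading, a density of 0.012% corresponds to roughly 12 active tokens per 100,000 sampled, which is consistent with a feature that responds to a narrow family of tokens such as the "induce" variants listed above.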