INDEX
Explanations
words related to provocation or triggering actions
words related to evidence or proof
Negative Logits
morph          -0.79
Halls          -0.76
Lay            -0.72
Reloaded       -0.68
cryst          -0.68
Ago            -0.66
urgy           -0.63
ecycle         -0.62
proficiency    -0.61
Lama           -0.61
Positive Logits
prov           0.90
atform         0.87
fare           0.87
iso            0.84
irus           0.84
idential       0.76
inelli         0.76
ably           0.74
rop            0.74
itely          0.73
Activations Density: 0.049%