INDEX
Explanations
words related to changes or modifications
terms related to changes or modifications in state or condition
Negative Logits
Token     Logit
hang      -0.86
abet      -0.84
yer       -0.78
lay       -0.77
ships     -0.75
went      -0.74
where     -0.74
abilia    -0.74
goal      -0.73
MAT       -0.73
Positive Logits
Token        Logit
versions     0.95
subur        0.91
iterranean   0.88
ieval        0.86
aback        0.81
satell       0.79
iated        0.78
hijacked     0.77
pione        0.76
nesday       0.75
Activations Density: 0.078%
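The density figure is commonly read as the fraction of sampled token positions on which the feature fires at all. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the sample data are hypothetical and are not taken from the dashboard or its backing code.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (active positions) / (all sampled positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 8 of which activate the feature.
sample = [0.0] * 9992 + [1.3, 0.4, 2.1, 0.7, 0.9, 3.2, 0.2, 1.1]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.078% would mean the feature is active on roughly 8 out of every 10,000 sampled tokens.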