INDEX
Explanations
phrases related to exploring or investigating options or situations
various forms of the word "explain."
Negative Logits
Token          Logit
Cups           -0.68
Helsinki       -0.67
Mens           -0.67
rier           -0.67
li             -0.66
ledger         -0.66
cholesterol    -0.64
lam            -0.64
Berk           -0.63
Dane           -0.63
Positive Logits
Token          Logit
expl           3.67
Expl           3.17
Expl           2.07
unexpl         1.72
expl           1.46
impl           1.42
Exploration    1.29
decomp         1.28
Explosive      1.24
Explos         1.18
Activations Density 0.009%