INDEX
Explanations
words and phrases related to disclosure or uncovering hidden truths
Negative Logits
Token          Logit
U              -0.68
[missing]      -0.68
colspan        -0.67
läufe          -0.66
ymce           -0.64
or             -0.62
ecin           -0.60
ühungen        -0.60
aroni          -0.59
antMatchers    -0.59
Positive Logits
Token          Logit
Reveal         2.30
reveal         2.18
revealed       2.10
reveals        2.07
reveal         2.02
Reveals        1.98
Reveal         1.97
Revealed       1.95
revealing      1.95
revealed       1.89
Activations Density 0.068%
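
To give a sense of scale for the density figure, here is a minimal sketch. It assumes that "activation density" means the fraction of token positions on which the feature fires (activation above zero); the page itself does not define the term. The function name `activation_density` and the sample data are hypothetical and chosen only to reproduce 0.068%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density is counted per token position, with "active" meaning
    strictly greater than the threshold.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 68 of which activate the feature.
sample = [1.5] * 68 + [0.0] * (100_000 - 68)
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.068%
```

Under that assumption, a density of 0.068% corresponds to roughly 68 active positions per 100,000 tokens, which fits a feature that responds to a narrow family of tokens such as the forms of "reveal" listed above.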