INDEX
Explanations
phrases indicating the act of uncovering or disclosing secrets or truths
Negative Logits
          -0.78
ftagPool  -0.76
U         -0.68
          -0.66
verband   -0.64
colspan   -0.64
o         -0.63
I         -0.60
D         -0.60
läng      -0.59
Positive Logits
Reveal      1.65
Reveals     1.43
Reveal      1.42
reveal      1.41
Reve        1.37
reveals     1.36
reveal      1.34
disclosure  1.31
Revealed    1.30
disclose    1.29
Activations Density 0.083%