INDEX
Explanations
terms related to falsehoods or deceiving statements
occurrences of the word "lies."
Negative Logits
zoom      -0.66
Chapel    -0.63
crisp     -0.61
quint     -0.61
better    -0.58
curb      -0.58
speed     -0.57
dash      -0.56
tattoo    -0.56
Singer    -0.56
Positive Logits
lies      5.06
lie       2.15
lied      1.86
lying     1.84
liest     1.59
pins      1.51
lier      1.43
liness    1.38
mares     1.30
li        1.26
Activations Density: 0.008%
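
For context, an activation density figure like the one above is commonly read as the fraction of sampled tokens on which the feature fires. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the sample data are assumptions for illustration and are not taken from the dashboard or its tooling.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    `activations` is a flat list of per-token activation values for one
    feature. The threshold of 0.0 is an assumption, not a documented default.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 8 of 100,000 tokens.
sample = [0.0] * 99_992 + [1.5] * 8
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.008% means the feature is active on roughly 8 tokens in every 100,000, which fits a feature tied to a single word family.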