Explanations
references to the word "Lie" or variations of it
occurrences of the word "lie."
Negative Logits
arthy      -0.77
oun        -0.73
smart      -0.72
iaries     -0.71
irlf       -0.70
atform     -0.67
uploads    -0.66
iles       -0.66
detail     -0.66
icable     -0.65
Positive Logits
utenant    1.47
Lie        1.00
ge         0.94
uten       0.91
berman     0.88
ÃŁ         0.86
detector   0.84
pard       0.83
Lie        0.83
yer        0.81
Activations Density: 0.023%