Explanations
honestly, frankly, truthfully
Negative Logits
geom        0.74
ご注意      0.68
abound      0.67
秫          0.66
students    0.66
beware      0.65
Axes        0.63
よろしく    0.62
破解        0.61
conical     0.61
Positive Logits
truthfully  1.41
honestly    1.28
Truth       1.27
Honestly    1.22
truth       1.22
Truth       1.15
Honestly    1.14
truth       1.14
admitting   1.12
Frankly     1.09
Activations Density: 0.188%