INDEX
Explanations
sentences discussing issues related to truth, credibility, and the gap between reality and perception
Negative Logits
cair      -0.79
indal     -0.77
artney    -0.73
asus      -0.73
asso      -0.72
edo       -0.67
rosse     -0.67
oyal      -0.67
ucky      -0.65
incinn    -0.65
Positive Logits
istically    0.91
ignment      0.86
psons        0.79
reality      0.79
reality      0.74
fulness      0.71
Wiz          0.68
distortion   0.68
Reborn       0.65
isation      0.65
Activations Density: 10.508%