INDEX
Explanations
phrases indicating themes of reality and truth
Negative Logits
éĢł        -0.15
suddenly   -0.15
Winn       -0.15
rous       -0.14
elp        -0.14
orgia      -0.14
obe        -0.14
attention  -0.14
gratuite   -0.14
stanov     -0.13
Positive Logits
Demp       0.17
globals    0.15
playbook   0.15
.metro     0.15
.pitch     0.14
/pro       0.14
ivery      0.13
anya       0.13
ITU        0.13
zin        0.13
Activations Density: 0.085%