Explanations
hidden origins or true nature
Negative Logits
seem         0.80
据说         0.77
にならない   0.76
íž           0.76
seemingly    0.76
seems        0.75
seems        0.75
seemed       0.72
rappelle     0.72
fails        0.72
Positive Logits
hidden       1.25
genuine      1.18
ulterior     1.13
fraude       1.11
hoax         1.10
involvement  1.10
ocult        1.09
hidden       1.09
origin       1.09
origen       1.08
Activations Density 0.252%