Negative Logits
Token           Value
allegiance      0.85
disbelief       0.83
Thoughts        0.75
thoughts        0.73
appreciation    0.69
pensamiento     0.68
instinct        0.68
understanding   0.67
probably        0.66
curiosity       0.66
Positive Logits
Token           Value
know            0.95
Know            0.91
Know            0.91
know            0.89
KNOW            0.79
knows           0.77
知道            0.73
anses           0.72
著名的          0.71
Known           0.71
Activations Density: 0.073%