Negative Logits
Token                  Logit
用 (Chinese: "use")     -0.09
كت                     -0.08
Writer                 -0.08
純 ("pure")             -0.08
Writers                -0.08
politely               -0.07
Advisor                -0.07
Disclaimer             -0.07
께                      -0.07
poser                  -0.07

Positive Logits
Token                            Logit
corrobor                          0.13
testimon                          0.12
beobachten (German: "observe")    0.11
beob                              0.10
surveys                           0.09
testimony                         0.09
calor (Spanish: "heat")           0.09
_tests                            0.09
evidence                          0.09
观察 (Chinese: "observe")          0.09
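
The page does not say how these two lists were produced. A common approach in feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and report the k most negative and k most positive tokens. The sketch below assumes that approach; the function names `logit_effects` and `top_logits`, the nested-list matrix layout, and the toy numbers are all hypothetical and are not taken from the real model.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: decoder_vec has length d_model; unembed is a
# d_model x vocab_size nested list (row i holds model dimension i).
def logit_effects(decoder_vec: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot the feature's decoder direction with each token's unembedding column."""
    vocab_size = len(unembed[0])
    return [sum(d * row[t] for d, row in zip(decoder_vec, unembed))
            for t in range(vocab_size)]


def top_logits(effects: Sequence[float], vocab: Sequence[str],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive tokens, strongest first."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in order[::-1][:k]]
    return negative, positive


# Toy example with made-up numbers (not real model weights).
vocab = ["evidence", "testimony", "Writer", "politely"]
decoder_vec = [1.0, -0.5]
unembed = [[0.2, 0.1, -0.1, 0.0],
           [0.0, -0.1, 0.1, 0.2]]
negative, positive = top_logits(logit_effects(decoder_vec, unembed), vocab, k=2)
print([tok for tok, _ in negative])  # ['Writer', 'politely']
print([tok for tok, _ in positive])  # ['evidence', 'testimony']
```

Reversing the ascending order and slicing `[:k]` keeps the positive list correct for any `k`, including `k = 0` or `k` larger than the vocabulary.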

Activation density: 0.034%
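
Activation density is usually read as the share of sampled tokens on which the feature fires, so 0.034% corresponds to about 34 active tokens per 100,000. A minimal sketch, assuming "fires" means an activation strictly above zero (the actual threshold and token sample behind the 0.034% figure are not given here; `activation_density` is a hypothetical helper):

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds the threshold (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Synthetic sample: 34 active tokens out of 100,000.
acts = [0.0] * 99_966 + [1.5] * 34
print(f"{activation_density(acts):.3f}%")  # 0.034%
```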