Explanations
phrases that express uncertainty or perceptions about situations
Negative Logits
anke         -0.19
ullo         -0.17
undisclosed  -0.16
otron        -0.15
roit         -0.15
ometr        -0.14
iland        -0.14
isman        -0.14
ugi          -0.14
جع           -0.14
Positive Logits
obvious          0.25
straightforward  0.19
odd              0.18
ox               0.18
strange          0.18
like             0.16
radical          0.15
simple           0.15
ors              0.15
backward         0.15
Activations Density: 0.069%