INDEX
Explanations
sentences that express thoughts, beliefs, or opinions
Negative Logits
apparently       -0.84
seemingly        -0.83
Apparently       -0.77
aparentemente    -0.75
apparently       -0.71
Seem             -0.69
Apparently       -0.68
schein           -0.66
supposedly       -0.64
                 -0.63
Positive Logits
overall          0.55
probably         0.49
yüzden           0.48
Probably         0.45
partly           0.45
もっと           0.44
personally       0.43
mostly           0.42
mainly           0.42
ultimately       0.42
Activations Density: 0.244%