INDEX
Explanations
expressing possibility or likelihood
Negative Logits
not         0.53
either      0.48
entweder    0.46
either      0.42
не          0.39
IS          0.38
Either      0.38
remains     0.37
can         0.37
ata         0.37
Positive Logits
realistically    0.61
realista         0.57
Realistic        0.57
reasonably       0.55
Realistic        0.54
realistic        0.54
reasonably       0.54
plaus            0.53
realism          0.52
realist          0.52
Activations Density 0.021%