INDEX
Explanations
declare war, ban, close, approve, raise
Negative Logits
desiderio    0.69
consultez    0.68
rarement    0.67
selezione    0.67
vaak    0.66
drowsiness    0.66
meestal    0.65
ప్రత్యర్థి    0.65
饮食    0.64
biasanya    0.64
Positive Logits
engineer    0.75
formally    0.67
belated    0.66
dramatically    0.65
unilaterally    0.65
औपचारिक    0.63
abruptly    0.63
ultimately    0.62
bowed    0.62
reversed    0.61
Activations Density: 0.049%