INDEX
Explanations
specific word choice or phrasing
Negative Logits
posizione     0.53
maggiore      0.52
abortion      0.51
esimo         0.50
tradizione    0.49
sito          0.48
espécies      0.48
estructura    0.47
abort         0.47
độ            0.47
Positive Logits
Result        0.54
Nation        0.51
Simulation    0.49
Anthrop       0.47
Product       0.46
Refresh       0.46
Sampling      0.46
Middleware    0.46
Mel           0.45
Loved         0.44
Activations Density: 0.001%