Explanations
words related to favorability or preference
Negative Logits
Token              Logit
ftet               -0.83
nought             -0.68
足                 -0.64
waypoint           -0.62
Giao               -0.60
cos                -0.59
cop                -0.59
Abitanti           -0.59
HEND               -0.59
scrolling          -0.58
Positive Logits
Token              Logit
favours             1.11
favour              1.09
neighbourhoods      1.05
sceptre             0.95
Favor               0.95
fibres              0.95
honours             0.95
Favor               0.95
honoured            0.94
disambiguazione     0.94
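The positive and negative logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. A common way such lists are produced for SAE features, assumed here rather than confirmed by this page, is to take the dot product of the feature's decoder direction with each token's unembedding vector and then sort. The sketch below uses toy two-dimensional vectors; `decoder_dir`, `unembed`, `logit_scores`, and `top_and_bottom` are hypothetical names for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical inputs (toy values): `decoder_dir` stands in for one feature's
# decoder vector, and `unembed` maps each token to its unembedding vector.
def logit_scores(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Project the feature direction onto every token's unembedding (one dot product per token)."""
    return {tok: sum(d * w for d, w in zip(decoder_dir, col)) for tok, col in unembed.items()}


def top_and_bottom(
    scores: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return positive, negative


decoder_dir = [1.0, 0.5]
unembed = {
    "favour": [1.0, 0.2],
    "honours": [0.8, 0.1],
    "nought": [-0.4, 0.0],
    "waypoint": [-0.5, -0.3],
    "the": [0.0, 0.1],
}
positive, negative = top_and_bottom(logit_scores(decoder_dir, unembed), k=2)
print("positive:", positive)
print("negative:", negative)
```

Sorting once and slicing from both ends keeps the two lists consistent with each other; a real model would use the full vocabulary and a matrix product instead of per-token Python loops.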
Activation density: 0.278%
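Activation density can be read as the share of token positions on which the feature fires; 0.278% corresponds to roughly 2.8 tokens per 1,000. The sketch below assumes "fires" means an activation strictly above zero over some sample of tokens; the `threshold` parameter and the `activation_density` function are hypothetical illustrations, not this dashboard's documented method.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature's activation exceeds `threshold` (assumed 0)."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy example: 3 firing positions out of 1,000 tokens gives a density of 0.3%.
acts = [0.0] * 997 + [1.2, 0.4, 3.1]
print(f"{activation_density(acts):.3%}")
```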