INDEX
Explanations
the word "way" with a high activation value
Negative Logits
Token      Logit
usters     -1.02
uster      -0.88
livest     -0.84
ĸļ         -0.79
omore      -0.73
oppable    -0.71
asts       -0.69
oubted     -0.69
inately    -0.68
lict       -0.67
Positive Logits
Token      Logit
fare        1.27
finding     1.21
ward        1.19
forward     1.10
point       1.06
finder      0.90
points      0.89
forward     0.88
bill        0.81
station     0.77
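The negative and positive logit lists rank vocabulary tokens by how much this feature pushes their output logits down or up. The sketch below assumes the common approach of projecting the feature's decoder direction through the model's unembedding matrix (called W_U here) and sorting the result. The function names, the [d_model][vocab_size] matrix layout, and the toy numbers are hypothetical illustrations, not the dashboard's actual code.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Hypothetical helper: project a feature's decoder direction through W_U.

    `unembed` is assumed to be laid out as [d_model][vocab_size]; each token's
    score is the dot product of `decoder_dir` with that token's column.
    """
    d_model = len(decoder_dir)
    return {
        tok: sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j, tok in enumerate(vocab)
    }


def top_logits(effects: Dict[str, float], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # ascending: most negative first
    positive = ranked[::-1][:k]    # descending: most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary (made-up values).
    effects = logit_effects([1.0, -0.5],
                            [[0.9, -0.2, 0.4], [0.1, 0.8, -0.6]],
                            ["fare", "uster", "ward"])
    neg, pos = top_logits(effects, k=1)
    print(neg, pos)  # most negative: uster, most positive: fare
```

On a real model, the same ranking over the full vocabulary would produce ten-item lists like the ones shown, with k = 10.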
Activation Density: 1.161%
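Activation density is read here as the percentage of sampled tokens on which the feature fires. That definition is an assumption, as is the strict "activation > 0" threshold used below. Under it, 1.161% means roughly 1 token in 86 activates the feature. The helper name `activation_density` is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Made-up sample: 1,161 active tokens out of 100,000.
    acts = [1.0] * 1161 + [0.0] * (100_000 - 1161)
    print(f"{activation_density(acts):.3f}%")  # prints "1.161%"
```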