INDEX
Explanations
occurrences of the word "new"
Negative Logits
..........  -0.71
Brach       -0.70
dod         -0.60
mishand     -0.59
peanuts     -0.59
ILA         -0.59
sidx        -0.59
ONSORED     -0.58
Cele        -0.57
BILITY      -0.57
Positive Logits
bies        1.38
bie         1.29
foundland   1.24
found       1.11
Zealand     1.06
tons        1.05
arrivals    0.98
castle      0.94
sc          0.92
York        0.92
Activations Density 0.082%