INDEX
Explanations
locations or date references
specific references to locations or geographic features
Negative Logits
Token     Logit
thood     -0.69
omon      -0.67
aea       -0.65
aukee     -0.64
isite     -0.62
ento      -0.62
olars     -0.61
elist     -0.60
agne      -0.60
acia      -0.59
Positive Logits
Token     Logit
insk      0.59
Worst     0.54
Thing     0.53
ortunate  0.52
parts     0.52
Dialog    0.51
Dex       0.50
best      0.50
md        0.49
ines      0.49
Activations Density: 0.200%