INDEX
Explanations
numbers related to model names, years, or quantities
references to specific numerical values or identifiers
Negative Logits
gets      -0.90
arent     -0.81
thodox    -0.77
cha       -0.71
folk      -0.70
grass     -0.69
board     -0.68
maid      -0.67
isman     -0.67
boarding  -0.67
Positive Logits
80        0.88
889       0.78
een       0.77
68        0.77
uador     0.77
88        0.76
888       0.74
60        0.73
644       0.73
70        0.73
Activations Density 0.058%