INDEX
Explanations
comparisons involving numerical multiples or increases
instances of a specific token or concept repeated multiple times
Negative Logits
PLA         -0.82
RAW         -0.75
AMA         -0.72
Leilan      -0.70
ETF         -0.68
GGGG        -0.68
ARA         -0.67
DD          -0.64
ND          -0.64
////////    -0.63
Positive Logits
usual       0.99
average     0.90
amount      0.87
oples       0.84
fastest     0.80
same        0.79
extent      0.77
busiest     0.76
thickness   0.75
country     0.74
Activations Density: 0.098%