Explanations
numerical values related to percentages
numerical data points and percentages
Negative Logits
gets           -0.77
giving         -0.73
holders        -0.71
bub            -0.71
tub            -0.69
shots          -0.67
seeing         -0.67
iky            -0.67
nationalists   -0.66
stract         -0.65
Positive Logits
90             0.92
65             0.90
97             0.88
80             0.86
999            0.86
95             0.84
96             0.83
98             0.82
99             0.82
765            0.81
Activations Density 0.023%