INDEX
Explanations
numerical values in a specific format
percentages and numerical data
Negative Logits
raints     -0.73
andestine  -0.72
tremend    -0.70
uesday     -0.68
achus      -0.66
Ens        -0.62
ihad       -0.62
wholes     -0.61
caring     -0.61
iosyn      -0.61
Positive Logits
ãĥ¼ãĥ³  0.72
bis     0.69
İ       0.69
394     0.69
245     0.68
attRot  0.67
df      0.67
449     0.67
195     0.66
595     0.65
Activations Density 0.285%