Explanations
numerical data, such as statistics or percentages
quantitative or statistical data presented in sentences
Negative Logits
mimic        -0.72
consistency  -0.69
anus         -0.68
performer    -0.67
pudding      -0.67
roam         -0.67
anyl         -0.66
sidel        -0.66
attacker     -0.66
dancing      -0.65
Positive Logits
00   1.17
5    1.08
25   1.04
75   1.03
04   1.02
99   1.02
06   1.01
07   1.01
7    1.00
05   1.00
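The negative and positive logit lists rank vocabulary tokens by how strongly this feature pushes their next-token logits down or up. A minimal sketch of how such lists could be produced, assuming (the page does not say so) that each score is the dot product of the feature's decoder direction with that token's unembedding vector. The helper names `logit_effects` and `top_bottom_tokens` are hypothetical, and the toy vectors below are invented for illustration, not this feature's real weights.

```python
from __future__ import annotations

from typing import Sequence


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> list[float]:
    """Score every vocabulary token by the dot product of its unembedding
    vector with the feature's decoder direction.

    Assumption: `unembed` holds one row of length d_model per token
    (the transpose of a d_model x vocab W_U matrix).
    """
    return [sum(d * u for d, u in zip(decoder_dir, row)) for row in unembed]


def top_bottom_tokens(decoder_dir: Sequence[float],
                      unembed: Sequence[Sequence[float]],
                      vocab: Sequence[str],
                      k: int = 10) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs."""
    scored = sorted(zip(vocab, logit_effects(decoder_dir, unembed)),
                    key=lambda pair: pair[1])
    negative = scored[:k]          # most negative score first
    positive = scored[::-1][:k]    # most positive score first
    return positive, negative


# Toy example: 2-d "model", 4-token vocabulary, invented numbers.
vocab = ["a", "b", "c", "d"]
decoder_dir = [1.0, 0.0]
unembed = [[1.17, 0.3], [1.08, -0.2], [-0.72, 0.1], [-0.65, 0.9]]
positive, negative = top_bottom_tokens(decoder_dir, unembed, vocab, k=2)
print(positive)  # [('a', 1.17), ('b', 1.08)]
print(negative)  # [('c', -0.72), ('d', -0.65)]
```

Sorting once and slicing both ends keeps the sketch short; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid sorting every score.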
Activation Density 0.109%
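An activation density of 0.109% reads as the share of sampled tokens on which this feature fires, roughly 1 token in 917 (1 / 0.00109 ≈ 917). The sketch below shows that calculation under two assumptions not stated on the page: a token counts as active when its activation is strictly above 0, and the denominator is every token in the sample. `activation_density` is a hypothetical helper name.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than `threshold` (0.0 by default, as for a ReLU-style feature).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


print(activation_density([0.0, 0.0, 2.3, 0.0]))  # 25.0
```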