INDEX
Explanations
phrases indicating a comparison or a choice between two options
phrases concerning uncertainty or conditional statements
Negative Logits
Pony -0.67
Room -0.61
Rocket -0.59
Symphony -0.59
Chau -0.59
:// -0.56
Romeo -0.56
Trophy -0.56
Cardinal -0.56
Tele -0.56
Positive Logits
Else 0.99
acles 0.96
else 0.80
acle 0.79
rogens 0.75
nam 0.73
odd 0.70
rame 0.69
rist 0.68
rha 0.67
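The page does not say how the positive and negative logit lists were produced. A common convention for sparse-autoencoder features is to project the feature's decoder direction through the model's unembedding matrix and rank vocabulary tokens by the result. The sketch below assumes that convention. `top_and_bottom_logits`, the toy matrices, and the placeholder tokens are all hypothetical and are not taken from this dashboard.

```python
def top_and_bottom_logits(decoder_vec, unembed, vocab, k=3):
    """Rank tokens by the (assumed) direct logit effect of one feature.

    decoder_vec: feature decoder direction, length d_model (hypothetical input)
    unembed:     d_model x vocab_size matrix as nested lists (hypothetical input)
    vocab:       token strings, one per unembedding column
    """
    scores = []
    for j, token in enumerate(vocab):
        # Dot product of the decoder direction with this token's unembedding column.
        s = sum(d * unembed[i][j] for i, d in enumerate(decoder_vec))
        scores.append((token, s))
    ranked = sorted(scores, key=lambda pair: pair[1])  # most negative first
    bottom = ranked[:k]        # "negative logits"
    top = ranked[::-1][:k]     # "positive logits", largest first
    return bottom, top


# Toy example with d_model = 2 and a 4-token placeholder vocabulary.
bottom, top = top_and_bottom_logits(
    [1.0, 0.5],
    [[1, -1, 0, 2],
     [0, 1, -2, 0]],
    ["a", "b", "c", "d"],
    k=2,
)
print(bottom)  # [('c', -1.0), ('b', -0.5)]
print(top)     # [('d', 2.0), ('a', 1.0)]
```

Dashboards may also rescale or normalize these scores before display, so the magnitudes above are not directly comparable to the values listed on this page.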
Activations Density 0.022%
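Activation density is usually read as the fraction of token positions on which the feature fires. At 0.022%, that works out to about 22 active positions per 100,000 tokens. The minimal sketch below assumes "fires" means an activation strictly above a threshold of 0. The function name and the synthetic activation list are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: 22 active positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(22):
    acts[i * 1000] = 1.5  # arbitrary positive activation value
density = activation_density(acts)
print(f"{density:.3%}")  # 0.022%
```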