INDEX
Explanations
proper nouns
phrases indicating actions or commands
Negative Logits
Entered         -0.76
professional    -0.49
Arbit           -0.48
>[              -0.47
Hogan           -0.46
Student         -0.46
                -0.45
Judicial        -0.45
>:              -0.45
iliar           -0.44
Positive Logits
wings           0.63
mbuds           0.60
è¦ļéĨĴ          0.60
panic           0.59
Ń·              0.58
insky           0.57
¶æ              0.56
fw              0.56
ansk            0.54
itself          0.54
Activations Density: 1.222%