INDEX
Explanations
phrases related to commands or instructions
references to personal experiences or possessions
Negative Logits
Rhodes        -0.73
Hercules      -0.70
Kissinger     -0.65
Gork          -0.61
Sussex        -0.60
ktop          -0.59
Nelson        -0.58
Lange         -0.58
Chavez        -0.57
Wellington    -0.57
Positive Logits
âĢ            2.56
âĢ            2.13
ãĢ            1.72
¨             1.56
âϦ            1.52
âľ            1.52
âĶ            1.52
âĢł           1.49
âĸł           1.49
âĹ            1.48
Activations Density 1.461%