Explanations
learning beginner instructor
Negative Logits
domain          0.42
interrogation   0.41
_               0.40
export          0.39
entendeu        0.39
metadata        0.38
思想            0.38
alertness       0.38
subpoena        0.37
subordinates    0.37
Positive Logits
Beginners       0.89
Beginner        0.87
beginners       0.86
beginner        0.83
instructors     0.83
instructor      0.81
Instructor      0.80
प्रशिक्          0.68
课程            0.68
Instructor      0.67
Activations Density 0.205%