INDEX
Explanations
instances of the word "out"
Negative Logits
oleon           -0.72
tyr             -0.65
Municip         -0.64
issance         -0.63
Trailer         -0.61
Corps           -0.61
Tea             -0.60
Lect            -0.57
Conversation    -0.57
Tao             -0.56
Positive Logits
fitted          1.05
number          1.03
range           1.01
scoring         1.00
stretched       0.96
liest           0.92
fitting         0.92
ranking         0.91
doing           0.89
crop            0.89
Activations Density 0.029%
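
As a rough illustration of how a density figure like the one above can be read, the sketch below assumes that activation density means the fraction of sampled tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are assumptions made for illustration; the page itself does not state how the value was computed.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires at all. The page does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 tokens, 3 of which activate the feature.
sample = [0.0] * 9997 + [1.2, 0.4, 2.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.029% would correspond to the feature firing on roughly 3 of every 10,000 sampled tokens.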