INDEX
Explanations
use, like, the, thoughts, casts, for, to, far, give
Negative Logits
Ignoring         0.74
Ref              0.74
Vi               0.72
FromFile         0.72
ستي              0.71
Received         0.70
Rece             0.70
refs             0.69
Cheat            0.69
Done             0.69
Positive Logits
rewards          0.82
appreciates      0.75
hémorro          0.73
замети           0.72
carro            0.72
расчета          0.71
sprites          0.71
couche           0.70
caratteristiche  0.69
przedstaw        0.69
Activations Density 0.000%