INDEX
Explanations
instances indicating categories or classifications
Negative Logits
neceff        -0.94
Efq           -0.93
Anſ           -0.92
purpoſe       -0.90
ſtate         -0.85
tvguidetime   -0.85
houſe         -0.84
leſs          -0.84
pleaſure      -0.84
ſch           -0.82
Positive Logits
#             0.52
<eos>         0.46
OLVED         0.45
go            0.45
onResume      0.45
#             0.44
pare          0.42
urlopen       0.41
restore       0.40
0.40
Activations Density: 0.019%