Explanations
suggestive and indicative phrases that convey insights or conclusions drawn from data or observations
Negative Logits
Token                Logit
loop                 -0.50
base                 -0.49
pat                  -0.48
base                 -0.47
/                    -0.45
pase                 -0.44
BASE                 -0.44
arat                 -0.44
(token not rendered) -0.44
pa                   -0.43
Positive Logits
Token                Logit
indicates            1.00
indicates            0.95
Indicates            0.93
demuestra            0.88
suggests             0.88
demonstrates         0.87
reflects             0.87
indicate             0.87
Indicates            0.86
testifies            0.86
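The page does not say how the logit lists above were produced. In sparse-autoencoder feature dashboards they are commonly obtained by projecting the feature's output direction through the model's unembedding and ranking every vocabulary token by the result. The sketch below assumes that method. The function names `logit_effects` and `top_and_bottom`, and the toy 2-d vectors, are hypothetical and are not taken from the model behind this page.

```python
# Hypothetical sketch (assumed method): rank vocabulary tokens by how strongly
# a feature's output direction pushes their logits up or down.

def logit_effects(direction, unembed):
    """Return {token: dot(direction, token_vector)} for every token."""
    return {
        token: sum(d * u for d, u in zip(direction, vec))
        for token, vec in unembed.items()
    }

def top_and_bottom(effects, k):
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative

# Toy example: these vectors are invented for illustration only.
direction = [1.0, 0.0]
unembed = {
    "indicates": [0.9, 0.1],
    "suggests": [0.8, 0.3],
    "loop": [-0.5, 0.2],
    "base": [-0.4, 0.0],
}
positive, negative = top_and_bottom(logit_effects(direction, unembed), k=2)
print(positive)  # [('indicates', 0.9), ('suggests', 0.8)]
print(negative)  # [('loop', -0.5), ('base', -0.4)]
```

In a real vocabulary, entries that look identical in the lists (such as base or Indicates) are possibly distinct tokens that differ only in leading whitespace.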
Activation Density: 0.502%
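Activation density is usually reported as the percentage of token positions on which the feature is active. The page states neither the threshold nor the token sample used, so the zero threshold below is an assumption. Read that way, 0.502% means the feature fires on roughly one token in 200. The function name `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold (default 0).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Toy example: the feature fires on 1 of 200 tokens.
acts = [0.0] * 199 + [2.3]
density = activation_density(acts)
print(density)  # 0.5
```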